明确题意:

统计自从有用户作答记录以来,每月的试卷作答记录中月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数。

结果按月份升序输出。


问题分解:

  • 统计每月的月活数、新增用户数,生成子表 t_month_mau_adduv:
    • 统计每个用户的作答过月份,附带最早出现月份和当前月是否为新增用户,生成子表 t_exam_record_first_month_add:
      • 统计每个用户的作答过月份,附带最早出现月份,生成子表 t_exam_record_first_month:
        • 统计每个用户作答月份,生成子表 t_exam_record_uniq_id_month:
          • 提取作答月份:DATE_FORMAT(start_time, "%Y%m") as start_month
          • 去重:SELECT distinct uid, start_month
        • 计算最早出现月份:MIN(start_month) over(PARTITION BY uid) as first_month
      • 统计当月是否为新增用户:IF(start_month=first_month, 1, NULL) as is_new_month_user
    • 按月份分组:GROUP BY start_month
    • 计算当月月活数和新增用户数:count(uid) as mau, count(is_new_month_user) as month_add_uv
  • 统计截止当月的数据:
    • 最大新增用户数:max(month_add_uv) over(ORDER BY start_month) as max_month_add_uv
    • 累积用户数:sum(month_add_uv) over(ORDER BY start_month) as cum_sum_uv

细节问题:

  • 表头重命名:as
  • 按月份升序排序:ORDER BY start_month

完整代码:

SELECT start_month, mau, month_add_uv,
    max(month_add_uv) over(ORDER BY start_month) as max_month_add_uv,
    sum(month_add_uv) over(ORDER BY start_month) as cum_sum_uv
FROM ( -- 每月的月活数、新增用户数
    SELECT start_month, count(uid) as mau,
        count(is_new_month_user) as month_add_uv
    FROM ( -- 每个用户的作答过月份,附带最早出现月份和当前月是否为新增用户
        SELECT uid, start_month, first_month,
            IF(start_month=first_month, 1, NULL) as is_new_month_user
        FROM ( -- 每个用户的作答过月份,附带最早出现月份
            SELECT uid, start_month,
                MIN(start_month) over(PARTITION BY uid) as first_month
            FROM ( -- 唯一的用户ID、作答的月份:用户作答过的月份
                SELECT distinct uid, DATE_FORMAT(start_time, "%Y%m") as start_month
                FROM exam_record
            ) as t_exam_record_uniq_id_month
        ) as t_exam_record_first_month
    ) as t_exam_record_first_month_add
    GROUP BY start_month
) as t_month_mau_adduv
ORDER BY start_month;