SQL35 每月及截止当月的答题情况
题目主要信息:
输出自从有用户作答记录以来,每月的试卷作答记录中月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数。结果按月份升序输出。
问题拆解:
- 通过 SELECT DISTINCT 语句统计出去重后的月活用户,存入临时表 record。统计粒度为月份,因此需要 DATE_FORMAT 函数从 start_time 字段中获取月份信息。知识点:SELECT DISTINCT, DATE_FORMAT
- 为 record 表增加一列 rk,表示本月为该用户第几次登录的月份,以便之后用于判断月新增用户数(rk=1 则表示该用户是新增用户)。通过 ROW_NUMBER() OVER(PARTITION BY uid ORDER BY month ASC) 获取每个 uid 中,按照 month 升序排列,当天记录的排序。知识点:ROW_NUMBER, PARTITION BY, ORDER BY
- 从上表中统计每个月出现过的 uid 即为月活用户数(COUNT),统计出现过的 rk=1 的 uid 为月新增用户数(SUM IF)。此处需要使用 GROUP BY 对 month 维度进行聚合统计。知识点:COUNT, SUM, IF, GROUP BY
- 根据题意,截至当月的最大用户数可以用过 MAX(月新增用户) 获取,总用户数可通过 SUM(月新增用户) 统计,需要指定统计的排序口径为 month 升序。知识点:MAX, SUM, ORDER
代码:
WITH record AS
(
SELECT DISTINCT
uid,
DATE_FORMAT(start_time,'%Y%m') AS month
FROM
exam_record
)
SELECT
month,
mau,
month_add_uv,
MAX(month_add_uv) OVER(ORDER BY month) AS max_month_add_uv,
SUM(month_add_uv) OVER(ORDER BY month) AS cum_sum_uv
FROM
(
SELECT
month,
COUNT(1) AS mau,
SUM(IF(rk=1, 1, 0)) AS month_add_uv
FROM
(
SELECT
uid,
month,
ROW_NUMBER() OVER(PARTITION BY uid ORDER BY month ASC) AS rk
FROM
record
) AS u_m
GROUP BY
month
) AS result