SQL31 未完成率较高的50%用户近三个月答卷情况

题目主要信息:

统计 SQL 试卷上,未完成率最高的 50% 用户中,6 级和 7 级用户在有作答记录的最近三个月中,每个月答卷数目和完成数目。

问题拆分:

先筛选出所有满足条件用户的 uid, 用户需要满足以下条件:

  • 完成率:为筛选该条件,需要统计用户的作答数量与完成数量。对于本场景,exam_record 表中的记录都算作 “作答数量”,因此通过 COUNT(1) 来统计用户作答数。通过 IF(submit_time IS NULL) 判断是否完成了试卷,并通过 SUM() 统计出未完成试卷次数。此处的COUNT 和 SUM 都属于聚合函数,聚合函数的使用需要指定 GROUP BY 维度,这里的聚合维度为 uid。知识点:COUNT, SUM, GROUP BY
  • 6 级和 7 级用户:可以通过 level IN (6, 7) 的方法判断。知识点:IN
  • top50%:为了判断用户完成率处于 top50%,需要同时知道该用户的完成率排名和总用户数量。此处我们通过 LEFT JOIN 的方式将统计好的用户数量追加在每条数据的行末。通过 ROW_NUMBER() OVER(ORDER BY incomplete / total_cnt DESC) 获得基于完成率的用户排名。知识点:ROW_NUMBER OVER, LEFT JOIN
SELECT
    user_info.uid AS uid,
    MAX(user_info.level) AS level,
    COUNT(1) AS total_cnt,
    SUM(IF(exam_record.submit_time IS NULL, 1, 0)) AS incomplete
FROM
    user_info
LEFT JOIN
    exam_record
ON user_info.uid = exam_record.uid
WHERE exam_record.start_time IS NOT NULL
GROUP BY user_info.uid

接下来获取所有用户最近三个月中,每个月的答卷数目和完成数目:

  • 获取月份:为了获取上述数据,此处的聚合维度应当为 uid + 月份。从 DATETIME 类型数据获取 “年+月” 字符串的函数为 DATE_FORMAT(start_time,'%Y%m')。知识点:DATE_FORMAT
  • 最近三个月:ROW_NUMBER() 生成排序数据时,可以指定聚合维度(PARTITION BY)、排序规则(ORDER BY),此处为了获取每个用户最近由提交记录的三个月份,需要按照 uid 做聚合,按照月份降序生成排序位次。通过 WHERE 语句筛选出位次小于等于 3 的数据。知识点:ROW_NUMBER, PARTITION BY, WHERE
SELECT
    user_info.uid AS uid,
    DATE_FORMAT(exam_record.start_time,'%Y%m') AS month,
    COUNT(1) AS total_cnt,
    SUM(IF(exam_record.submit_time IS NOT NULL, 1, 0)) AS complete,
    ROW_NUMBER() OVER(PARTITION BY uid 
                        ORDER BY DATE_FORMAT(exam_record.start_time, '%Y-%m') DESC) AS rk
FROM
    user_info
LEFT JOIN
    exam_record
ON user_info.uid = exam_record.uid
GROUP BY
    user_info.uid,
    DATE_FORMAT(exam_record.start_time,'%Y%m')

在获取上述数据后,将两者拼接,并进行必要的筛选操作即可得到最终结果

代码:

SELECT
    count_info.uid,
    month,
    total_cnt,
    complete
FROM
(
    SELECT
        uid,
        ROW_NUMBER() OVER(ORDER BY incomplete/total_cnt DESC) AS rk,
        level,
        user_count.user_num
    FROM
    (
        SELECT
            user_info.uid AS uid,
            MAX(user_info.level) AS level,
            COUNT(1) AS total_cnt,
            SUM(IF(exam_record.submit_time IS NULL, 1, 0)) AS incomplete
        FROM
            user_info
        LEFT JOIN
            exam_record
        ON user_info.uid = exam_record.uid
        WHERE exam_record.start_time IS NOT NULL
        GROUP BY user_info.uid
    ) AS submit,
    (
        SELECT
            COUNT(DISTINCT uid) AS user_num
        FROM
            user_info
    ) AS user_count
) AS valid_uid
JOIN (
    SELECT
        user_info.uid AS uid,
        DATE_FORMAT(exam_record.start_time,'%Y%m') AS month,
        COUNT(1) AS total_cnt,
        SUM(IF(exam_record.submit_time IS NOT NULL, 1, 0)) AS complete,
        ROW_NUMBER() OVER(PARTITION BY uid 
                            ORDER BY DATE_FORMAT(exam_record.start_time, '%Y-%m') DESC) AS rk
    FROM
        user_info
    LEFT JOIN
        exam_record
    ON user_info.uid = exam_record.uid
    GROUP BY
        user_info.uid,
        DATE_FORMAT(exam_record.start_time,'%Y%m')
) AS count_info
ON valid_uid.uid = count_info.uid
WHERE
    level IN (6, 7)
    AND (valid_uid.rk-1) < user_num / 2.0
    AND count_info.rk <= 3
ORDER BY
    count_info.uid,
    count_info.month