SQL31 未完成率较高的50%用户近三个月答卷情况
题目主要信息:
统计 SQL 试卷上,未完成率最高的 50% 用户中,6 级和 7 级用户在有作答记录的最近三个月中,每个月答卷数目和完成数目。
问题拆分:
先筛选出所有满足条件用户的 uid, 用户需要满足以下条件:
- 完成率:为筛选该条件,需要统计用户的作答数量与完成数量。对于本场景,exam_record 表中的记录都算作 “作答数量”,因此通过 COUNT(1) 来统计用户作答数。通过 IF(submit_time IS NULL) 判断是否完成了试卷,并通过 SUM() 统计出未完成试卷次数。此处的COUNT 和 SUM 都属于聚合函数,聚合函数的使用需要指定 GROUP BY 维度,这里的聚合维度为 uid。知识点:COUNT, SUM, GROUP BY
- 6 级和 7 级用户:可以通过 level IN (6, 7) 的方法判断。知识点:IN
- top50%:为了判断用户完成率处于 top50%,需要同时知道该用户的完成率排名和总用户数量。此处我们通过 LEFT JOIN 的方式将统计好的用户数量追加在每条数据的行末。通过 ROW_NUMBER() OVER(ORDER BY incomplete / total_cnt DESC) 获得基于完成率的用户排名。知识点:ROW_NUMBER OVER, LEFT JOIN
SELECT
user_info.uid AS uid,
MAX(user_info.level) AS level,
COUNT(1) AS total_cnt,
SUM(IF(exam_record.submit_time IS NULL, 1, 0)) AS incomplete
FROM
user_info
LEFT JOIN
exam_record
ON user_info.uid = exam_record.uid
WHERE exam_record.start_time IS NOT NULL
GROUP BY user_info.uid
接下来获取所有用户最近三个月中,每个月的答卷数目和完成数目:
- 获取月份:为了获取上述数据,此处的聚合维度应当为 uid + 月份。从 DATETIME 类型数据获取 “年+月” 字符串的函数为 DATE_FORMAT(start_time,'%Y%m')。知识点:DATE_FORMAT
- 最近三个月:ROW_NUMBER() 生成排序数据时,可以指定聚合维度(PARTITION BY)、排序规则(ORDER BY),此处为了获取每个用户最近由提交记录的三个月份,需要按照 uid 做聚合,按照月份降序生成排序位次。通过 WHERE 语句筛选出位次小于等于 3 的数据。知识点:ROW_NUMBER, PARTITION BY, WHERE
SELECT
user_info.uid AS uid,
DATE_FORMAT(exam_record.start_time,'%Y%m') AS month,
COUNT(1) AS total_cnt,
SUM(IF(exam_record.submit_time IS NOT NULL, 1, 0)) AS complete,
ROW_NUMBER() OVER(PARTITION BY uid
ORDER BY DATE_FORMAT(exam_record.start_time, '%Y-%m') DESC) AS rk
FROM
user_info
LEFT JOIN
exam_record
ON user_info.uid = exam_record.uid
GROUP BY
user_info.uid,
DATE_FORMAT(exam_record.start_time,'%Y%m')
在获取上述数据后,将两者拼接,并进行必要的筛选操作即可得到最终结果
代码:
SELECT
count_info.uid,
month,
total_cnt,
complete
FROM
(
SELECT
uid,
ROW_NUMBER() OVER(ORDER BY incomplete/total_cnt DESC) AS rk,
level,
user_count.user_num
FROM
(
SELECT
user_info.uid AS uid,
MAX(user_info.level) AS level,
COUNT(1) AS total_cnt,
SUM(IF(exam_record.submit_time IS NULL, 1, 0)) AS incomplete
FROM
user_info
LEFT JOIN
exam_record
ON user_info.uid = exam_record.uid
WHERE exam_record.start_time IS NOT NULL
GROUP BY user_info.uid
) AS submit,
(
SELECT
COUNT(DISTINCT uid) AS user_num
FROM
user_info
) AS user_count
) AS valid_uid
JOIN (
SELECT
user_info.uid AS uid,
DATE_FORMAT(exam_record.start_time,'%Y%m') AS month,
COUNT(1) AS total_cnt,
SUM(IF(exam_record.submit_time IS NOT NULL, 1, 0)) AS complete,
ROW_NUMBER() OVER(PARTITION BY uid
ORDER BY DATE_FORMAT(exam_record.start_time, '%Y-%m') DESC) AS rk
FROM
user_info
LEFT JOIN
exam_record
ON user_info.uid = exam_record.uid
GROUP BY
user_info.uid,
DATE_FORMAT(exam_record.start_time,'%Y%m')
) AS count_info
ON valid_uid.uid = count_info.uid
WHERE
level IN (6, 7)
AND (valid_uid.rk-1) < user_num / 2.0
AND count_info.rk <= 3
ORDER BY
count_info.uid,
count_info.month