明确题意:
统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,每个月的答卷数目和完成数目。 按用户ID、月份升序排序。
问题分解:
- 统计每次试卷作答记录的月份编号,生成子表 t_exam_record_month_rank:
- 生成当次作答的月份排名,按年月降序进行窗口连续排名:
- DENSE_RANK() OVER(ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) as start_month_rank
- 生成当次作答的月份排名,按年月降序进行窗口连续排名:
- 筛选月份排名小于等于3的记录(近三个月的):WHERE start_month_rank <= 3
- 筛选SQL试卷上未完成率较高的50%用户:
- 计算每个用户在SQL试卷上的未完成率百分比排名:
- 计算每个用户在SQL试卷上的未完成率:
- 筛选SQL试卷:WHERE exam_id in (SELECT exam_id FROM examination_info WHERE tag='SQL')
- 计算未完成率:1 - COUNT(submit_time) / COUNT(1) as incomplete_rate
- 统计百分比排名,按未完成率降序uid升序进行百分比排名:
- PERCENT_RANK() over(ORDER BY incomplete_rate DESC, uid) as incomp_rate_rank
- 计算每个用户在SQL试卷上的未完成率:
- 筛选未完成率较高的50%用户(因为按未完成率倒序了,所以<=):WHERE incomp_rate_rank <= 0.5
- 计算每个用户在SQL试卷上的未完成率百分比排名:
- 筛选6、7级用户:uid IN (SELECT uid FROM user_info WHERE
level
>=6) - 按用户ID、月份分组:GROUP BY uid, start_month
细节问题:
- 表头重命名:as
- 按用户ID、月份升序排序:ORDER BY uid, start_month
完整代码:
SELECT uid, start_month, COUNT(1) as exam_cnt, COUNT(submit_time) as complete_cnt
FROM (
SELECT uid, submit_time, DATE_FORMAT(start_time, "%Y%m") as start_month,
DENSE_RANK() over(
ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) as start_month_rank -- 按作答月份降序编号
FROM exam_record
) as t_exam_record_month_rank
WHERE start_month_rank <= 3
AND uid IN (
SELECT uid FROM (
SELECT uid,
PERCENT_RANK() over(ORDER BY incomp_rate DESC, uid) as incomp_rate_rank
FROM (
SELECT uid, 1 - COUNT(submit_time) / COUNT(1) as incomp_rate -- 此人未完成率
FROM exam_record
WHERE exam_id in (SELECT exam_id FROM examination_info WHERE tag='SQL')
GROUP BY uid
) as t_exam_incom_rate -- 在SQL试卷上的未完成率
) as t_exam_incom_rate_rank -- 在SQL试卷上的未完成率百分比排名
WHERE incomp_rate_rank <= 0.5
) -- 在SQL试卷上未完成率最高的50%用户
AND uid IN (SELECT uid FROM user_info WHERE `level`>=6)
GROUP BY uid, start_month
ORDER BY uid, start_month;