明确题意:

统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,每个月的答卷数目和完成数目。 按用户ID、月份升序排序。


问题分解:

  • 统计每次试卷作答记录的月份编号,生成子表 t_exam_record_month_rank:
    • 生成当次作答的月份排名,按年月降序进行窗口连续排名:
      • DENSE_RANK() OVER(ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) as start_month_rank
  • 筛选月份排名小于等于3的记录(近三个月的):WHERE start_month_rank <= 3
  • 筛选SQL试卷上未完成率较高的50%用户:
    • 计算每个用户在SQL试卷上的未完成率百分比排名:
      • 计算每个用户在SQL试卷上的未完成率:
        • 筛选SQL试卷:WHERE exam_id in (SELECT exam_id FROM examination_info WHERE tag='SQL')
        • 计算未完成率:1 - COUNT(submit_time) / COUNT(1) as incomplete_rate
      • 统计百分比排名,按未完成率降序uid升序进行百分比排名:
        • PERCENT_RANK() over(ORDER BY incomplete_rate DESC, uid) as incomp_rate_rank
    • 筛选未完成率较高的50%用户(因为按未完成率倒序了,所以<=):WHERE incomp_rate_rank <= 0.5
  • 筛选6、7级用户:uid IN (SELECT uid FROM user_info WHERE level>=6)
  • 按用户ID、月份分组:GROUP BY uid, start_month

细节问题:

  • 表头重命名:as
  • 按用户ID、月份升序排序:ORDER BY uid, start_month

完整代码:

SELECT uid, start_month, COUNT(1) as exam_cnt, COUNT(submit_time) as complete_cnt
FROM (
    SELECT uid, submit_time, DATE_FORMAT(start_time, "%Y%m") as start_month,
        DENSE_RANK() over(
            ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) as start_month_rank -- 按作答月份降序编号
    FROM exam_record
) as t_exam_record_month_rank
WHERE start_month_rank <= 3
AND uid IN (
    SELECT uid FROM (
        SELECT uid,
            PERCENT_RANK() over(ORDER BY incomp_rate DESC, uid) as incomp_rate_rank
        FROM (
            SELECT uid, 1 - COUNT(submit_time) / COUNT(1) as incomp_rate -- 此人未完成率
            FROM exam_record
            WHERE exam_id in (SELECT exam_id FROM examination_info WHERE tag='SQL')
            GROUP BY uid
        ) as t_exam_incom_rate -- 在SQL试卷上的未完成率
    ) as t_exam_incom_rate_rank -- 在SQL试卷上的未完成率百分比排名
    WHERE incomp_rate_rank <= 0.5
) -- 在SQL试卷上未完成率最高的50%用户
AND uid IN (SELECT uid FROM user_info WHERE `level`>=6)
GROUP BY uid, start_month
ORDER BY uid, start_month;