明确题意:

计算在2021年至少有两天作答过试卷的人中,计算该年连续两次作答试卷的最大时间窗days_window,那么根据该年的历史规律他在days_window天里平均会做多少套试卷; 按最大时间窗和平均做答试卷套数倒序排序


问题分解:

  • 统计2021年每人总作答数、最早最晚相隔天数、最大连续作答间隔,生成子表 t_exam_record_stat:
    • 生成2021年每次作答试卷的下次作答时间,生成子表 t_exam_record_lead:
      • 筛选2021年的作答记录:WHERE YEAR(start_time)=2021
      • 生成下次作答时间,按用户分区按作答时间升序:
        • lead(start_time) over(PARTITION BY uid ORDER BY start_time) as next_start_time
    • 按用户分组:GROUP BY uid
    • 统计此人作答的总试卷数:count(start_time) as exam_cnt
    • 统计最早一次作答和最晚一次作答的相差天数:DATEDIFF(max(start_time), min(start_time))+1 as diff_days
    • 统计两次作答的最大时间窗:max(DATEDIFF(next_start_time, start_time))+1 as days_window
  • 筛选最早最晚相差天数大于1,即至少活跃两天的记录:WHERE diff_days > 1
  • 计算平均能做多少套试卷:ROUND(days_window * exam_cnt / diff_days, 2) as avg_exam_cnt

细节问题:

  • 表头重命名:as
  • 按最大时间窗和平均做答试卷套数倒序排序:ORDER BY days_window DESC, avg_exam_cnt DESC

完整代码:

SELECT uid, days_window, ROUND(days_window * exam_cnt / diff_days, 2) as avg_exam_cnt
FROM (
    SELECT uid,
        count(start_time) as exam_cnt,  -- 此人作答的总试卷数
        DATEDIFF(max(start_time), min(start_time))+1 as diff_days, -- 最早一次作答和最晚一次作答的相差天数
        max(DATEDIFF(next_start_time, start_time))+1 as days_window -- 两次作答的最大时间窗
    FROM (
        SELECT uid, exam_id, start_time,
            lead(start_time) over(
                PARTITION BY uid ORDER BY start_time) as next_start_time -- 将连续的下次作答时间拼上
        FROM exam_record
        WHERE YEAR(start_time)=2021
    ) as t_exam_record_lead
    GROUP BY uid
) as t_exam_record_stat
WHERE diff_days > 1
ORDER BY days_window DESC, avg_exam_cnt DESC;