明确题意:
计算在2021年至少有两天作答过试卷的人中,计算该年连续两次作答试卷的最大时间窗days_window,那么根据该年的历史规律他在days_window天里平均会做多少套试卷; 按最大时间窗和平均做答试卷套数倒序排序
问题分解:
- 统计2021年每人总作答数、最早最晚相隔天数、最大连续作答间隔,生成子表 t_exam_record_stat:
- 生成2021年每次作答试卷的下次作答时间,生成子表 t_exam_record_lead:
- 筛选2021年的作答记录:WHERE YEAR(start_time)=2021
- 生成下次作答时间,按用户分区按作答时间升序:
- lead(start_time) over(PARTITION BY uid ORDER BY start_time) as next_start_time
- 按用户分组:GROUP BY uid
- 统计此人作答的总试卷数:count(start_time) as exam_cnt
- 统计最早一次作答和最晚一次作答的相差天数:DATEDIFF(max(start_time), min(start_time))+1 as diff_days
- 统计两次作答的最大时间窗:max(DATEDIFF(next_start_time, start_time))+1 as days_window
- 生成2021年每次作答试卷的下次作答时间,生成子表 t_exam_record_lead:
- 筛选最早最晚相差天数大于1,即至少活跃两天的记录:WHERE diff_days > 1
- 计算平均能做多少套试卷:ROUND(days_window * exam_cnt / diff_days, 2) as avg_exam_cnt
细节问题:
- 表头重命名:as
- 按最大时间窗和平均做答试卷套数倒序排序:ORDER BY days_window DESC, avg_exam_cnt DESC
完整代码:
SELECT uid, days_window, ROUND(days_window * exam_cnt / diff_days, 2) as avg_exam_cnt
FROM (
SELECT uid,
count(start_time) as exam_cnt, -- 此人作答的总试卷数
DATEDIFF(max(start_time), min(start_time))+1 as diff_days, -- 最早一次作答和最晚一次作答的相差天数
max(DATEDIFF(next_start_time, start_time))+1 as days_window -- 两次作答的最大时间窗
FROM (
SELECT uid, exam_id, start_time,
lead(start_time) over(
PARTITION BY uid ORDER BY start_time) as next_start_time -- 将连续的下次作答时间拼上
FROM exam_record
WHERE YEAR(start_time)=2021
) as t_exam_record_lead
GROUP BY uid
) as t_exam_record_stat
WHERE diff_days > 1
ORDER BY days_window DESC, avg_exam_cnt DESC;