max(两次连续作答时间间隔+1) ←按解释中1号-6号时间窗为6天
3) 该年的历史规律:
平均每天作答试卷套数:=年总套数/(max作答时间-min作答时间+1) ←按解释中:他1号到7号这7天里共做了3张试卷,平均每天3/7=0.428571张
4) 在days_window天里平均会做多少套试卷:(保留2位小数)
=平均每天作答试卷套数*days_window ←按解释中: =0.428571*days_window
(2) 用到的知识点:
lead(start_time,1) over(partition by uid order by start_time) as next_time
from exam_record order by uid asc,start_time asc;
输出结果:将sart_time按uid,排序后,按uid分组把start_time每一行接着的时间生成新了列next_time,只要计算出start_time与next_time之间的时间天数,便是两次连续作答时间窗口天数
id |
uid |
exam_id |
start_time |
next_time |
4 |
1005 |
9002 |
2021-09-05 10:01:01 |
2021-09-05 10:31:01 |
5 |
1005 |
9001 |
2021-09-05 10:31:01 |
None |
2 |
1006 |
9001 |
2021-09-01 12:11:01 |
2021-09-06 10:01:01 |
3 |
1006 |
9002 |
2021-09-06 10:01:01 |
2021-09-07 10:01:01 |
1 |
1006 |
9003 |
2021-09-07 10:01:01 |
None |
‘多少套试卷’会让人理解错误,用count(distinct exam_id)求出结果错误,要去掉去重distinct。题中‘多少套试卷’居然是指的是做试卷的次数,即同一个exam_id的试卷做2次也是做了2套试卷。
select uid, max(datediff(next_time,start_time)+1) as days_window, round(count(exam_id)/(datediff(max(start_time),min(start_time))+1)*max(datediff(next_time,start_time)+1),2) as avg_exam_cnt from (select id,uid,exam_id,start_time, lead(start_time,1) over(partition by uid order by start_time) as next_time from exam_record where year(start_time)=2021) t group by uid having count(distinct date(start_time))>=2 order by days_window desc,avg_exam_cnt desc;