明确题意:
计算在2021年至少有两天作答过试卷的人中,计算该年连续两次作答试卷的最大时间窗days_window,那么根据该年的历史规律他在days_window天里平均会做多少套试卷,按最大时间窗和平均做答试卷套数倒序排序。
问题拆解:
- 本题主要是考察知识点:lead、date_format,timestampdiff、group by等。切记是以start_time计算,不是以submit_time计算!!!有些submit_time是NULL!!
- 先计算出重复次数大于2的记录的uid
- 使用lead,把下次start_time作为新的一列
- 查询出最大start_time,最小start_time,days_window
- 查询出总共持续天数days_count
- 计算出avg_exam_cnt
中间过程:
mysql> select timestampdiff(day, date_format('2021-09-01 19:59:01','%Y-%m-%d') ,date_format('2021-09-07 12:11:01','%Y-%m-%d')) ; +------------------------------------------------------------------------------------------------------------------+ | timestampdiff(day, date_format('2021-09-01 19:59:01','%Y-%m-%d') ,date_format('2021-09-07 12:11:01','%Y-%m-%d')) | +------------------------------------------------------------------------------------------------------------------+ | 6 | +------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> select *, -> lead(start_time, 1) over(partition by uid order by start_time asc) as next_start_time -- lead 往后查询 -> from exam_record -> where year(start_time) = '2021' and uid in ( -> select uid from ( -> select distinct uid,date_format(start_time,'%Y-%m-%d') -> from exam_record where year(start_time) = '2021' -> )t0 group by uid having count(*) > 1 -- 重复大于1的才行 -> ) -> ; +----+------+---------+---------------------+---------------------+-------+---------------------+ | id | uid | exam_id | start_time | submit_time | score | next_start_time | +----+------+---------+---------------------+---------------------+-------+---------------------+ | 9 | 1003 | 9003 | 2021-09-01 19:01:01 | 2021-09-01 19:59:01 | 86 | 2021-09-07 12:01:01 | | 12 | 1003 | 9003 | 2021-09-07 12:01:01 | 2021-09-07 12:11:01 | 40 | 2021-09-08 15:01:01 | | 13 | 1003 | 9003 | 2021-09-08 15:01:01 | NULL | NULL | NULL | | 11 | 1005 | 9003 | 2021-09-01 12:01:01 | 2021-09-01 12:31:02 | 85 | 2021-09-05 10:01:01 | | 4 | 1005 | 9002 | 2021-09-05 10:01:01 | 2021-09-05 10:21:01 | 81 | 2021-09-05 10:31:01 | | 5 | 1005 | 9001 | 2021-09-05 10:31:01 | 2021-09-05 10:51:01 | 81 | NULL | | 2 | 1006 | 9001 | 2021-09-01 12:11:01 | 2021-09-01 12:31:01 | 89 | 2021-09-06 10:01:01 | | 3 | 1006 | 9002 | 2021-09-06 10:01:01 | 2021-09-06 10:21:01 | 81 | 2021-09-07 10:01:01 | | 1 | 1006 | 9003 | 2021-09-07 10:01:01 | 2021-09-07 10:21:02 | 84 | NULL | +----+------+---------+---------------------+---------------------+-------+---------------------+ 9 rows in set (0.00 sec) mysql> mysql> select -> uid , -> max(start_time) as max_time , -> min(start_time) as min_time , -> max(timestampdiff(day, date_format(start_time,'%Y-%m-%d') ,date_format(next_start_time,'%Y-%m-%d')))+1 as days_window , -- 早的日期放在前面 -> count(*) as exam_cnt -> from ( -> select *, -> lead(start_time, 1) over(partition by uid order by start_time asc) as next_start_time -- lead 往后查询 -> from exam_record -> where year(start_time) = '2021' and uid in ( -> select uid from ( -> select distinct uid,date_format(start_time,'%Y-%m-%d') -> from exam_record where year(start_time) = '2021' -> )t0 group by uid having count(*) > 1 -- 重复大于1的才行 -> ) -> )t1 -> group by uid -> ; +------+---------------------+---------------------+-------------+----------+ | uid | max_time | min_time | days_window | exam_cnt | +------+---------------------+---------------------+-------------+----------+ | 1003 | 2021-09-08 15:01:01 | 2021-09-01 19:01:01 | 7 | 3 | | 1005 | 2021-09-05 10:31:01 | 2021-09-01 12:01:01 | 5 | 3 | | 1006 | 2021-09-07 10:01:01 | 2021-09-01 12:11:01 | 6 | 3 | +------+---------------------+---------------------+-------------+----------+ 3 rows in set (0.00 sec) mysql> mysql> select -> uid , -> timestampdiff(day, date_format(min_time,'%Y-%m-%d') ,date_format(max_time,'%Y-%m-%d'))+1 as days_count, -> days_window, -> exam_cnt -> from ( -> select -> uid , -> max(start_time) as max_time , -> min(start_time) as min_time , -> max(timestampdiff(day, date_format(start_time,'%Y-%m-%d') ,date_format(next_start_time,'%Y-%m-%d')))+1 as days_window , -- 早的日期放在前面 -> count(*) as exam_cnt -> from ( -> select *, -> lead(start_time, 1) over(partition by uid order by start_time asc) as next_start_time -- lead 往后查询 -> from exam_record -> where year(start_time) = '2021' and uid in ( -> select uid from ( -> select distinct uid,date_format(start_time,'%Y-%m-%d') -> from exam_record where year(start_time) = '2021' -> )t0 group by uid having count(*) > 1 -- 重复大于1的才行 -> ) -> )t1 -> group by uid -> )t3 -> ; +------+------------+-------------+----------+ | uid | days_count | days_window | exam_cnt | +------+------------+-------------+----------+ | 1003 | 8 | 7 | 3 | | 1005 | 5 | 5 | 3 | | 1006 | 7 | 6 | 3 | +------+------------+-------------+----------+ 3 rows in set (0.00 sec)
代码实现:
select uid , days_window , round(days_window * (exam_cnt /days_count ),2) as avg_exam_cnt from ( select uid , timestampdiff(day, date_format(min_time,'%Y-%m-%d') ,date_format(max_time,'%Y-%m-%d'))+1 as days_count, -- 差值要+1 days_window, exam_cnt from ( select uid , max(start_time) as max_time , min(start_time) as min_time , max(timestampdiff(day, date_format(start_time,'%Y-%m-%d') ,date_format(next_start_time,'%Y-%m-%d')))+1 as days_window , -- 记得+1, 早的日期放在前面 count(*) as exam_cnt from ( select *, lead(start_time, 1) over(partition by uid order by start_time asc) as next_start_time -- lead 往后查询 from exam_record where year(start_time) = '2021' and uid in ( select uid from ( select distinct uid,date_format(start_time,'%Y-%m-%d') from exam_record where year(start_time) = '2021' )t0 group by uid having count(*) > 1 -- 重复大于1的才行 ) )t1 group by uid )t3 )t4 order by days_window desc ,avg_exam_cnt desc ;