1. 这道题的难点是如何找到连续的天数,我的做法是先用row_number窗口函数按作者id分组对日期进行排序,为什么不用dense_rank,我们先来看二者的区别:(以101作者为例,前者是用row_number排序,后者是用dense_rank排序)
我们可以发现前者的所有日期在减去它们的名次后为11.00和11.-1(这个日期不合理,但是不重要,重要的是使连续日期变得一样,这样才能计数),而dense_rank窗口函数在减去它们的名次后都为11.00,会出现对2021-11-01这个日期重复计数,即出现四天的连续日期,显然不对
2. 接下来,也就是对日期进行计算,用date_sub这个函数,得到新的日期,连续日期的新日期都是一样的(11.-1)
3. 第三步,题目要求求出连续的天数,所以接下来我们要对刚刚求出来的新日期进行count,有三个11.-1代表连续天数为3天,并且用having函数对连续天数大于等于三天的进行筛选
4. 最后一步,也就是找出最大的连续天数,所以要对上一步求出的连续天数取一个最大值,用max函数即可
完整代码如下:
select author_id,author_level,max(consec_days) as days_cnt from ( select author_id,author_level,count(diff) as consec_days from ( select author_id,author_level, date_sub(answer_date,interval rk day) as diff from (consec_ select aw.author_id,author_level,answer_date, row_number() over(partition by aw.author_id order by answer_date) as rk from author_tb ah right join answer_tb aw on ah.author_id=aw.author_id ) as r ) as df group by author_id,author_level,diff having count(diff) >=3 ) as d group by author_id,author_level order by author_id