with a as (
    select artical_id, in_time dt, 1 as type
    from tb_user_log
    where artical_id != 0
    union all
    select artical_id, out_time dt, -1 as type
    from tb_user_log
    where artical_id != 0
)
select artical_id, max(uv) as max_uv
from (
    select artical_id,
           sum(type) over(partition by artical_id order by dt, type desc) as uv
    from a
) b
group by artical_id
order by max_uv desc
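My original idea was to use timestampdiff to get the number of seconds between min(in_time) and max(out_time), and then use max(count(*)) over(partition by xx order by xx rows N preceding) to compute the uv at every moment. That does not work: timestampdiff returns a computed time difference, and the N in rows N preceding has to be a constant, so it cannot be passed in as N. On top of that, rows N preceding counts rows, not seconds.
My earlier code kept failing, and I could not fix it either: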
with t as (
    select artical_id,
           max(out_time) as max_time,
           min(in_time) as min_time,
           timestampdiff(second, min(in_time), max(out_time)) as jiange
    from tb_user_log
    where artical_id != 0
    group by artical_id
)
select t.artical_id,
       -- fails here: the frame offset must be a constant, not the column jiange
       max(count(distinct uid)) over(partition by t.artical_id order by max_time rows jiange preceding) as max_uv
from tb_user_log tb
join t on tb.artical_id = t.artical_id
where t.artical_id != 0
group by t.artical_id
order by max_uv desc
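I then followed the top-voted answer. The core logic is:
1) Encode in_time and out_time from the original table: an in_time counts as +1 viewer and an out_time counts as -1 viewer. Stack the two sets with union all so that in_time and out_time become one column, dt. Each row is now an event that says how many viewers arrived or left at that moment.
2) The problem requires that entries are counted before exits: if someone enters and someone leaves at the same moment, record the increase first and the decrease second.
So the ORDER BY has to sort by dt ascending and, within the same dt, put +1 before -1, which means type DESC:
SUM(type) OVER(PARTITION BY artical_id ORDER BY dt, type DESC)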
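To check my understanding of steps 1) and 2), here is a small Python sketch of the same idea outside SQL. It is only an illustration under my own assumptions: the function name max_concurrent_viewers and the sample rows are made up, and timestamps are plain strings in one fixed format so that string order equals time order.

```python
from collections import defaultdict


def max_concurrent_viewers(logs):
    """Return {artical_id: peak concurrent viewers}.

    logs is an iterable of (artical_id, in_time, out_time) tuples.
    This is a hypothetical helper, not part of the original solution.
    """
    events = defaultdict(list)
    for artical_id, in_time, out_time in logs:
        if artical_id == 0:  # same filter as `where artical_id != 0`
            continue
        events[artical_id].append((in_time, 1))    # entering: +1
        events[artical_id].append((out_time, -1))  # leaving: -1

    peaks = {}
    for artical_id, evs in events.items():
        # dt ascending, and for equal dt the +1 event first (type DESC)
        evs.sort(key=lambda e: (e[0], -e[1]))
        running = 0
        peak = 0
        for _, delta in evs:
            running += delta          # running sum, like SUM(type) OVER(...)
            peak = max(peak, running) # outer MAX(uv)
        peaks[artical_id] = peak
    return peaks


# Made-up sample rows: on 9001 one viewer leaves at the exact second another enters.
sample_logs = [
    (9001, "2021-11-01 10:00:00", "2021-11-01 10:00:31"),
    (9001, "2021-11-01 10:00:31", "2021-11-01 10:01:00"),
    (9002, "2021-11-01 10:00:00", "2021-11-01 10:00:10"),
    (0, "2021-11-01 10:00:00", "2021-11-01 10:05:00"),
]
result = max_concurrent_viewers(sample_logs)
print(result)
```

With these rows, article 9001 peaks at 2 because the entry at 10:00:31 is counted before the exit at 10:00:31. If the sort put -1 first, the peak would be 1.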
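3) My own question: why can't it be written directly as max(sum(type) over(partition by artical_id order by dt)) as max_uv ... group by artical_id?
I asked an AI and got the answer: group by runs first, and sum(type) over(partition by artical_id order by dt) only runs after it. sum() over() has to be computed for every row, but once group by has collapsed each artical_id into a single row, there are no individual rows left to compute it on. That is why the running sum is computed in a subquery first, and the outer query then takes max(uv) with group by.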
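To see the two-level version work end to end, the query can be run against a tiny in-memory table. This is a rough sketch using Python's built-in sqlite3 module rather than MySQL, so it assumes the bundled SQLite is version 3.25 or newer (the first version with window functions). The table layout and the sample rows are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real tb_user_log may have more columns.
conn.execute(
    "create table tb_user_log (uid int, artical_id int, in_time text, out_time text)"
)
conn.executemany(
    "insert into tb_user_log values (?, ?, ?, ?)",
    [
        (101, 9001, "2021-11-01 10:00:00", "2021-11-01 10:00:31"),
        (102, 9001, "2021-11-01 10:00:31", "2021-11-01 10:01:00"),
        (103, 9002, "2021-11-01 10:00:00", "2021-11-01 10:00:10"),
        (104, 0, "2021-11-01 10:00:00", "2021-11-01 10:05:00"),
    ],
)

query = """
with a as (
    select artical_id, in_time as dt, 1 as type from tb_user_log where artical_id != 0
    union all
    select artical_id, out_time as dt, -1 as type from tb_user_log where artical_id != 0
)
select artical_id, max(uv) as max_uv
from (
    -- inner level: the window function still sees every event row
    select artical_id,
           sum(type) over (partition by artical_id order by dt, type desc) as uv
    from a
) b
-- outer level: group by only runs after uv exists on every row
group by artical_id
order by max_uv desc
"""
peaks = conn.execute(query).fetchall()
print(peaks)
conn.close()
```

The inner select produces one uv value per event row, and only then does the outer group by reduce each article to its maximum.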