select * from (select tv.tag,
date_format(tu.start_time,'%Y-%m-%d') as dt,
sum(sum(tu.if_like)) over(partition by tv.tag order by date_format(tu.start_time,'%Y-%m-%d') rows 6 preceding) sum_like_cnt_7d,
max(sum(if_retweet)) over(partition by tv.tag order by date_format(tu.start_time,'%Y-%m-%d') rows 6 preceding) max_retweet_cnt_7d
from tb_user_video_log tu
join tb_video_info tv
on tu.video_id = tv.video_id
#这里省略where datediff('2021-10-03',date_format(tu.start_time,'%Y-%m-%d'))<9
group by tv.tag,date_format(tu.start_time,'%Y-%m-%d'))a
where a.dt between '2021-10-01' and '2021-10-03'
order by a.tag desc,a.dt
- 难点在于窗口函数sum()over(partition by xx order by yy)中yy该怎么写 学到了 rows N preceding 包含当前行以及前N行,故加起来N+1行的数据
- 另一个容易错的就是现在内层sum计算出每个类别的每个日期下的点赞数或转发数,再通过窗口函数sum/max对计算出的结果进行累计,不要忘了内层的sum
- 看到一个答案写了限制条件where datediff('2021-10-03',date_format(tu.start_time,'%Y-%m-%d'))<9,不加也可以运行出来,加上的好处是可以节省运算资源,如果不加数据库会对所有数据进行遍历,然后再在外层查询中限制where a.dt between '2021-10-01' and '2021-10-03'