题解 | #统计活跃间隔对用户分级结果#

代码挺长的, 具体我一段段说明一下

子查询1: 注意: 需要尽量避免在select ( ) from table1括号中嵌套子查询, 因为外层表table1中, 每有一行, 都会执行一次内部的子查询, 这里有两种解决办法, 一种是用with as 提前将max(out_time) 保存为变量, 然后在查询的时候直接使用该变量, 方法二: 窗口函数的妙用, 窗口函数的执行顺序在group by之后, 可以对分组后的最大值, 再取最大, 此时获取到: ul.uid, 第一次登录日期, 今天(最大日期)
```
 select ul.uid,
     min(date(ul.in_time)) as f_login_time,
     max(max(date(ul.out_time))) over() as l_out_time
 from tb_user_log ul
 group by ul.uid
```

子查询2: 分组是为了减少数据量, 一个用户一天内可以登录多次

 select date(ul.in_time) as s_time, 
     ul.uid
 from tb_user_log ul
 group by date(ul.in_time), ul.uid

子查询1和子查询2用uid进行左连接, 得到t1.uid, t1.第一次登录时间, t1.今天(最大时间), t2.非首次的登录时间为一对多的情况

根据时间对用户进行分组, 然后计算比例就行

可能还有一些时间的细节问题, 比如7天内, 那么是两天相减, 需要包含被减的那天, 比如一个用户的浏览时间正好跨了一天

select 
 case when datediff(t1.l_out_time, t1.f_login_time) <= 6 then '新晋用户'
     when datediff(t1.l_out_time, t2.s_time) <= 6 and 
     datediff(t1.l_out_time, t1.f_login_time) >= 7 then '忠实用户'
 # 如果用户只登录过一次, 需要添加额外判定条件
 when (datediff(t1.l_out_time, t2.s_time) >= 30) or 
     (t2.s_time is null and datediff(t1.l_out_time, t1.f_login_time) >= 30) then '流失用户' 
 else '沉睡用户' end as user_grade,
 round(count(distinct t1.uid) / sum(count(distinct t1.uid)) over(), 2) as ratio
from (
 select ul.uid,
     min(date(ul.in_time)) as f_login_time,
     max(max(date(ul.out_time))) over() as l_out_time
 from tb_user_log ul
 group by ul.uid
) t1
left join (
 select date(ul.in_time) as s_time, 
     ul.uid
 from tb_user_log ul
 group by date(ul.in_time), ul.uid
) t2 on t1.uid=t2.uid and t1.f_login_time < t2.s_time
group by user_grade
order by ratio desc