解题步骤
part1:题目基本信息
1)统计2021年11月每天新用户的次日留存率(保留2位小数)
- date_format(时间,‘%Y%m’)=‘202111’
- 换成容易理解的话就是,新用户的次留率。
3)如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过,结果按日期升序,没有新用户,不用输出。
- 这里可以理解为把out_time也算作是活跃日。
part2:解法1
基本的思路是:用union把in_time和out_time并联起来,对uid和date去重活获得一张用户活跃表。之后找出次活用户和活跃用户,计算留存率即可。
不要忘了几个基本条件,没有新用户不用输出,留存率保留2位小数,结果按日期升序,时间范围为2021年11月。
1)关联in_time和out_time字段,建立用户活跃表
SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log
- 找出次活新用户
CASE WHEN (uid,dt)IN (SELECT uid,date_add(dt,INTERVAL -1 DAY) FROM t1) AND (uid,dt)IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid) THEN 1 ELSE 0 END AS is_next_day #当用户满足条件“连续两天登录(找次活用户),且第一个登录日期为首次登录日(找新用户)”,则整个用户为“次活新用户”
- 找出新用户
CASE WHEN (uid,dt) IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid) THEN 1 ELSE 0 END is_new_day #找出新用户
- 构建一张判断次活用户和新用户的表
WITH t1 AS( SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log ) SELECT dt, CASE WHEN (uid,dt)IN (SELECT uid,date_add(dt,INTERVAL -1 DAY) FROM t1) AND (uid,dt)IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid) THEN 1 ELSE 0 END AS is_next_day ,#次活新用户 CASE WHEN (uid,dt) IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid)#次活用户 THEN 1 ELSE 0 END is_new_day FROM t1 WHERE DATE_FORMAT(dt,'%Y%m')='202111';#取2021年11月的数据
3)计算每日的次活用户数和新用户数,并求新用户次日留存率。
- 每日次活用户数:SUM(is_next_day)
- 每日新用户数:SUM(is_new_day)
- 次日留存率,保留2位小数ROUND(SUM(is_next_day)/SUM(is_new_day),2)
- 把is_next_day和 is_new_day的定义放进上一个代码结果如下
WITH t1 AS( SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log )#用户活跃表 SELECT dt, ROUND(SUM(CASE WHEN (uid,dt)IN (SELECT uid,date_add(dt,INTERVAL -1 DAY) FROM t1) AND (uid,dt)IN (SELECT uid,MIN(dt) FROM t1 GROUP BY uid) THEN 1 ELSE 0 END) / SUM(CASE WHEN (uid,dt) IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid) THEN 1 ELSE 0 END),2) AS uv_left_rate FROM t1 WHERE DATE_FORMAT(dt,'%Y%m')='202111' GROUP BY dt HAVING uv_left_rate IS NOT NULL ORDER BY dt;
PART3:解法2窗口函数解法
1)照例找出用户活跃表
SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log
- 通过位移窗口函数对每个用户的活跃日向上迁移一行
SELECT uid,dt, MIN(dt) OVER (PARTITION BY uid) AS new_dt, LEAD(dt,1) OVER (PARTITION BY uid ORDER BY dt) AS next_dt FROM( SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log )act_table WHERE DATE_FORMAT(dt,'%Y%m')='202111'
- 如果dt=new_dt那这个用户为新用户,如果dt=new_dt且next_dt和new_dt的日期差为1则这个用户为次留新用户
WITH t1 AS( SELECT uid,dt, MIN(dt) OVER (PARTITION BY uid) AS new_dt, LEAD(dt,1) OVER (PARTITION BY uid ORDER BY dt) AS next_dt FROM( SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log ) act_table WHERE DATE_FORMAT(dt,'%Y%m')='202111' ) SELECT dt, CASE WHEN dt=new_dt AND datediff(next_dt,dt)=1 THEN 1 ELSE 0 END is_next_day, CASE WHEN dt=new_dt THEN 1 ELSE 0 END is_new_day FROM t1;
3)计算新用户次留率
WITH t1 AS( SELECT uid,dt, MIN(dt) OVER (PARTITION BY uid) AS new_dt, LEAD(dt,1) OVER (PARTITION BY uid ORDER BY dt) AS next_dt FROM( SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log ) act_table WHERE DATE_FORMAT(dt,'%Y-%m')='2021-11' ) SELECT dt, ROUND(SUM(CASE WHEN dt=new_dt AND datediff(next_dt,dt)=1 THEN 1 ELSE 0 END) / SUM(CASE WHEN dt=new_dt THEN 1 ELSE 0 END),2) uv_left_rate FROM t1 GROUP BY dt HAVING uv_left_rate IS NOT NULL ORDER BY dt;