2021年11月每天新用户的次日留存率
明确题意:
统计2021年11月每天新用户的次日留存率(保留2位小数),次日留存率为当天新增的用户数中第二天又活跃了的用户数占比。
如果in_time和out_time跨天了,在两天里都记为该用户活跃过,结果按日期升序。
问题分解:
-
计算用户活跃日期集合,分别考虑进入和离开时间(可能跨天):
WITH t_uid_dt as ( SELECT distinct uid, DATE(in_time) as dt FROM tb_user_log UNION SELECT distinct uid, DATE(out_time) as dt FROM tb_user_log )
-
生成每个用户活跃日期的前一天作为t_uid_dt_1:SELECT uid, DATE_SUB(dt, INTERVAL 1 DAY) as dt
-
生成每个日期的最早活跃日期(即属于新用户的日期)作为t_first_dt:SELECT uid, DATE(MIN(in_time)) as dt FROM tb_user_log GROUP BY uid
-
用t_uid_dt左连接t_uid_dt_1,得到第二天还活跃了的记录:t_uid_dt LEFT JOIN t_uid_dt_1 USING(uid, dt)
-
继续内连接t_first_dt,得到当天作为新用户第二天活跃了的记录:JOIN t_first_dt USING(uid, dt)
-
筛选有效时间窗口的记录:WHERE DATE_FORMAT(t_uid_dt.dt, "%Y%m") = "202111"
-
选取作为新用户的日期和下一天是否留下(留下的有日期,否则为空):SELECT t_uid_dt.dt as uv_new_dt, t_uid_dt_1.dt as next_day_left
-
按作为新用户的日期分组:GROUP BY uv_new_dt
-
计算每天的次日留存率:COUNT(next_day_left) / COUNT(uv_new_dt)
-
保留2位小数:ROUND(x, 2)
细节问题:
- 表头重命名:as
- 按最大人数降序排序:ORDER BY max_uv DESC;
完整代码:
WITH t_uid_dt as (
SELECT distinct uid, DATE(in_time) as dt FROM tb_user_log
UNION
SELECT distinct uid, DATE(out_time) as dt FROM tb_user_log
)
SELECT uv_new_dt as dt,
ROUND(COUNT(next_day_left) / COUNT(uv_new_dt), 2) as uv_left_rate
FROM (
SELECT t_uid_dt.dt as uv_new_dt, t_uid_dt_1.dt as next_day_left
FROM t_uid_dt
LEFT JOIN (
SELECT uid, DATE_SUB(dt, INTERVAL 1 DAY) as dt
FROM t_uid_dt
) as t_uid_dt_1 USING(uid, dt)
JOIN (
SELECT uid, DATE(MIN(in_time)) as dt
FROM tb_user_log
GROUP BY uid
) as t_first_dt USING(uid, dt)
WHERE DATE_FORMAT(t_uid_dt.dt, "%Y%m") = "202111"
) as t_uv_new_info
GROUP BY uv_new_dt
ORDER BY dt;