2021年11月每天新用户的次日留存率

明确题意:

统计2021年11月每天新用户的次日留存率(保留2位小数),次日留存率为当天新增的用户数中第二天又活跃了的用户数占比。

如果in_time和out_time跨天了,在两天里都记为该用户活跃过,结果按日期升序。


问题分解:

  • 计算用户活跃日期集合,分别考虑进入和离开时间(可能跨天):

    WITH t_uid_dt as (
        SELECT distinct uid, DATE(in_time) as dt FROM tb_user_log
        UNION
        SELECT distinct uid, DATE(out_time) as dt FROM tb_user_log
    )
    
  • 生成每个用户活跃日期的前一天作为t_uid_dt_1:SELECT uid, DATE_SUB(dt, INTERVAL 1 DAY) as dt

  • 生成每个日期的最早活跃日期(即属于新用户的日期)作为t_first_dt:SELECT uid, DATE(MIN(in_time)) as dt FROM tb_user_log GROUP BY uid

  • 用t_uid_dt左连接t_uid_dt_1,得到第二天还活跃了的记录:t_uid_dt LEFT JOIN t_uid_dt_1 USING(uid, dt)

  • 继续内连接t_first_dt,得到当天作为新用户第二天活跃了的记录:JOIN t_first_dt USING(uid, dt)

  • 筛选有效时间窗口的记录:WHERE DATE_FORMAT(t_uid_dt.dt, "%Y%m") = "202111"

  • 选取作为新用户的日期和下一天是否留下(留下的有日期,否则为空):SELECT t_uid_dt.dt as uv_new_dt, t_uid_dt_1.dt as next_day_left

  • 按作为新用户的日期分组:GROUP BY uv_new_dt

  • 计算每天的次日留存率:COUNT(next_day_left) / COUNT(uv_new_dt)

  • 保留2位小数:ROUND(x, 2)


细节问题:

  • 表头重命名:as
  • 按最大人数降序排序:ORDER BY max_uv DESC;

完整代码:

WITH t_uid_dt as (
    SELECT distinct uid, DATE(in_time) as dt FROM tb_user_log
    UNION
    SELECT distinct uid, DATE(out_time) as dt FROM tb_user_log
)

SELECT uv_new_dt as dt,
    ROUND(COUNT(next_day_left) / COUNT(uv_new_dt), 2) as uv_left_rate
FROM (
    SELECT t_uid_dt.dt as uv_new_dt, t_uid_dt_1.dt as next_day_left
    FROM t_uid_dt
    LEFT JOIN (
        SELECT uid, DATE_SUB(dt, INTERVAL 1 DAY) as dt
        FROM t_uid_dt
    ) as t_uid_dt_1 USING(uid, dt)
    JOIN (
        SELECT uid, DATE(MIN(in_time)) as dt
        FROM tb_user_log
        GROUP BY uid
    ) as t_first_dt USING(uid, dt)
    WHERE DATE_FORMAT(t_uid_dt.dt, "%Y%m") = "202111"
) as t_uv_new_info
GROUP BY uv_new_dt
ORDER BY dt;