每天的日活数及新用户占比

明确题意:

统计每天的日活数及新用户占比,新用户占比=当天的新用户数÷当天活跃用户数(日活数)。

如果in_time和out_time跨天了,在两天里都记为该用户活跃过。

新用户占比保留2位小数,结果按日期升序排序


问题分解:

  • 计算每天的日活数(生成子表t_dau):
    • 计算每条记录活跃信息(生成子表t_active_dt):
      • 分别将记录中的进入和离开时间作为该用户的活跃日期
      • 进入日期:DATE(in_time) as dt
      • 离开日期:DATE(out_time) as dt
      • 合并结果:UNION
    • 按日期分组:GROUP BY dt
    • 统计用户数:COUNT(distinct uid) as dau
  • 计算每天新增用户数:
    • 计算每个用户最早出现日期(即作为新用户的日期,生成子表t_uv_new_daily)
      • 按用户分组:GROUP BY uid
      • 统计最早日期:DATE(MIN(in_time)) as dt
    • 按日期分组:GROUP BY dt
    • 统计新增用户数:COUNT(uid) as uv_new_daily
  • 用日期左连接两个表:t_dau LEFT JOIN t_uv_new_daily USING(dt)
  • 计算新用户占比:IFNULL(uv_new_daily, 0) / dau as uv_new_ratio

细节问题:

  • 表头重命名:as
  • 按日期排序:ORDER BY dt;

完整代码:

SELECT dt, dau, ROUND(IFNULL(uv_new_daily, 0) / dau, 2) as uv_new_ratio
FROM (
    SELECT dt, COUNT(distinct uid) as dau
    FROM (
        SELECT uid, DATE(in_time) as dt FROM tb_user_log
        UNION
        SELECT uid, DATE(out_time) as dt FROM tb_user_log
    ) as t_active_dt
    GROUP BY dt
) as t_dau
LEFT JOIN (
    SELECT dt, COUNT(uid) as uv_new_daily
    FROM (
        SELECT uid, DATE(MIN(in_time)) as dt
        FROM tb_user_log
        GROUP BY uid
    ) as t_uid_first_dt
    GROUP BY dt
) as t_uv_new_daily
USING(dt)
ORDER BY dt;