此题共包含一张表:

表1:用户行为日志表tb_user_log

要解决的问题:

统计2021年11月每天新用户的次日留存率(保留2位小数)。 次日留存率为当天新增的用户数中第二天又活跃了的用户数占比。 如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过,结果按日期升序。

解题思路:

  1. 先找到2021年11月的每天新用户,date & uid_d0,做为表3
  2. 找到2021年11月的次日留存用户,date & uid_d1,做为表2
  3. left join表3和表2,同时让表2到的日期与表3的时间差=1,且让t3.uid_d0 = t2.uid_d1
  4. 将上述结果做为from的子查询,where2021年11月
  5. 按日期升序

知识点

  1. 做这道题一开始绕弯了,没有掌握join连接时可以同时做条件筛选

on DATEDIFF(date(t2.d),t3.dt) =1

and t3.uid_d0 = t2.uid

  • 总结:计算某日产品新增用户的次日、3日、7日、14日、30日、90日留存率
SELECT
	date '日期',
	count(uid_d0) '新增数量',
	count(uid_d1) / count(uid_d0) '次日留存',
	count(uid_d3) / count(uid_d0) '3日留存',
	...
	count(uid_d90) / count(uid_d0) '90日留存'
FROM
	(
      SELECT DISTINCT
        date,
        a.uid_d0,
        b.uid AS uid_d1,
        c.uid AS uid_d3,
        ...
        g.uid AS uid_d90
        FROM
        (
          SELECT DISTINCT date, uid as uid_d0
          FROM origin					
          GROUP BY uid
          ORDER BY date
        ) a
        LEFT JOIN origin b ON DATEDIFF(b.date),a.date) = 1
        AND a.uid_d0 = b.uid
        LEFT JOIN origin c ON DATEDIFF(c.date),a.date) = 2
        AND a.uid_d0 = c.uid
        ...
        LEFT JOIN origin g ON DATEDIFF(g.date),a.date) = 89
        AND a.uid_d0 = g.uid
	) AS temp
GROUP BY date

本题代码:

select dt, round(count(uid_d1)/count(uid_d0),2) uv_left_rate
from(
    SELECT DISTINCT dt,t3.uid_d0, t2.uid uid_d1
    from(
        SELECT dt, uid_d0
        from(
            select date(d) as dt,
            uid uid_d0,
            ROW_NUMBER() over (partition by uid ORDER BY date(d)) uid_rank
            from(
                SELECT uid, date(in_time) d
                from tb_user_log
                UNION
                select uid, date(out_time) d
                from tb_user_log
            	) t
       	   ) t1
       where uid_rank = 1
      ) t3
    left join (
        SELECT uid, date(in_time) d
        from tb_user_log
        UNION
        select uid, date(out_time) d
        from tb_user_log
    ) t2
    on DATEDIFF(date(t2.d),t3.dt) =1
    and t3.uid_d0 = t2.uid
) temp
where dt >= '2021-11-01'
GROUP BY dt