此题共包含一张表:
表1:用户行为日志表tb_user_log
要解决的问题:
统计2021年11月每天新用户的次日留存率(保留2位小数)。 次日留存率为当天新增的用户数中第二天又活跃了的用户数占比。 如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过,结果按日期升序。
解题思路:
- 先找到2021年11月的每天新用户,date & uid_d0,做为表3
- 找到2021年11月的次日留存用户,date & uid_d1,做为表2
- left join表3和表2,同时让表2到的日期与表3的时间差=1,且让t3.uid_d0 = t2.uid_d1
- 将上述结果做为from的子查询,where2021年11月
- 按日期升序
知识点
- 做这道题一开始绕弯了,没有掌握join连接时可以同时做条件筛选
on DATEDIFF(date(t2.d),t3.dt) =1
and t3.uid_d0 = t2.uid
- 总结:计算某日产品新增用户的次日、3日、7日、14日、30日、90日留存率
SELECT
date '日期',
count(uid_d0) '新增数量',
count(uid_d1) / count(uid_d0) '次日留存',
count(uid_d3) / count(uid_d0) '3日留存',
...
count(uid_d90) / count(uid_d0) '90日留存'
FROM
(
SELECT DISTINCT
date,
a.uid_d0,
b.uid AS uid_d1,
c.uid AS uid_d3,
...
g.uid AS uid_d90
FROM
(
SELECT DISTINCT date, uid as uid_d0
FROM origin
GROUP BY uid
ORDER BY date
) a
LEFT JOIN origin b ON DATEDIFF(b.date),a.date) = 1
AND a.uid_d0 = b.uid
LEFT JOIN origin c ON DATEDIFF(c.date),a.date) = 2
AND a.uid_d0 = c.uid
...
LEFT JOIN origin g ON DATEDIFF(g.date),a.date) = 89
AND a.uid_d0 = g.uid
) AS temp
GROUP BY date
本题代码:
select dt, round(count(uid_d1)/count(uid_d0),2) uv_left_rate
from(
SELECT DISTINCT dt,t3.uid_d0, t2.uid uid_d1
from(
SELECT dt, uid_d0
from(
select date(d) as dt,
uid uid_d0,
ROW_NUMBER() over (partition by uid ORDER BY date(d)) uid_rank
from(
SELECT uid, date(in_time) d
from tb_user_log
UNION
select uid, date(out_time) d
from tb_user_log
) t
) t1
where uid_rank = 1
) t3
left join (
SELECT uid, date(in_time) d
from tb_user_log
UNION
select uid, date(out_time) d
from tb_user_log
) t2
on DATEDIFF(date(t2.d),t3.dt) =1
and t3.uid_d0 = t2.uid
) temp
where dt >= '2021-11-01'
GROUP BY dt