明确计算公式每个日期的留存数(分子,T1表),每个日期的新增用户数(分母,T2表),两个相除。

第一步,先把uid,in_time和out_time处理好,记成基本表t1。

第二步,找新增用户数。确定方法:找到每个用户的最小登录日后, 再根据最小登录日分组计数, 就能知道每个日的新增用户数。

第三步,确定留存数。用uid和uid相连,dt和DATE_SUB(dt, INTERVAL 1 DAY)相连的左连接方式即可以算每天新用户的留存数。

得到代码:

WITH t1 AS(SELECT uid, DATE_FORMAT(in_time,'%Y-%m-%d') dt FROM tb_user_log
           UNION
           SELECT uid, DATE_FORMAT(out_time,'%Y-%m-%d') dt FROM tb_user_log)
           
SELECT T2.dt, ROUND(retention_num/new_num,2) uv_left_rate FROM
(SELECT dt, COUNT(t3.uid) retention_num FROM
 (SELECT uid, MIN(dt) dt FROM t1 GROUP BY uid) t2
 LEFT JOIN 
 (SELECT DISTINCT uid, DATE_SUB(dt, INTERVAL 1 DAY) AS newdt FROM t1) t3
 ON t2.uid=t3.uid AND t2.dt=t3.newdt
 GROUP BY dt) T1
JOIN 
(SELECT dt, COUNT(uid) new_num FROM 
 (SELECT uid, MIN(dt) dt FROM t1 GROUP BY uid) t4
 GROUP BY dt) T2
ON T1.dt=T2.dt
WHERE LEFT(T2.dt,7)='2021-11' 
ORDER BY T2.dt

上面的代码有重复的部分,可以精简一下。在确定留存数的过程中,就可以对每个日期的新增用户数进行记录了。 精简后如下:

WITH t1 AS(SELECT uid, DATE_FORMAT(in_time,'%Y-%m-%d') dt FROM tb_user_log
           UNION
           SELECT uid, DATE_FORMAT(out_time,'%Y-%m-%d') dt FROM tb_user_log)
           
SELECT dt, ROUND(COUNT(t3.uid)/ COUNT(t2.uid),2) uv_left_rate FROM
(SELECT uid, MIN(dt) dt FROM t1 GROUP BY uid) t2
LEFT JOIN 
(SELECT DISTINCT uid, DATE_SUB(dt, INTERVAL 1 DAY) AS newdt FROM t1) t3
ON t2.uid=t3.uid AND t2.dt=t3.newdt
WHERE LEFT(dt,7)='2021-11' 
GROUP BY dt
ORDER BY dt