思路分析 新用户次日留存率p=(第一次登陆本日和次日都登陆的用户数)/(本日是第一次登陆的用户数)

想到分别对分子分母用SUM+CASE WHEN,如果本日没有新登录的用户,则分母为NULL,利用IFNULL函数将其p值转化为0。

难点在于CASE WHEN后面的条件如何表示呢?

在login表按日期分组后,分母本日期第一次登陆即l表里的用户和日期满足是该用户是第一天登录的条件即

(l.user_id,l.date) IN (SELECT user_id,MIN(date) date 
    FROM login GROUP BY user_id)

注意这里一开始我是创建了个表(SELECT user_id,MIN(date) date FROM login GROUP BY user_id) a,然后(l.user_id,l.date)IN(a.user_id,a.date),这样不对,会报错'Operand should contain 2 column(s)'

分子是本日登陆的新用户第二天会继续登录,想到用DATE_ADD函数,只要次日该用户出在login表中有记录就算登录,即

(l.user_id,DATE_ADD(l.date,INTERVAL 1 DAY))
                       IN (SELECT user_id,date FROM login) 

除这个条件外别忘了用AND连接另外一个限制条件:本日是第一次登陆(条件概率,分子是AB两个事件同时发生),于是分子的条件可写为:

(l.user_id,DATE_ADD(l.date,INTERVAL 1 DAY))
                       IN (SELECT user_id,date FROM login) 
            AND (l.user_id,l.date) IN (SELECT user_id,MIN(date) date FROM login GROUP BY user_id)

写出完整的代码为:

SELECT l.date,
    ifnull(ROUND(
        SUM(CASE WHEN (l.user_id,DATE_ADD(l.date,INTERVAL 1 DAY))
                       IN (SELECT user_id,date FROM login) 
            AND (l.user_id,l.date) IN (SELECT user_id,MIN(date) date FROM login GROUP BY user_id)THEN 1 ELSE 0 END)
        / SUM(CASE WHEN (l.user_id,l.date) IN (SELECT user_id,MIN(date) date FROM login GROUP BY user_id) THEN 1 ELSE 0 END),3),0) p
FROM login l
GROUP BY l.date
ORDER BY l.date;

还有大佬提供的其他思路

select date,IFNULL(ROUND(count()/count(),3,
from (select 去重的日期表)
    left join on (select 新用户的首次登陆日期表 )
    left join on (select 新用户的次日登录日期表 )
group by
ORDER BY 

放在from left join中。类比拿来主义,因为已经在from中写好约束条件,所以直接拿from表格中的某些字段用于count的字段。该框架类比漏斗模型,左边最宽,向右逐次缩窄,需满足的条件也越来越严苛. 详细代码:

select t0.date,
ifnull(round(count(distinct t2.user_id)/(count(t1.user_id)),3),0)
from
(
select date
from login
group by date
) t0
left join
(
select user_id,min(date) as date
from login
group by user_id
)t1
on t0.date=t1.date
left join login as t2
on t1.user_id=t2.user_id and datediff(t2.date,t1.date)=1
group by t0.date