牛客网SQL第70题。互联网公司经常要统计新用户留存率,所以是很有价值的一题,简单写下思路。
我们最终需要的表结构大致如下,用集合的思想来理解,以date对整体进行划分,划分完之后每个结合里需要包括每个date的新用户数,每个date的留存用户数,这两部分相除组成留存率。
select date,xxxx from group by date
基于此进一步完善我们想要的框架:
select c.date, round(count(d.user_id)/count(),3)as p from (每天的新用户表) as c left join (每天的留存用户表) as d on c.user_id = d.user_id group by c.date
新用户表,别名c
select a.date,b.user_id from (select distinct l1.date from login l1)a left join (select l2.user_id,min(l2.date) as f_date from login l2 group by l2.user_id)b on a.date=b.f_date留存用户表,别名d。留存表用了好几种写法试了,最后发现还是两表连接最简单了,不搞花里胡哨的。
select distinct l3.user_id from login l3,login l4 where l3.user_id=l4.user_id and DATE_ADD(l3.date,INTERVAL 1 DAY)=l4.date
最后拼接
select c.date, round(count(d.user_id)/count(*),3)as p from (select a.date,b.user_id from (select distinct l1.date from login l1)a left join (select l2.user_id,min(l2.date) as f_date from login l2 group by l2.user_id)b on a.date=b.f_date ) as c left join (select distinct l3.user_id from login l3,login l4 where l3.user_id=l4.user_id and DATE_ADD(l3.date,INTERVAL 1 DAY)=l4.date) as d on c.user_id = d.user_id group by c.date
就好啦。感想是面向集合的思想真的非常非常重要。