解题逻辑解读

看了前面高赞的解法,基本上都是采用的通过制造时序表,然后通过左连接关联获取到新用户是否为次日留存用户,这个思路也不错,但是我感觉还用更好的解法。
我们以每个人为研究个体,一个用户第一天登录就打上标签新用户,然后该(客户第二次登陆日期-第一次登陆日期)= 1天 or (退出日期 - 进入日期) = 1天,则打上次日留存标签,反之则是次日未留存标签,然后以用户首日登录日期为分组,统计当日新用户数,和次日留存的用户数,然后两者相除,就可以获得相应的次日留存率。

  • 用户是否为新用户可以通过对uid进行排序开窗,然后筛选第一次出登录的日期即为该用户第一次登陆日期。通过开创的排序序号1,可以给用户打上新用户标签
  • 要找到该用户第二次登录登陆时间,并且还要和第一次登录处于一行,那可以使用lead()开窗函数,以此来取uid第二次出现的日期
          select uid,
           date(in_time)  as in_date,
           date(out_time) as out_date,
           date(lead(in_time,1) over (partition by uid)) as sec_date,
           row_number() over (partition by uid order by date(in_time)) rn,
          if(row_number() over (partition by uid order by date(in_time)) = 1,'n',null ) as new_tag
    from tb_user_log
  • 然后通过登陆日期与退出日期的差值,以及登陆日期和二次登录日期的差值,判断是否可以打上次日留存标签
    with  cte01 as (
      select uid,
           date(in_time)  as in_date,
           date(out_time) as out_date,
           date(lead(in_time,1) over (partition by uid)) as sec_date,
           row_number() over (partition by uid order by date(in_time)) rn,
          if(row_number() over (partition by uid order by date(in_time)) = 1,'n',null ) as new_tag
    from tb_user_log)
    select uid,
         in_date,
         case when if(datediff(out_date,in_date) = 0,0 ,1) = 1 or if(datediff(sec_date,in_date) = 1 ,1,0) = 1 then 1 else 0 end lu_tag,
         new_tag
    from cte01
    where rn =1
  • 最后以登陆日期聚合,并且统计当日新用户和该新用户次日留存数,并相除获取次日留存率
    注意:
  1. 不能提前将登陆日期筛选为11月,因为有些用户可能在10月就注册了,假如是提前筛选,就会导致多次记录新用户。
  2. 按日期来排序
  3. 当日没有新用户要剔除当日显示
    with  cte01 as (
     select uid,
          date(in_time)  as in_date,
          date(out_time) as out_date,
          date(lead(in_time,1) over (partition by uid)) as sec_date,
          row_number() over (partition by uid order by date(in_time)) rn,
         if(row_number() over (partition by uid order by date(in_time)) = 1,1,0 ) as new_tag
    from tb_user_log), -- 给新用户打标签,然后统计当日的新用户书
    cte02 as (
    select uid,
        in_date,
        case when 
             if(datediff(out_date,in_date) = 0,0 ,1) = 1 or if(datediff(sec_date,in_date) = 1 ,1,0) = 1 
             then 1 else 0 end lu_tag,
        new_tag
    from cte01
    where rn =1)
    select  in_date,
        round(sum(lu_tag)/sum(new_tag),2) uv_left_rate
    from cte02
    where in_date BETWEEN '2021-11-01' AND '2021-11-30'
    group by in_date
    having count(new_tag) != 0
    order by in_date;