【场景】:统计新用户

【分类】:聚合分组函数、min(date) group by uid

分析思路

难点:

1.题目说求每天的日活,所以有日期就要计算,即使新用户为0的时候也要计算进去,所以使用left join

2.记得去重

(1)统计用户活跃记录

如果in_time-进入时间out_time-离开时间跨天了,在两天里都记为该用户活跃过

  • [使用]:union

(2)统计新用户第一次活跃记录

  • [使用]:min()确定用户第一次活跃时间,max()确定用户最后一次活跃时间

(3)统计每天的日活数及新用户占比

  • [使用]:group by 分组条件 [日期];order by 对查询结果排序 [日期升序];

最终结果

select 查询结果 [日期;日活数;新用户占比]
from 从哪张表中查询数据[多表]
group by 分组条件 [日期]
order by 对查询结果排序 [日期升序];

求解代码

方法一:

with子句 + 一步步拆解

with
    main as(
        #统计用户活跃记录
        select distinct
            uid,
            date(in_time) as dt
        from tb_user_log
        union 
        select
            uid,
            date(out_time) as dt
        from tb_user_log
    )
    ,attr as(
        #统计新用户第一次活跃记录
        select
            uid,
            min(dt) as dt
        from main
        group by uid
    )
#统计每天的日活数及新用户占比
select
    dt,
    count(distinct a.uid) as dau,
    round(count(distinct b.uid)/count(distinct a.uid),2) as uv_new_ratio
from main a
left join attr b using(dt)
group by dt
order by dt

方法二:

不使用with子句

#统计每天的日活数及新用户占比
select
    dt,
    count(distinct main.uid) as dau,
    round(count(distinct attr.uid)/count(distinct main.uid),2) as uv_new_ratio
from(
    #统计用户活跃记录
    select
        uid,
        date(in_time) as dt
    from tb_user_log
    union 
    select
        uid,
        date(out_time) as dt
    from tb_user_log
) main
left join(
    #统计新用户第一次活跃记录
    select
        uid,
        min(dt) as dt
    from(
        #统计用户活跃记录
        select
            uid,
            date(in_time) as dt
        from tb_user_log
        union 
        select
            uid,
            date(out_time) as dt
        from tb_user_log
    ) main
    group by uid
) attr using(dt)
group by dt
order by dt