【场景】:统计新用户
【分类】:聚合分组函数、min(date) group by uid
分析思路
难点:
1.题目说求每天的日活,所以有日期就要计算,即使新用户为0的时候也要计算进去,所以使用left join
2.记得去重
(1)统计用户活跃记录
如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过
- [使用]:union
(2)统计新用户第一次活跃记录
- [使用]:min()确定用户第一次活跃时间,max()确定用户最后一次活跃时间
(3)统计每天的日活数及新用户占比
- [使用]:group by 分组条件 [日期];order by 对查询结果排序 [日期升序];
最终结果
select 查询结果 [日期;日活数;新用户占比]
from 从哪张表中查询数据[多表]
group by 分组条件 [日期]
order by 对查询结果排序 [日期升序];
求解代码
方法一:
with子句 + 一步步拆解
with
main as(
#统计用户活跃记录
select distinct
uid,
date(in_time) as dt
from tb_user_log
union
select
uid,
date(out_time) as dt
from tb_user_log
)
,attr as(
#统计新用户第一次活跃记录
select
uid,
min(dt) as dt
from main
group by uid
)
#统计每天的日活数及新用户占比
select
dt,
count(distinct a.uid) as dau,
round(count(distinct b.uid)/count(distinct a.uid),2) as uv_new_ratio
from main a
left join attr b using(dt)
group by dt
order by dt
方法二:
不使用with子句
#统计每天的日活数及新用户占比
select
dt,
count(distinct main.uid) as dau,
round(count(distinct attr.uid)/count(distinct main.uid),2) as uv_new_ratio
from(
#统计用户活跃记录
select
uid,
date(in_time) as dt
from tb_user_log
union
select
uid,
date(out_time) as dt
from tb_user_log
) main
left join(
#统计新用户第一次活跃记录
select
uid,
min(dt) as dt
from(
#统计用户活跃记录
select
uid,
date(in_time) as dt
from tb_user_log
union
select
uid,
date(out_time) as dt
from tb_user_log
) main
group by uid
) attr using(dt)
group by dt
order by dt