with t0 as(
    select
    uid,date(in_time) dt
    from tb_user_log
    union 
    select
    uid,date(out_time) dt
    from tb_user_log
),
t1 as(
    select
    *,min(dt)over(partition by uid order by dt) first_day
    from t0
)
select
dt
,count(1) dau
,round(sum(if(first_day=dt,1,0))/count(1),2) ration
from t1
group by 1
order by 1

题目

统计每天的日活数及新用户占比

一、拆解题目

1.求什么

每天:以天聚合

日活数:每天活跃人数:date,count(1),以group by date

新用户占比:每日,活跃的新用户人数/当日总活跃人数

活跃的新用户人数:2种方法如下

(1)明细表基础上

① 加一列,Min()开窗,标出每个用户首日活跃日期。

②如果(i)统计日期date,和首日相等,在那一天,ta就是新用户。

③合计sum(if(统计日期=首日活跃,即为新用户,记1,否则都记0),以此求出新用户数量

(2)聚合,分别求出新用户数量,再和聚合的总日活left join

基本思路还是:date,count(1),以group by date

加一个人群限定:where (uid,date) in 子查询(select uid, min(date)first_day from group by 1)

2.输出什么

date,dau,ratio

二、步骤

数据预处理

本题跨天,均记活跃,所以需要union,把in_time和out_time并在一列,并去重

with t0 as(
    select
    uid,date(in_time) dt
    from tb_user_log
    union 
    select
    uid,date(out_time) dt
    from tb_user_log
),

方法1:标记新用户首日,sum(if)统计新用户,count(1)统计全部活跃用户

① 加一列,Min()开窗,标出每个用户首日活跃日期。

②如果(if)统计日期date,和首日相等,在那一天,ta就是新用户。

③合计sum(if(统计日期=首日活跃,即为新用户,记1,否则都记0),以此求出新用户数量

t1 as(
    select
    *,min(dt)over(partition by uid order by dt) first_day
    from t0
)
select
dt
,count(1) dau
,round(sum(if(first_day=dt,1,0))/count(1),2) uv_new_ratio
from t1
group by 1
order by 1

方法2:分别聚合,再左连接

①t1:新用户人数,用where in 实现新用户筛选,用dt,count(1)实现人数统计

②每日活跃人数:用from子查询实现——为什么不直接连原表t0。因为t0和t1,一个汇总一个明细,dt连接,1对多,会出现笛卡尔积。

③left join t1

④计算,这种除法,注意处理一下分子分母为0的情况:ifnull

代码

t1 as(#新用户人数
    select
    dt,count(1) new
    from t0 
    where (uid,dt) in(select uid,min(dt) first_day from t0
    group by 1)
    group by 1)
select
dt
,dau
,round(ifnull(new/dau,0),2) uv_new_ratio
from (select dt,count(1) dau from t0
    group by 1) a
left join t1  using(dt)
order by 1

三、完整代码

方法1

with t0 as(
    select
    uid,date(in_time) dt
    from tb_user_log
    union 
    select
    uid,date(out_time) dt
    from tb_user_log
),
t1 as(
    select
    *,min(dt)over(partition by uid order by dt) first_day
    from t0)

select
dt
,count(1) dau
,round(sum(if(first_day=dt,1,0))/count(1),2) uv_new_ratio
from t1
group by 1
order by 1

方法2

with t0 as(
    select
    uid,date(in_time) dt
    from tb_user_log
    union 
    select
    uid,date(out_time) dt
    from tb_user_log
),
t1 as(#新用户人数
    select
    dt,count(1) new
    from t0 
    where (uid,dt) in(select uid,min(dt) first_day from t0
    group by 1)
    group by 1)
select
dt
,dau
,round(ifnull(new/dau,0),2) uv_new_ratio
from (select dt,count(1) dau from t0
    group by 1) a
left join t1 using(dt)
order by 1