【场景】:近几天活跃

【分类】:条件函数、日期函数、datediff

分析思路

难点:

1.这个题有一个坑题目中没有说清楚,沉睡用户(近7天未活跃但更早前活跃过),根据定义它是包含流失用户(近30天未活跃但更早前活跃过),也就是说只要是流失用户就是沉睡用户!实际上答案并不包括。所以沉睡用户应该这样定义:近7天未活跃但更早前活跃过且非流失用户。

2.case when 判断使每个类型的用户不会有重合。而且先判断哪个后判断哪个尤其重要,顺序不能换!

新学到:

1.用户最近一次活跃时间:max(date);一直没有想到,其实在之前是知道求用户第一次活跃时间是用:min(date)。如果使用这个会省很多力气。

因为最后一次活跃的时间可以用来确定 近几天未活跃但更早前活跃过

2.select里面可以套用select,尤其是从另外一个表用group by查询,因为group by又不好放在总的框架里面。

(1)统计用户第一次活跃的时间;用户最后一次活跃的时间

如果in_time-进入时间out_time-离开时间跨天了,在两天里都记为该用户活跃过

  • [使用]:min()确定用户第一次活跃时间,max()确定用户最后一次活跃时间

(2)近7天和近30天,涉及两个不同的活跃条件

一种方法是求出第7天是什么时候:日期减去天数得到日期;另一种是日期减去日期得到天数

  • [使用]:date_sub()、datediff('2021-11-04',dt_max) <=6、timestampdiff(day,expr1,expr2)都可以

(3)对用户分类且求比例

case when 判断就使每个类型的用户不会有重合,所以‘新晋用户’在’忠实用户‘的前面,’流失用户‘在’沉睡用户‘的前面。

  • [使用]:case when

求解代码

方法一:

with子句 + 使用 union 合并几种分类情况

#依次去求每个类型的uid有哪些
with
    main as(
        #用户第一次活跃的时间
        select
            uid,
            min(date(in_time)) as min_date
        from tb_user_log
        group by uid
    ),
    attr as(
        #近7天和近30天,两个不同条件的活跃记录表
        #先求出今日是哪一天,往前七天是那一天,往前30天是那一天
        select
            max(date(in_time)) as now_date,
            date_sub(max(date(in_time)),interval 6 day) as 7_date,
            date_sub(max(date(in_time)),interval 29 day) as 30_date
        from tb_user_log
    )
    ,temp as(
        #新晋用户(近7天新增)
        select distinct
            uid as new_u
        from main,attr
        where min_date >= 7_date
    )#输出:105、102
    ,temp1 as(
        #忠实用户(近7天活跃过且非新晋用户)
        #when date(in_time)>= 7_date and not in 新晋用户
        select distinct
            uid as zs_u
        from tb_user_log,attr
        where date(in_time)>= 7_date
        and uid not in(
            select
                uid
            from main,attr
            where min_date >= 7_date
        ) 
    )#输出:109、108、104
    ,temp2 as(
        #流失用户(近30天未活跃但更早前活跃过)
        #when date(in_time) < 30_date and not in 近30天活跃过的用户
        select distinct
            uid as ls_u
        from tb_user_log,attr
        where date(in_time)< 30_date
        and uid not in(
            select
                uid
            from tb_user_log,attr
            where date(in_time) >= 30_date
        )
    )#输出:101
    ,temp3 as(
        #沉睡用户(近7天未活跃但更早前活跃过)
        #when date(in_time) < 7_date and not in 近7天活跃过的用户
        select distinct
            uid as cs_u
        from tb_user_log,attr
        where date(in_time) < 7_date
        and uid not in(
            select
                uid
            from tb_user_log,attr
            where date(in_time) >= 7_date
        )
    )#输出:103、101

(select
    '新晋用户' as user_grade,
    count(*) as ratio
from temp
group by user_grade)
union
(select
    '忠实用户' as user_grade,
    count(*) as ratio
from temp1
group by user_grade)
union
(select
    '流失用户' as user_grade,
    count(*) as ratio
from temp2
group by user_grade)
union
(select
    '沉睡用户' as user_grade,
    count(*) as ratio
from temp3
group by user_grade)

方法二:

with 子句 + case when

with
    main as(
        #用户第一次活跃的时间,用户最后活跃的时间
        select
            uid,
            min(date(in_time)) as min_date,
            max(date(in_time)) as max_date
        from tb_user_log
        group by uid
    ),
    attr as(
        #近7天和近30天,涉及两个不同的活跃条件
        #先求出今日是哪一天,往前七天是哪一天,往前30天是哪一天
        select
            max(date(in_time)) as now_date,
            date_sub(max(date(in_time)),interval 6 day) as 7_date,
            date_sub(max(date(in_time)),interval 29 day) as 30_date
        from tb_user_log
    )
#case when 判断就使每个类型的用户不会有重合。
select
    (case
        when min_date >= 7_date
        then '新晋用户'
        when max_date >= 7_date
        then '忠实用户'
        when max_date <= 30_date
        then '流失用户'
        else '沉睡用户'
    end) as user_grade,
    round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from main,attr
group by user_grade
order by ratio desc

方法三:

case when + 日期函数:date_sub()

#case when 判断就使每个类型的用户不会有重合。
select
    (case
        when min_date >= 7_date
        then '新晋用户'
        when max_date >= 7_date
        then '忠实用户'
        when max_date <= 30_date
        then '流失用户'
        else '沉睡用户'
    end) as user_grade,
    round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
    #用户第一次活跃的时间,用户最后活跃的时间
    select
        uid,
        min(date(in_time)) as min_date,
        max(date(in_time)) as max_date
    from tb_user_log
    group by uid
) main,
(
    #近7天和近30天,涉及两个不同的活跃条件
    #先求出今日是哪一天,往前七天是哪一天,往前30天是哪一天
    select
        max(date(in_time)) as now_date,
        date_sub(max(date(in_time)),interval 6 day) as 7_date,
        date_sub(max(date(in_time)),interval 29 day) as 30_date
    from tb_user_log
) attr
group by user_grade
order by ratio desc

方法四:

case when + 日期函数: timestampdiff(day,expr1,expr2)

#case when 判断就使每个类型的用户不会有重合。
select
    (case
        when timestampdiff(day,min_date,'2021-11-04') <= 6
        then '新晋用户'
        when timestampdiff(day,max_date,'2021-11-04') <= 6
        then '忠实用户'
        when timestampdiff(day,max_date,'2021-11-04') >= 30
        then '流失用户'
        else '沉睡用户'
    end) as user_grade,
    round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
    #用户第一次活跃的时间,用户最后活跃的时间
    select
        uid,
        min(date(in_time)) as min_date,
        max(date(in_time)) as max_date
    from tb_user_log
    group by uid
    ) main
group by user_grade
order by ratio desc

方法五:

日期函数:datediff(expr1,expr2) <=6

#case when 判断就使每个类型的用户不会有重合。
select
    (case
        when datediff('2021-11-04',min_date) <= 6
        then '新晋用户'
        when datediff('2021-11-04',max_date) <= 6
        then '忠实用户'
        when datediff('2021-11-04',max_date) >= 30
        then '流失用户'
        else '沉睡用户'
    end) as user_grade,
    round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
    #用户第一次活跃的时间,用户最后活跃的时间
    select
        uid,
        min(date(in_time)) as min_date,
        max(date(in_time)) as max_date
    from tb_user_log
    group by uid
    ) main
group by user_grade
order by ratio desc