【场景】:近几天活跃
【分类】:条件函数、日期函数、datediff
分析思路
难点:
1.这个题有一个坑题目中没有说清楚,沉睡用户(近7天未活跃但更早前活跃过),根据定义它是包含流失用户(近30天未活跃但更早前活跃过),也就是说只要是流失用户就是沉睡用户!实际上答案并不包括。所以沉睡用户应该这样定义:近7天未活跃但更早前活跃过且非流失用户。
2.case when 判断使每个类型的用户不会有重合。而且先判断哪个后判断哪个尤其重要,顺序不能换!
新学到:
1.用户最近一次活跃时间:max(date);一直没有想到,其实在之前是知道求用户第一次活跃时间是用:min(date)。如果使用这个会省很多力气。
因为最后一次活跃的时间可以用来确定 近几天未活跃但更早前活跃过
2.select里面可以套用select,尤其是从另外一个表用group by查询,因为group by又不好放在总的框架里面。
(1)统计用户第一次活跃的时间;用户最后一次活跃的时间
如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过
- [使用]:min()确定用户第一次活跃时间,max()确定用户最后一次活跃时间
 
(2)近7天和近30天,涉及两个不同的活跃条件
一种方法是求出第7天是什么时候:日期减去天数得到日期;另一种是日期减去日期得到天数
- [使用]:date_sub()、datediff('2021-11-04',dt_max) <=6、timestampdiff(day,expr1,expr2)都可以
 
(3)对用户分类且求比例
case when 判断就使每个类型的用户不会有重合,所以‘新晋用户’在’忠实用户‘的前面,’流失用户‘在’沉睡用户‘的前面。
- [使用]:case when
 
求解代码
方法一:
with子句 + 使用 union 合并几种分类情况
#依次去求每个类型的uid有哪些
with
    main as(
        #用户第一次活跃的时间
        select
            uid,
            min(date(in_time)) as min_date
        from tb_user_log
        group by uid
    ),
    attr as(
        #近7天和近30天,两个不同条件的活跃记录表
        #先求出今日是哪一天,往前七天是那一天,往前30天是那一天
        select
            max(date(in_time)) as now_date,
            date_sub(max(date(in_time)),interval 6 day) as 7_date,
            date_sub(max(date(in_time)),interval 29 day) as 30_date
        from tb_user_log
    )
    ,temp as(
        #新晋用户(近7天新增)
        select distinct
            uid as new_u
        from main,attr
        where min_date >= 7_date
    )#输出:105、102
    ,temp1 as(
        #忠实用户(近7天活跃过且非新晋用户)
        #when date(in_time)>= 7_date and not in 新晋用户
        select distinct
            uid as zs_u
        from tb_user_log,attr
        where date(in_time)>= 7_date
        and uid not in(
            select
                uid
            from main,attr
            where min_date >= 7_date
        ) 
    )#输出:109、108、104
    ,temp2 as(
        #流失用户(近30天未活跃但更早前活跃过)
        #when date(in_time) < 30_date and not in 近30天活跃过的用户
        select distinct
            uid as ls_u
        from tb_user_log,attr
        where date(in_time)< 30_date
        and uid not in(
            select
                uid
            from tb_user_log,attr
            where date(in_time) >= 30_date
        )
    )#输出:101
    ,temp3 as(
        #沉睡用户(近7天未活跃但更早前活跃过)
        #when date(in_time) < 7_date and not in 近7天活跃过的用户
        select distinct
            uid as cs_u
        from tb_user_log,attr
        where date(in_time) < 7_date
        and uid not in(
            select
                uid
            from tb_user_log,attr
            where date(in_time) >= 7_date
        )
    )#输出:103、101
(select
    '新晋用户' as user_grade,
    count(*) as ratio
from temp
group by user_grade)
union
(select
    '忠实用户' as user_grade,
    count(*) as ratio
from temp1
group by user_grade)
union
(select
    '流失用户' as user_grade,
    count(*) as ratio
from temp2
group by user_grade)
union
(select
    '沉睡用户' as user_grade,
    count(*) as ratio
from temp3
group by user_grade)
方法二:
with 子句 + case when
with
    main as(
        #用户第一次活跃的时间,用户最后活跃的时间
        select
            uid,
            min(date(in_time)) as min_date,
            max(date(in_time)) as max_date
        from tb_user_log
        group by uid
    ),
    attr as(
        #近7天和近30天,涉及两个不同的活跃条件
        #先求出今日是哪一天,往前七天是哪一天,往前30天是哪一天
        select
            max(date(in_time)) as now_date,
            date_sub(max(date(in_time)),interval 6 day) as 7_date,
            date_sub(max(date(in_time)),interval 29 day) as 30_date
        from tb_user_log
    )
#case when 判断就使每个类型的用户不会有重合。
select
    (case
        when min_date >= 7_date
        then '新晋用户'
        when max_date >= 7_date
        then '忠实用户'
        when max_date <= 30_date
        then '流失用户'
        else '沉睡用户'
    end) as user_grade,
    round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from main,attr
group by user_grade
order by ratio desc
方法三:
case when + 日期函数:date_sub()
#case when 判断就使每个类型的用户不会有重合。
select
    (case
        when min_date >= 7_date
        then '新晋用户'
        when max_date >= 7_date
        then '忠实用户'
        when max_date <= 30_date
        then '流失用户'
        else '沉睡用户'
    end) as user_grade,
    round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
    #用户第一次活跃的时间,用户最后活跃的时间
    select
        uid,
        min(date(in_time)) as min_date,
        max(date(in_time)) as max_date
    from tb_user_log
    group by uid
) main,
(
    #近7天和近30天,涉及两个不同的活跃条件
    #先求出今日是哪一天,往前七天是哪一天,往前30天是哪一天
    select
        max(date(in_time)) as now_date,
        date_sub(max(date(in_time)),interval 6 day) as 7_date,
        date_sub(max(date(in_time)),interval 29 day) as 30_date
    from tb_user_log
) attr
group by user_grade
order by ratio desc
方法四:
case when + 日期函数: timestampdiff(day,expr1,expr2)
#case when 判断就使每个类型的用户不会有重合。
select
    (case
        when timestampdiff(day,min_date,'2021-11-04') <= 6
        then '新晋用户'
        when timestampdiff(day,max_date,'2021-11-04') <= 6
        then '忠实用户'
        when timestampdiff(day,max_date,'2021-11-04') >= 30
        then '流失用户'
        else '沉睡用户'
    end) as user_grade,
    round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
    #用户第一次活跃的时间,用户最后活跃的时间
    select
        uid,
        min(date(in_time)) as min_date,
        max(date(in_time)) as max_date
    from tb_user_log
    group by uid
    ) main
group by user_grade
order by ratio desc
方法五:
日期函数:datediff(expr1,expr2) <=6
#case when 判断就使每个类型的用户不会有重合。
select
    (case
        when datediff('2021-11-04',min_date) <= 6
        then '新晋用户'
        when datediff('2021-11-04',max_date) <= 6
        then '忠实用户'
        when datediff('2021-11-04',max_date) >= 30
        then '流失用户'
        else '沉睡用户'
    end) as user_grade,
    round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
    #用户第一次活跃的时间,用户最后活跃的时间
    select
        uid,
        min(date(in_time)) as min_date,
        max(date(in_time)) as max_date
    from tb_user_log
    group by uid
    ) main
group by user_grade
order by ratio desc

京公网安备 11010502036488号