【场景】:近几天活跃
【分类】:条件函数、日期函数、datediff
分析思路
难点:
1.这个题有一个坑题目中没有说清楚,沉睡用户(近7天未活跃但更早前活跃过),根据定义它是包含流失用户(近30天未活跃但更早前活跃过),也就是说只要是流失用户就是沉睡用户!实际上答案并不包括。所以沉睡用户应该这样定义:近7天未活跃但更早前活跃过且非流失用户。
2.case when 判断使每个类型的用户不会有重合。而且先判断哪个后判断哪个尤其重要,顺序不能换!
新学到:
1.用户最近一次活跃时间:max(date);一直没有想到,其实在之前是知道求用户第一次活跃时间是用:min(date)。如果使用这个会省很多力气。
因为最后一次活跃的时间可以用来确定 近几天未活跃但更早前活跃过
2.select里面可以套用select,尤其是从另外一个表用group by查询,因为group by又不好放在总的框架里面。
(1)统计用户第一次活跃的时间;用户最后一次活跃的时间
如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过
- [使用]:min()确定用户第一次活跃时间,max()确定用户最后一次活跃时间
(2)近7天和近30天,涉及两个不同的活跃条件
一种方法是求出第7天是什么时候:日期减去天数得到日期;另一种是日期减去日期得到天数
- [使用]:date_sub()、datediff('2021-11-04',dt_max) <=6、timestampdiff(day,expr1,expr2)都可以
(3)对用户分类且求比例
case when 判断就使每个类型的用户不会有重合,所以‘新晋用户’在’忠实用户‘的前面,’流失用户‘在’沉睡用户‘的前面。
- [使用]:case when
求解代码
方法一:
with子句 + 使用 union 合并几种分类情况
#依次去求每个类型的uid有哪些
with
main as(
#用户第一次活跃的时间
select
uid,
min(date(in_time)) as min_date
from tb_user_log
group by uid
),
attr as(
#近7天和近30天,两个不同条件的活跃记录表
#先求出今日是哪一天,往前七天是那一天,往前30天是那一天
select
max(date(in_time)) as now_date,
date_sub(max(date(in_time)),interval 6 day) as 7_date,
date_sub(max(date(in_time)),interval 29 day) as 30_date
from tb_user_log
)
,temp as(
#新晋用户(近7天新增)
select distinct
uid as new_u
from main,attr
where min_date >= 7_date
)#输出:105、102
,temp1 as(
#忠实用户(近7天活跃过且非新晋用户)
#when date(in_time)>= 7_date and not in 新晋用户
select distinct
uid as zs_u
from tb_user_log,attr
where date(in_time)>= 7_date
and uid not in(
select
uid
from main,attr
where min_date >= 7_date
)
)#输出:109、108、104
,temp2 as(
#流失用户(近30天未活跃但更早前活跃过)
#when date(in_time) < 30_date and not in 近30天活跃过的用户
select distinct
uid as ls_u
from tb_user_log,attr
where date(in_time)< 30_date
and uid not in(
select
uid
from tb_user_log,attr
where date(in_time) >= 30_date
)
)#输出:101
,temp3 as(
#沉睡用户(近7天未活跃但更早前活跃过)
#when date(in_time) < 7_date and not in 近7天活跃过的用户
select distinct
uid as cs_u
from tb_user_log,attr
where date(in_time) < 7_date
and uid not in(
select
uid
from tb_user_log,attr
where date(in_time) >= 7_date
)
)#输出:103、101
(select
'新晋用户' as user_grade,
count(*) as ratio
from temp
group by user_grade)
union
(select
'忠实用户' as user_grade,
count(*) as ratio
from temp1
group by user_grade)
union
(select
'流失用户' as user_grade,
count(*) as ratio
from temp2
group by user_grade)
union
(select
'沉睡用户' as user_grade,
count(*) as ratio
from temp3
group by user_grade)
方法二:
with 子句 + case when
with
main as(
#用户第一次活跃的时间,用户最后活跃的时间
select
uid,
min(date(in_time)) as min_date,
max(date(in_time)) as max_date
from tb_user_log
group by uid
),
attr as(
#近7天和近30天,涉及两个不同的活跃条件
#先求出今日是哪一天,往前七天是哪一天,往前30天是哪一天
select
max(date(in_time)) as now_date,
date_sub(max(date(in_time)),interval 6 day) as 7_date,
date_sub(max(date(in_time)),interval 29 day) as 30_date
from tb_user_log
)
#case when 判断就使每个类型的用户不会有重合。
select
(case
when min_date >= 7_date
then '新晋用户'
when max_date >= 7_date
then '忠实用户'
when max_date <= 30_date
then '流失用户'
else '沉睡用户'
end) as user_grade,
round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from main,attr
group by user_grade
order by ratio desc
方法三:
case when + 日期函数:date_sub()
#case when 判断就使每个类型的用户不会有重合。
select
(case
when min_date >= 7_date
then '新晋用户'
when max_date >= 7_date
then '忠实用户'
when max_date <= 30_date
then '流失用户'
else '沉睡用户'
end) as user_grade,
round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
#用户第一次活跃的时间,用户最后活跃的时间
select
uid,
min(date(in_time)) as min_date,
max(date(in_time)) as max_date
from tb_user_log
group by uid
) main,
(
#近7天和近30天,涉及两个不同的活跃条件
#先求出今日是哪一天,往前七天是哪一天,往前30天是哪一天
select
max(date(in_time)) as now_date,
date_sub(max(date(in_time)),interval 6 day) as 7_date,
date_sub(max(date(in_time)),interval 29 day) as 30_date
from tb_user_log
) attr
group by user_grade
order by ratio desc
方法四:
case when + 日期函数: timestampdiff(day,expr1,expr2)
#case when 判断就使每个类型的用户不会有重合。
select
(case
when timestampdiff(day,min_date,'2021-11-04') <= 6
then '新晋用户'
when timestampdiff(day,max_date,'2021-11-04') <= 6
then '忠实用户'
when timestampdiff(day,max_date,'2021-11-04') >= 30
then '流失用户'
else '沉睡用户'
end) as user_grade,
round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
#用户第一次活跃的时间,用户最后活跃的时间
select
uid,
min(date(in_time)) as min_date,
max(date(in_time)) as max_date
from tb_user_log
group by uid
) main
group by user_grade
order by ratio desc
方法五:
日期函数:datediff(expr1,expr2) <=6
#case when 判断就使每个类型的用户不会有重合。
select
(case
when datediff('2021-11-04',min_date) <= 6
then '新晋用户'
when datediff('2021-11-04',max_date) <= 6
then '忠实用户'
when datediff('2021-11-04',max_date) >= 30
then '流失用户'
else '沉睡用户'
end) as user_grade,
round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratio
from(
#用户第一次活跃的时间,用户最后活跃的时间
select
uid,
min(date(in_time)) as min_date,
max(date(in_time)) as max_date
from tb_user_log
group by uid
) main
group by user_grade
order by ratio desc