有以下几张数据表,请写出Hive SQL语句,实现以下需求。
注:分区字段为dt,代表日期。

1、某次经营活动中,商家发起了"异性拼团购",试着针对某个地区的用户进行推广,找出匹配用户。
图片说明
图片说明
参考实现:选出城市在北京,性别为男的10个用户名

select user_name
from user_info
where city='beijing' and sex='male'
limit 10;

2、某天,发现食物类的商品卖的很好,你能找出几个资深吃货吗?
参考实现:选出在2019年6月18日,购买的商品类是food的用户名、购买数量、支付金额,并按照购买数量、支付金额倒序排序,取前10个用户。
select user_name, piece, pay_amount
from user_trade
where dt='2019-06-18' and goods_category='food'
order by pay_amount desc, piece desc
limit 10;

3、试着对本公司2019年第一季度商品的热度与价值度进行分析。
参考实现:2019年1月到3月,每个品类有多少人购买,累计金额是多少。

select goods_category,
count(distinct user_name) as user_num,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-03-31'
group by goods_category;

4、2019年4月,支付金额超过5万元的用户,给VIP用户赠送优惠劵。

参考实现:2019年4月份,统计每位用户的支付金额,筛选出超过5万元的。

select user_name,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-04-01' and '2019-04-30'
group by user_name
having sum(pay_amount)>50000;

5、去年的劳动节新用户推广活动价值分析,即拉新分析。

参考实现:统计2019年5月1日之后,每日激活用户的数量。

select sum(user_name) as user_num,
datediff(to_date(firstactivetime), '2019-05-01')
from user_info
where to_date(firstactivetime) between '2019-05-01' and '2019-05-31'
group by to_date(firstactivetime);

6、对用户的年龄段进行分析,观察分布情况。

参考实现:统计以下四个年龄段:20岁以下、20-30岁、30-40岁、40岁以上的用户数。

select case when age<20 then '20岁以下'
when age>=20 and age<30 then '20-30岁'
when age>=30 and age<40 then '30-40岁'
else '40岁以上' end as age_type
count(distinct user_name) as user_num
from user_info
group by case when age<20 then '20岁以下'
when age>=20 and age<30 then '20-30岁'
when age>=30 and age<40 then '30-40岁'
else '40岁以上' end;

7、去年王思聪的微博抽奖活动引起争议,我们想要观察用户等级随性别的分布情况。

参考实现:统计每个性别用户等级高低分布情况(level大于5为高级)

select sex.
if(level>5, '高', '低') as level_type,
count(distinct user_name) as user_num
from user_info
group by sex,
if(level>5, '高', '低');

8、分析每个月的拉新情况,可以倒推回运营效果。

参考实现:统计每个月激活用户的数量

select substr(firstactivetime, 1, 7) as active_month,
count(distinct user_name) as user_num
from user_info
group by substr(firstactivetime, 1, 7);

9、找出不同手机品牌的用户分布情况。

参考实现:按照手机品牌分组,统计每个品牌的用户数量。

select extra2['phonebrand'] as phone_brand,
count(distinct user_name) as user_num
from user_info
group by extra2['phonebrand'];

10、找出在2018年具有VIP潜质的用户,发送VIP试用劵。

参考实现:2018年购买的商品品类在五个以上的用户

select user_name,
count(distinct goods_category) as category_num
from user_trade
where year(dt)='2018'
group by user_name
having count(distinct goods_category)>5;

11、激活天数距今超过300天的男女分布情况
select sex,
count(distinct user_name) as user_num
from user_info
where datediff(current_date(),to_date(firstactivetime))>300
group by sex;

12、不同性别、教育程度的分布情况
select sex,
extra2["education"] as education,
count(distinct user_name) as user_num
from user_info
group by sex,
extra2["education"];

13、2019年1月1日到2019年4月30日,每个时段的不同品类购买金额分布。

select substr(from_unixtime(pay_time, 'yyyy-MM-dd HH'), 12),
goods_category,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by
substr(from_unixtime(pay_time, 'yyyy-MM-dd HH'), 12),
goods_category;


Hive SQL数据分析实战(二)
有以下几张数据表,请写出Hive SQL语句,实现以下需求。

注:分区字段为dt,代表日期。

1、某年度对用户满意度进行调研分析,找出目标人群。
参考实现:找出2019年购买商品后又退款的用户
select a.user_name
from
(select distinct user_name
from user_trade
where yaer(dt)=2019) a
join
(select distinct user_name
from user_refund
where yaer(dt)=2019) b
on a.user_name=b.user_name;

2、用户忠诚度类项目的调研分析,找出目标人群。

参考实现:选出在2017年和2018年都购买商品的用户

select a.user_name
from
(select distinct user_name
from user_trade
where yaer(dt)=2017) a
join
(select distinct user_name
from user_trade
where yaer(dt)=2018) b
on a.user_name=b.user_name;

3、高忠诚度用户的匹配分析,用以生成心路历程类报表推送给用户,找出目标人群。

参考实现:选出在2017年、2018年和2019年都购买商品的用户

select a.user_name
from
(select distinct user_name
from trade_2017) a
join
(select distinct user_name
from trade_2018) b on a.user_name=b.user_name
join
(select distinct user_name
from trade_2019) c on b.user_name=c.user_name;

4、无退款服务类用户的定位分析,用以发送服务判断类用户调研。

参考实现:在2019年购买商品,但是没有退款的用户。

select a.user_name
from
(select distinct user_name
from user_trade
where yaer(dt)=2019) a
left join
(select distinct user_name
from user_refund
where yaer(dt)=2019) b
on a.user_name=b.user_name
where b.user_name is null;

5、对客户的学历进行调研分析,观察其分布情况。

参考实现:在2019年购买商品的用户的学历分布

select b.education,
count(distinct a.user_name)
from
(select distinct user_name
from user_trade
where yaer(dt)=2019) a
left join
(select distinct user_name,
extra2['education'] as education
from user_info
where yaer(dt)=2019) b
on a.user_name=b.user_name
group by b.education;

6、老客户召回计划,匹配到目标人群。

参考实现:在2017和2018年都购买,但是没有在2019年购买的用户

select a.user_name
from
(select distinct user_name
from trade_2017) a
join
(select distinct user_name
from trade_2018) b on a.user_name=b.user_name
left join
(select distinct user_name
from trade_2019) c on b.user_name=c.user_name
where c.user_name is null;

7、对近几年的交易进行分析,评估平台的价值。

参考实现:2017-2019年有交易的所有用户数

select count(distinct a.user_name),count(a.user_name)
from
(select user_name
from trade_2017
union all
select user_name
from trade_2018
union all
select user_name
from trade_2019) a;

8、对某年度的客户交易价值进行分析。

参考实现:2019年每个用户的支付和退款金额汇总

select coalesce(a.user_name,b.user_name),
if(a.pay_amount if null, 0, a.pay_amount),
if(b.refund_amount if null, 0, b.refund_amount)
from
(select user_name,
sum(pay_amount) as pay_amount
from user_trade
where year(dt)=2019
group by user_name) a
full join
(select user_name,
sum(refund_amount) as refund_amount
from user_refund
where year(dt)=2019
group by user_name) b
on a.user_name=b.user_name

9、对沉默用户的年龄段进行分析,用以部署活动来刺激其消费。

参考实现:首次激活时间在2017年,但是一直没有支付的用户的年龄段分布。

select a.age_type,
count(a.user_name)
from
(select user_name,
case when age<20 then '20岁以下'
when age>=20 and age<30 then '20-30岁'
when age>=30 and age<40 then '30-40岁'
else '40岁以上' end as age_type
from user_info
where year(firstactivetime)=2017) a
left join
(select distinct user_name
from user_trade
where dt>'0') b
where b.user_name is null
group by a.age_type;

10、对活跃用户的激活时间段进行分析,用以部署活动来刺激其消费。

参考实现:2018、2019年都有交易的用户,其激活时间段分布。

select hour(firstactivetime),
count(a.user_name)
from
(select user_name
from trade_2018
unoin
select user_name
from trade_2019)a
left join
user_info b
on a.user_name=b.user_name
group by hour(firstactivetime);

11、在2019年购买后又退款的用户性别分布

select c.sex,
count(distinct c.user_name) as user_num
from
(select distinct user_name from user_trade where year(dt)=2019) a
join
(select distinct user_name from user_refund where year(dt)=2019) b
on a.user_name=b.user_name
join
(select user_name, sex from user_info ) c
on a.user_name=c.user_name
group by c.sex;

12、在2018年购买,但是没在2019年购买的用户城市分布。

select c.city,
count(distinct c.user_name) as user_num
from
(select distinct user_name from trade_2018 ) a
left join
(select distinct user_name from trade_2019) b
on a.user_name=b.user_name
join
(select user_name, city from user_info ) c
on a.user_name=c.user_name
where b.user_name is null
group by c.city;

13、2017-2019年,有交易但是没退款的用户的手机品牌分布。

select c.phonebrand ,
count(distinct c.user_name) as user_num
from
(select distinct user_name from user_trade where year(dt) in (2017, 2018, 2019)) a
left join
(select distinct user_name from user_refund where year(dt) in (2017, 2018, 2019)) b
on a.user_name=b.user_name
join
(select user_name, extra2["phonebrand"] as phonebrand from user_info ) c
on a.user_name=c.user_name
where b.user_name is null
group by c.phonebrand;

原文链接:https://blog.csdn.net/weixin_42384784/article/details/105904206

Hive SQL数据分析实战(三)

有以下几张数据表,请写出Hive SQL语句,实现以下需求。
注:分区字段为dt,代表日期。

1、对2018年公司的支付总额按月度累计进行分析

select a.month,
a.pay_amount,
sum(a.pay_amount) over(order by a.month)
from
(select month(dt) month,
sum(pay_amount) as pay_amount
from user_trade
where year(dt)=2018
group by month(dt)) a;

2、对2017年和2018年公司的支付总额按月度累计进行分析,按年度进行汇总。

select a.year,
a.month,
a.pay_amount,
sum(a.pay_amount) over(partition by a.year order by a.month)
from
(select year(dt) year,
month(dt) month,
sum(pay_amount) as pay_amount
from user_trade
where year(dt) in (2017, 2018)
group by year(dt),
month(dt)) a;

3、对2018年每个月的近三个月进行移动地求平均支付金额

select a.month,
a.pay_amount,
avg(a.pay_amount) over(order by a.month rows between 2 preceding and current row)
from
(select month(dt) month,
sum(pay_amount) as pay_amount
from user_trade
where year(dt)=2018
group by month(dt)) a;

4、对2019年1月份用户的购买爱好进行分析

参考实现:2019年1月,用户购买商品品类数量的排名

select user_name,
count(distinct goods_category) as total_num),
rank() over(order by count(distinct goods_category)) as rank
from user_trade
where substr(dt, 1, 7)='2019-01'
group by user_name;

5、选出2019年支付金额排名在第10、20、30名的用户

select a.user_name,
a.pay_amount,
a.rank
from
(select user_name,
sum(pay_amount) as pay_amount),
dense_rank() over(order by sum(pay_amount) desc) as rank
where year(dt)=2019
group by user_name) a
where a.rank in (10, 20, 30);

6、将2019年1月的支付用户,按照支付金额分成5组。

select user_name,
sum(pay_amount) as pay_amount,
ntile(5) over(order by sum(pay_amount) desc) as level
from user_trade
where substr(dt, 1, 7)='2019-01'
group by user_name;

7、选出2019年退款金额排名前10%的用户

select a.user_name,
a.refund_amount,
a.level
from
(select user_name,
ntile(10) over(order by sum(refund_amount) desc) as level
from user_refund
where year(dt)=2019
group by user_name) a
where a.level=1;

8、支付时间间隔超过100天的用户数(跨时间潜在VIP用户流失分析)

select count(distint user_name)
from
(select user_name,
dt,
lead(dt) over(partition by user_name order by dt) as lead_dt
from user_trade
where dt>'0') a
where a.lead_dr is not null and datediff(a.lead_dt, a.dt)>100;

9、每个城市,不同性别,2018年支付金额最高的TOP3用户。

select c.user_name,
c.city,
c.sex,
c.pay_amount,
c.rank
from
(select a.user_name,
b.city,
b.sex,
a.pay_amount,
row_number() over(partition by b.city,b.sex order by a.pay_amount desc) as rank
from
(select user_name,
sum(pay_amount) as pay_amount,
from user_trade
where year(dt)=2018
group by user_name) a
left join
user_info b
on a.user_name=b.user_name) c
where c.rank<=3;

19
10、每个手机品牌退款金额前25%的用户(跨手机品牌用户退款分析)

select *
from
(select a.user_name,
extra2['phonebrand'] as phonebrand,
a.refund_amount,
nitle(4) over(partition by extra2['phonebrand'] order by a.refund_amount desc) as level
from
(select user_name,
sum(refund_amount) as refund_amount
from user_refund
where dt>'0'
group by user_name) a
left join
user_info b
on a.user_name=b.user_name) c
where c.level=1;

11、计算出每12个月的用户累计支付金额

SELECT a.month,
a.pay_amount,
sum(a.pay_amount) over(order by a.month rows between 11 preceding and current row)
FROM
(SELECT substr(dt,1,7) as month,
sum(pay_amount) as pay_amount
FROM user_trade
WHERE dt>'0'
GROUP BY substr(dt,1,7)) a;

12、计算出每4个月的最大退款金额

SELECT a.month,
max(a.refund_amount) over(order by a.month rows between 3 preceding and current row)
FROM
(SELECT substr(dt,1,7) as month,
sum(refund_amount) as refund_amount
FROM user_refund
WHERE dt>'0'
GROUP BY substr(dt,1,7)) a;

13、退款时间间隔最长的用户

select a.user_name
from
(select user_name,
refund_time,
lead(refund_time) over(partition by user_name order by refund_time) as lead_time
from user_refund
where dt > '0') a
where a.lead_time is not null
order by datediff(a.lead_time, a.refund_time) desc limit 1;

14、2017年和2018年按月累计去重的购买用户数

select b.year,
b.month,
sum(b.user_num) over(partition by b.year order by b.month)
from
(select a.year,
a.month,
count(distinct a.user_name) as user_num
from
(select year(dt) as year,
user_name,
min(month(dt)) as month
from user_trade
where year(dt) in (2017, 2018)
group by year(dt),
user_name) a
group by a.year,
a.month) b
order by b.year,
b.month
limit 24;