问题拆分
1.找到各类试卷tag在2020年与2021年上半年的完成次数
-
- 试卷完成信息出现在exam_record表中,试卷类别出现在examination_info中,因此通过right join或者left join两表形成表t1;
....
exam_record t1
right join
examination_info t2
on t1.exam_id = t2.exam_id
....
-
- 由于需要2020年与2021年上半年的数据,在连接表时,使用where筛选submit_time;
....
where
year(submit_time) in (2021, 2020)
and
month(submit_time)<=6
....
- 3.筛选数据后,使用group by 依次对类别tag和答题时间submit_time进行分组,使用count(*)统计每类试卷每年的答题数量;
....
count(*) exam_cnt,
....
group by
tag,year(submit_time)
....
- 4.使用over()和rank()对每年的各类试卷进行排名,按照试卷完成数倒序排名;
....
rank() over(partition by year(submit_time) order by count(*) desc) exam_cnt_rank
....
形成t1
select
tag,
year(submit_time) start_year,
count(*) exam_cnt,
rank() over(partition by year(submit_time) order by count(*) desc) exam_cnt_rank
from
exam_record t1
right join
examination_info t2
on t1.exam_id = t2.exam_id
where
year(submit_time) in (2021, 2020)
and
month(submit_time)<=6
group by
tag,year(submit_time)
2.统计同比增长率以及排名变化(在t1表中计算即可)
- 主要思路:t1中的每类试卷有2020年以及2021年的完成数量,有两行数据,对tag使用group by分组变成一行数据,按照年份start_year使用if()和sum()嵌套进行计算。
- 注意!!!为保证同类试卷在2020年与2021年都有数据,需要对group by 后的数据使用having进行筛选,满足count(*)=2
eg:计算2020年的试卷完成数量
sum(if(start_year=2020, exam_cnt, 0)) exam_cnt_20
eg:计算同比增长率(2021年完成数量减去2020年完成数量,再除以2020年完成数量)
sum(if(start_year=2020, -exam_cnt, exam_cnt))/sum(if(start_year=2020, exam_cnt, 0))
- 该部分统计代码
sum(if(start_year=2020, exam_cnt, 0)) exam_cnt_20,
sum(if(start_year=2021, exam_cnt, 0)) exam_cnt_21,
concat(round(100*sum(if(start_year=2020, -exam_cnt, exam_cnt))/sum(if(start_year=2020, exam_cnt, 0)),1), '%') growth_rate,
sum(if(start_year=2020, exam_cnt_rank, 0)) exam_cnt_rank_20,
sum(if(start_year=2021, exam_cnt_rank, 0)) exam_cnt_rank_21,
sum(if(start_year=2020, -exam_cnt_rank, exam_cnt_rank)) rank_delta
总体代码
select
tag,
sum(if(start_year=2020, exam_cnt, 0)) exam_cnt_20,
sum(if(start_year=2021, exam_cnt, 0)) exam_cnt_21,
concat(round(100*sum(if(start_year=2020, -exam_cnt, exam_cnt))/sum(if(start_year=2020, exam_cnt, 0)),1), '%') growth_rate,
sum(if(start_year=2020, exam_cnt_rank, 0)) exam_cnt_rank_20,
sum(if(start_year=2021, exam_cnt_rank, 0)) exam_cnt_rank_21,
sum(if(start_year=2020, -exam_cnt_rank, exam_cnt_rank)) rank_delta
from
(select
tag,
year(submit_time) start_year,
count(*) exam_cnt,
rank() over(partition by year(submit_time) order by count(*) desc) exam_cnt_rank
from
exam_record t1
right join
examination_info t2
on t1.exam_id = t2.exam_id
where
year(submit_time) in (2021, 2020)
and
month(submit_time)<=6
group by
tag,year(submit_time)) t1
group by
tag
having
count(*) = 2
order by
growth_rate desc, exam_cnt_rank_21 desc;