一、知识点总结
把有用知识点写在前面,以方便自个儿收藏观看😊。
1)5个常用的排序函数
- rank() over() 1 2 2 4 4 6 (计数排名,跳过相同的几个,eg.没有3没有5)\
- row_number() over() 1 2 3 4 5 6 (赋予唯一排名)
- dense_rank() over() 1 2 2 3 3 4 (不跳过排名,可以理解为对类别进行计数)
- percent_rank() over() 按照数字所在的位置进行百分位分段
- ntile(n)over() 将数字按照大小平均分成n段
- lead(字段名,n)over()把字段数据向前移n个单元格
- lag(字段名,n)over()把字段数据向后移n个单元格
案例:
select * , row_number()over(order by uid) row_number1, /*按照uid的大小不重不漏1 2 3 4 5 6 7 */ rank()over(order by uid) rank1, /*按照uid的大小并列第一无第二,1 1 1 4 5 5 7*/ dense_rank()over(order by uid) dense_rank1,/*按照uid的大小并列第一有第二,1 1 1 2 3 3 4*/ percent_rank()over(order by uid) percent_rank1,/*按照uid的大小进行百分法排序*/ ntile(2)over(order by uid) ntile1,/*按照uid的大小,把uid评价分成2组*/ lead(uid)over(order by uid) lead1,/*把uid向上推1个位置*/ lag(uid)over(order by uid) lag1 /*把uid向下推1个位置*/ from user_id;
二、题目解读与解题步骤拆分
1、题目解读
- 求:请统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,每个月的答卷数目和完成数目。按用户ID、月份升序排序。
- 题目中的坑:未完成率较高的50%用户,即为完成率低于50%的用户。
- 需求字段:uid 、 start_month、 total_cnt complete_cnt
2、步骤拆分
1)SQL试卷上未完成率较高的50%用户。
- 完成率=用户完成的题/用户答题数:COUNT(submit_time) / COUNT(start_time)
- 对完成率进行排序:PERCENT_RANK()OVER()
- SQL试卷
2)6级和7级用户在有试卷作答记录的近三个月中。
- 查找6,7级用户的uid
- 对作答记录进行排序取近3个月
3)每个月的答卷数目和完成数目。
4)按用户ID、月份升序排序。
三、步骤代码
1)SQL完成率=用户完成的题/用户答题数- 用户完成题数:COUNT(submit_time)
- 用户答题数:COUNT(start_time)
- 完成率=COUNT(submit_time) / COUNT(start_time)
- SELECT exam_id FROM examination_info WHERE tag='SQL'
2)对完成率进行排序,以及完成率排名后50%的用户
- PERCENT_RANK()OVER( ORDER BY count(submit_time)/count(start_time) ) rate_rk
- rate_rk<=50%
SELECT uid,COUNT(submit_time)/COUNT(start_time) rk_com ,PERCENT_RANK()over( ORDER BY COUNT(submit_time)/COUNT(start_time) ) rate_rk FROM exam_record WHERE exam_id IN (SELECT exam_id FROM examination_info WHERE tag='SQL') GROUP BY uid /*对SQL用户的完成率进行排序*/
3)查找6,7级用户的uid
- SELECT uid FROM user_info WHERE level IN (6,7);
SELECT * FROM ( SELECT uid ,PERCENT_RANK()over( ORDER BY count(submit_time)/count(start_time) ) rate_rk FROM exam_record WHERE exam_id IN (SELECT exam_id FROM examination_info WHERE tag='SQL') GROUP BY uid ) A -- 对用户完成率进行排序 WHERE rate_rk<=0.5 -- 查找完成率排名低的50%用户 AND uid IN (SELECT uid FROM user_info WHERE level IN (6,7))-- 查找6,7级用户
以上找出来满足题设需求的用户”SQL试卷上未完成率较高的50%用户中,6级和7级用户“只有1002
4)对作答记录时间进行排序取近3个月
SELECT * ,dense_rank()over(partition by uid order by date_format(start_time,'%Y-%m') desc) time_rk -- 对作答时间进行排序 FROM exam_record
5)每个用户每个月的答卷数目和完成数目,并按用户ID、月份升序排序。
SELECT t1.uid, DATE_FORMAT(start_time,'%Y%m')start_month, -- 取月份数 COUNT(start_time) total_cnt, -- 答卷数 COUNT(submit_time) complete_cnt -- 完成数 FROM ( SELECT * , dense_rank()over(partition by uid order by date_format(start_time,'%Y-%m') desc) time_rk -- 对作答时间进行排序 FROM exam_record )t1 WHERE time_rk<=3 -- 查找作答时间最近的3个月 GROUP BY uid,start_month ORDER BY uid,start_month -- 按照用户id和月份进行升序排序 ;
四、完整代码组装
SELECT t1.uid, DATE_FORMAT(start_time,'%Y%m')start_month, -- 取月份数 COUNT(start_time) total_cnt, -- 答卷数 COUNT(submit_time) complete_cnt -- 完成数 FROM ( SELECT * , dense_rank()over(partition by uid order by date_format(start_time,'%Y-%m') desc) time_rk -- 对作答时间进行排序 FROM exam_record )t1 RIGHT JOIN ( SELECT * FROM ( SELECT uid, PERCENT_RANK()over( ORDER BY count(submit_time)/count(start_time) ) rate_rk -- 对完成率进行分数排序 FROM exam_record WHERE exam_id IN (SELECT exam_id FROM examination_info WHERE tag='SQL') -- SQL试卷 GROUP BY uid ) A WHERE rate_rk<=0.5 -- 查找排名低于50%的用户 AND uid IN (SELECT uid FROM user_info WHERE level IN (6,7)) -- 查找6.7级用户 )t2 ON t1.uid=t2.uid WHERE time_rk<=3 -- 查找作答时间最近的3个月 GROUP BY uid,start_month ORDER BY uid,start_month -- 按照用户id和月份进行升序排序 ;