题目:
请统计每个6/7级用户总活跃月份数、2021年活跃天数、2021年试卷作答活跃天数、2021年答题活跃天数,按照总活跃月份数、2021年活跃天数降序排序
坑点
1)总活跃月份数: 全部月份都算,无论是2021年8月还是2020年8月,都算
2)试卷活跃天数:只要试卷的start_time存在,即视为用户在该天活跃
3)答题活跃天数:只要题目的submit_time存在,即视为用户在该天活跃
4)2021年活跃天数:
-
这是最大的难点,它不是“试卷活跃天数”与”答题活跃天数“的简单相加!!当在同一天(e.g. 2021年8月1日,同一个用户在该天即做过试卷(exam)也做过题目(practice),则2021年的活跃天数仅仅+1,而非+2;
-
基于这一点,UNION / UNION ALL是解题的关键步骤之一,该关键词 可使得做试卷和做题目的日期能处于同一列下进行去重操作
解题思路
步骤1:分别找出每个6/7级的用户在试卷(exam)和题目(practice)上活跃过的日期,并使用UNION对两个结果进行合并
- 每个6/7级用户的试卷活跃天数是:
SELECT ui.uid AS uid, /* 用户id */
exam_id AS answer_id, /* 对应的试卷id */
DATE_FORMAT(er.start_time, '%Y-%m') AS month, /* 活跃时间的年月,用于统计总活跃月份数 */
DATE(er.start_time) AS day /* 活跃时间的年月日,用于找出21年的试卷活跃天数 */
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er
ON ui.uid = er.uid
WHERE ui.level IN (6,7) /* 级别为6或7的用户 */
- 同样地,每个6/7级用户的题目活跃天数是:
SELECT ui.uid AS uid,
question_id AS answer_id, /* 对应的题目id */
DATE_FORMAT(pr.submit_time, '%Y-%m') AS month,
DATE(pr.submit_time) AS day /* 活跃时间的年月日,用于找出21年的题目练习活跃天数 */
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
- 使用UNION对两个表进行合并即可
SELECT ui.uid AS uid,
exam_id AS answer_id,
DATE_FORMAT(er.start_time, '%Y-%m') AS month,
DATE(er.start_time) AS day
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er
ON ui.uid = er.uid
WHERE ui.level IN (6,7)
UNION
SELECT ui.uid AS uid,
question_id AS answer_id,
DATE_FORMAT(pr.submit_time, '%Y-%m') AS month,
DATE(pr.submit_time) AS day
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
- 根据用例数据,步骤1会得出以下结果:
uid | answer_id | Month | Date |
---|---|---|---|
1001 | 9001 | 44440 | 44440 |
1002 | 9001 | 44075 | 44075 |
1003 | NULL | NULL | NULL |
1005 | 9001 | 44440 | 44440 |
1005 | 9002 | 44440 | 44440 |
1006 | 9001 | 44440 | 44446 |
1006 | 9003 | 44440 | 44446 |
1001 | 8002 | 44409 | 44410 |
1001 | 8001 | 44409 | 44410 |
1002 | NULL | NULL | NULL |
1005 | NULL | NULL | NULL |
1006 | 8003 | 44044 | 44044 |
1006 | 8003 | 44409 | 44410 |
1006 | 8003 | 44409 | 44411 |
1006 | 8002 | 44409 | 44412 |
步骤2:计算 总活跃月份数、2021年活跃天数、2021年试卷作答活跃天数、2021年答题活跃天数
- 总活跃月数:直接统计“唯一的月份”的个数即可
- 2021年活跃天数:需要用到CASE WHEN,当具体日期在2021年时才会被统计,并且是统计“唯一的日期”的个数。不过这里稍微有点难于理解的是,需要DISTINCT CASE WHEN,说下我个人的看法:CASE WHEN实质还是创建一个新的列/字段(但不是act_days_2021),因此DISTINCT CASE WHEN其实就是在对一个字段进行去重而已
SELECT t1.uid,
COUNT(DISTINCT t1.month) AS act_month_total,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021 THEN t1.day ELSE NULL END) AS act_days_2021
FROM
(
SELECT ui.uid AS uid, exam_id AS answer_id, DATE_FORMAT(er.start_time, '%Y-%m') AS month, DATE(er.start_time) AS day
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er ON ui.uid = er.uid
WHERE ui.level IN (6,7)
UNION
SELECT ui.uid AS uid, question_id AS answer_id, DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, DATE(pr.submit_time) AS day
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
) AS t1
GROUP BY uid
- 2021年试卷作答活跃天数:这里则需要用到CASE WHEN中的多条件,不难;需要满足两个条件:一是2021年,二是该次活跃必须是属于试卷的活跃
- 2021年答题活跃天数:同上,两个条件:一是2021年,二是该次活跃必须是属于答题的活跃
SELECT t1.uid,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021 /* 条件1:2021年 */
AND t1.answer_id IN (SELECT exam_id FROM examination_info) /* 条件2:必须是试卷的活跃 */
THEN t1.day ELSE NULL END) AS act_days_2021_exam,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021 /* 条件1:2021年 */
AND t1.answer_id IN (SELECT DISTINCT question_id FROM practice_record) /* 条件2:必须是答题的活跃 */
THEN t1.day ELSE NULL END) AS act_days_2021_question
FROM
(
SELECT ui.uid AS uid, exam_id AS answer_id, DATE_FORMAT(er.start_time, '%Y-%m') AS month, DATE(er.start_time) AS day
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er ON ui.uid = er.uid
WHERE ui.level IN (6,7)
UNION
SELECT ui.uid AS uid, question_id AS answer_id, DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, DATE(pr.submit_time) AS day
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
) AS t1
GROUP BY uid
- 结合上述两个拆分步骤,可得:
SELECT t1.uid,
COUNT(DISTINCT t1.month) AS act_month_total,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021 THEN t1.day ELSE NULL END) AS act_days_2021,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021
AND t1.answer_id IN (SELECT exam_id FROM examination_info)
THEN t1.day ELSE NULL END) AS act_days_2021_exam,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021
AND t1.answer_id IN (SELECT DISTINCT question_id FROM practice_record)
THEN t1.day ELSE NULL END) AS act_days_2021_question
FROM
(
SELECT ui.uid AS uid, exam_id AS answer_id, DATE_FORMAT(er.start_time, '%Y-%m') AS month, DATE(er.start_time) AS day
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er ON ui.uid = er.uid
WHERE ui.level IN (6,7)
UNION
SELECT ui.uid AS uid, question_id AS answer_id, DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, DATE(pr.submit_time) AS day
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
) AS t1
GROUP BY uid
ORDER BY act_month_total DESC, act_days_2021 DESC;
/* 最终再按照总活跃月份数、2021年活跃天数降序排序即可 */
第二种解法
- 其实在确认属于是试卷的活跃还是答题的活跃,这一步,也有更好的解法(不需要用到子查询),而是直接在步骤一,就让两个表各自打上“exam”和“question“两个标签,后面也可以省不少功夫了:
SELECT ui.uid AS uid,
"exam" AS tag,
DATE_FORMAT(er.start_time, '%Y-%m') AS month,
DATE(er.start_time) AS day
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er
ON ui.uid = er.uid
WHERE ui.level IN (6,7)
UNION
SELECT ui.uid AS uid,
"question" AS tag,
DATE_FORMAT(pr.submit_time, '%Y-%m') AS month,
DATE(pr.submit_time) AS day
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
SELECT t1.uid,
COUNT(DISTINCT t1.month) AS act_month_total,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021 THEN t1.day ELSE NULL END) AS act_days_2021,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021
AND t1.tag = 'exam'
THEN t1.day ELSE NULL END) AS act_days_2021_exam,
COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021
AND t1.tag = 'question'
THEN t1.day ELSE NULL END) AS act_days_2021_question
FROM
(
SELECT ui.uid AS uid,
"exam" AS tag,
DATE_FORMAT(er.start_time, '%Y-%m') AS month,
DATE(er.start_time) AS day
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er
ON ui.uid = er.uid
WHERE ui.level IN (6,7)
UNION
SELECT ui.uid AS uid,
"question" AS tag,
DATE_FORMAT(pr.submit_time, '%Y-%m') AS month,
DATE(pr.submit_time) AS day
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
) AS t1
GROUP BY uid
ORDER BY act_month_total DESC, act_days_2021 DESC;