题目:

请统计每个6/7级用户总活跃月份数、2021年活跃天数、2021年试卷作答活跃天数、2021年答题活跃天数,按照总活跃月份数、2021年活跃天数降序排序


坑点

1)总活跃月份数: 全部月份都算,无论是2021年8月还是2020年8月,都算

2)试卷活跃天数:只要试卷的start_time存在,即视为用户在该天活跃

3)答题活跃天数:只要题目的submit_time存在,即视为用户在该天活跃

4)2021年活跃天数:

  • 这是最大的难点,它不是“试卷活跃天数”与”答题活跃天数“的简单相加!!当在同一天(e.g. 2021年8月1日,同一个用户在该天即做过试卷(exam)也做过题目(practice),则2021年的活跃天数仅仅+1,而非+2;

  • 基于这一点,UNION / UNION ALL是解题的关键步骤之一,该关键词 可使得做试卷和做题目的日期能处于同一列下进行去重操作


解题思路

步骤1:分别找出每个6/7级的用户在试卷(exam)和题目(practice)上活跃过的日期,并使用UNION对两个结果进行合并

  • 每个6/7级用户的试卷活跃天数是:
SELECT ui.uid AS uid,  /* 用户id */
		exam_id AS answer_id, /* 对应的试卷id */
        DATE_FORMAT(er.start_time, '%Y-%m') AS month, /* 活跃时间的年月,用于统计总活跃月份数 */
        DATE(er.start_time) AS day /* 活跃时间的年月日,用于找出21年的试卷活跃天数 */
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er
ON ui.uid = er.uid
WHERE ui.level IN (6,7) /* 级别为6或7的用户 */
  • 同样地,每个6/7级用户的题目活跃天数是:
SELECT ui.uid AS uid, 
		question_id AS answer_id, /* 对应的题目id */
		DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, 
		DATE(pr.submit_time) AS day /* 活跃时间的年月日,用于找出21年的题目练习活跃天数 */
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
  • 使用UNION对两个表进行合并即可
SELECT ui.uid AS uid,
		exam_id AS answer_id, 
        DATE_FORMAT(er.start_time, '%Y-%m') AS month, 
        DATE(er.start_time) AS day 
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er
ON ui.uid = er.uid
WHERE ui.level IN (6,7) 
UNION
SELECT ui.uid AS uid, 
		question_id AS answer_id, 
		DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, 
		DATE(pr.submit_time) AS day 
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
  • 根据用例数据,步骤1会得出以下结果:
uid answer_id Month Date
1001 9001 44440 44440
1002 9001 44075 44075
1003 NULL NULL NULL
1005 9001 44440 44440
1005 9002 44440 44440
1006 9001 44440 44446
1006 9003 44440 44446
1001 8002 44409 44410
1001 8001 44409 44410
1002 NULL NULL NULL
1005 NULL NULL NULL
1006 8003 44044 44044
1006 8003 44409 44410
1006 8003 44409 44411
1006 8002 44409 44412

步骤2:计算 总活跃月份数、2021年活跃天数、2021年试卷作答活跃天数、2021年答题活跃天数

  • 总活跃月数:直接统计“唯一的月份”的个数即可
  • 2021年活跃天数:需要用到CASE WHEN,当具体日期在2021年时才会被统计,并且是统计“唯一的日期”的个数。不过这里稍微有点难于理解的是,需要DISTINCT CASE WHEN,说下我个人的看法:CASE WHEN实质还是创建一个新的列/字段(但不是act_days_2021),因此DISTINCT CASE WHEN其实就是在对一个字段进行去重而已
SELECT t1.uid,
		COUNT(DISTINCT t1.month) AS act_month_total,
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021 THEN t1.day ELSE NULL END) AS act_days_2021
FROM 
    (
    SELECT ui.uid AS uid, exam_id AS answer_id, DATE_FORMAT(er.start_time, '%Y-%m') AS month, DATE(er.start_time) AS day
    FROM user_info AS ui LEFT OUTER JOIN exam_record AS er ON ui.uid = er.uid
    WHERE ui.level IN (6,7)
    UNION
    SELECT ui.uid AS uid, question_id AS answer_id, DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, DATE(pr.submit_time) AS day
    FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
    WHERE ui.level IN (6,7)
    ) AS t1
GROUP BY uid
  • 2021年试卷作答活跃天数:这里则需要用到CASE WHEN中的多条件,不难;需要满足两个条件:一是2021年,二是该次活跃必须是属于试卷的活跃
  • 2021年答题活跃天数:同上,两个条件:一是2021年,二是该次活跃必须是属于答题的活跃
SELECT t1.uid,
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021  /* 条件1:2021年 */
                 AND t1.answer_id IN (SELECT exam_id FROM examination_info)  /* 条件2:必须是试卷的活跃 */
                 THEN t1.day ELSE NULL END) AS act_days_2021_exam,
                 
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021  /* 条件1:2021年 */
                 AND t1.answer_id IN (SELECT DISTINCT question_id FROM practice_record)  /* 条件2:必须是答题的活跃 */
                 THEN t1.day ELSE NULL END) AS act_days_2021_question                
FROM
    (
    SELECT ui.uid AS uid, exam_id AS answer_id, DATE_FORMAT(er.start_time, '%Y-%m') AS month, DATE(er.start_time) AS day
    FROM user_info AS ui LEFT OUTER JOIN exam_record AS er ON ui.uid = er.uid
    WHERE ui.level IN (6,7)
    UNION
    SELECT ui.uid AS uid, question_id AS answer_id, DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, DATE(pr.submit_time) AS day
    FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
    WHERE ui.level IN (6,7)
    ) AS t1
GROUP BY uid
  • 结合上述两个拆分步骤,可得:
SELECT t1.uid,
		COUNT(DISTINCT t1.month) AS act_month_total,
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021 THEN t1.day ELSE NULL END) AS act_days_2021,
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021
                 AND t1.answer_id IN (SELECT exam_id FROM examination_info)
                 THEN t1.day ELSE NULL END) AS act_days_2021_exam,
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021
                 AND t1.answer_id IN (SELECT DISTINCT question_id FROM practice_record)
                 THEN t1.day ELSE NULL END) AS act_days_2021_question     
FROM 
    (
    SELECT ui.uid AS uid, exam_id AS answer_id, DATE_FORMAT(er.start_time, '%Y-%m') AS month, DATE(er.start_time) AS day
    FROM user_info AS ui LEFT OUTER JOIN exam_record AS er ON ui.uid = er.uid
    WHERE ui.level IN (6,7)
    UNION
    SELECT ui.uid AS uid, question_id AS answer_id, DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, DATE(pr.submit_time) AS day
    FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
    WHERE ui.level IN (6,7)
    ) AS t1
GROUP BY uid
ORDER BY act_month_total DESC, act_days_2021 DESC; 
/* 最终再按照总活跃月份数、2021年活跃天数降序排序即可 */

第二种解法

  • 其实在确认属于是试卷的活跃还是答题的活跃,这一步,也有更好的解法(不需要用到子查询),而是直接在步骤一,就让两个表各自打上“exam”和“question“两个标签,后面也可以省不少功夫了:
SELECT ui.uid AS uid,
		"exam" AS tag, 
        DATE_FORMAT(er.start_time, '%Y-%m') AS month, 
        DATE(er.start_time) AS day 
FROM user_info AS ui LEFT OUTER JOIN exam_record AS er
ON ui.uid = er.uid
WHERE ui.level IN (6,7) 
UNION
SELECT ui.uid AS uid, 
		"question" AS tag, 
		DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, 
		DATE(pr.submit_time) AS day 
FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
WHERE ui.level IN (6,7)
SELECT t1.uid,
		COUNT(DISTINCT t1.month) AS act_month_total,
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021 THEN t1.day ELSE NULL END) AS act_days_2021,
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021
                 AND t1.tag = 'exam'
                 THEN t1.day ELSE NULL END) AS act_days_2021_exam,
		COUNT(DISTINCT CASE WHEN YEAR(t1.day) = 2021
                 AND t1.tag = 'question'
                 THEN t1.day ELSE NULL END) AS act_days_2021_question     
FROM 
    (
      SELECT ui.uid AS uid,
              "exam" AS tag, 
              DATE_FORMAT(er.start_time, '%Y-%m') AS month, 
              DATE(er.start_time) AS day 
      FROM user_info AS ui LEFT OUTER JOIN exam_record AS er
      ON ui.uid = er.uid
      WHERE ui.level IN (6,7) 
      UNION
      SELECT ui.uid AS uid, 
              "question" AS tag, 
              DATE_FORMAT(pr.submit_time, '%Y-%m') AS month, 
              DATE(pr.submit_time) AS day 
      FROM user_info AS ui LEFT OUTER JOIN practice_record AS pr ON ui.uid = pr.uid
      WHERE ui.level IN (6,7)
    ) AS t1
GROUP BY uid
ORDER BY act_month_total DESC, act_days_2021 DESC;