1. 确定总体问题
我们需要找出在注册后连续登录不少于3天的用户,并输出他们的用户ID,并按用户ID升序排序。
2. 分析关键问题
- 提取登录日期:从登录信息中提取每个用户的登录日期。
- 编号登录记录:为每个用户的登录日期编号,以便识别连续登录的天数。
- 识别连续登录:通过计算日期差异识别连续登录的天数。
- 过滤和排序:筛选出连续登录不少于3天的用户,并按用户ID升序排序。
3. 解决每个关键问题的代码及讲解
步骤1:提取登录日期
我们使用WITH
子句创建一个临时表login_dates
,提取每个用户的登录日期:
WITH login_dates AS (
SELECT DISTINCT user_id, DATE(log_time) AS login_date
FROM login_tb
WHERE user_id IN (SELECT user_id FROM register_tb)
)
SELECT DISTINCT user_id, DATE(log_time) AS login_date
:提取每个用户的唯一登录日期,DISTINCT
去重,DATE
用于提取日期。WHERE user_id IN (SELECT user_id FROM register_tb)
:只选择注册用户的登录记录。
步骤2:编号登录记录
我们为每个用户的登录日期编号,以便识别连续登录的天数:
numbered_logins AS (
SELECT user_id, login_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
FROM login_dates
)
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
:为每个用户的登录日期按时间顺序编号。PARTITION BY user_id
按照user_id分区ORDER BY login_date
按照login_date排序
步骤3:识别连续登录
通过计算日期差异识别连续登录的天数:
grouped_logins AS (
SELECT user_id, COUNT(*) AS consecutive_days
FROM numbered_logins
GROUP BY user_id, DATE_SUB(login_date, INTERVAL rn DAY)
)
DATE_SUB(login_date, INTERVAL rn DAY)
:通过减去编号,连续的日期将具有相同的结果,从而可以分组。COUNT(*) AS consecutive_days
:计算每个分组的连续天数。
DATE_SUB处理前后样例如下
2023-01-02 | 1 | → | 2023-01-01 |
2023-01-03 | 2 | → | 2023-01-01 |
2023-01-05 | 3 | → | 2023-01-02 |
步骤4:过滤和排序
筛选出连续登录不少于3天的用户,并按用户ID升序排序:
SELECT user_id
FROM grouped_logins
WHERE consecutive_days >= 3
ORDER BY user_id;
WHERE consecutive_days >= 3
:筛选出连续登录不少于3天的用户。ORDER BY user_id
:按用户ID升序排序。
完整代码
WITH login_dates AS (
SELECT DISTINCT user_id, DATE(log_time) AS login_date
FROM login_tb
WHERE user_id IN (SELECT user_id FROM register_tb)
),
numbered_logins AS (
SELECT user_id, login_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
FROM login_dates
),
grouped_logins AS (
SELECT user_id, COUNT(*) AS consecutive_days
FROM numbered_logins
GROUP BY user_id, DATE_SUB(login_date, INTERVAL rn DAY)
)
SELECT user_id
FROM grouped_logins
WHERE consecutive_days >= 3
ORDER BY user_id;