WITH datediff AS (
SELECT user_id, fdate,
fdate - LAG(fdate) OVER (PARTITION BY user_id ORDER BY fdate) AS diff
FROM tb_dau
),
consecutive AS (
SELECT user_id, fdate,
SUM(CASE WHEN diff = 1 THEN 0 ELSE 1 END) OVER (PARTITION BY user_id ORDER BY fdate) AS group_id
FROM datediff
),
grouped AS (
SELECT user_id, group_id,
MIN(fdate) AS start_date,
MAX(fdate) AS end_date
FROM consecutive
GROUP BY user_id, group_id
)
SELECT user_id,
MAX(DATEDIFF(end_date, start_date) + 1) AS max_consec_days
FROM grouped
GROUP BY user_id;
This SQL query efficiently computes the longest streak of consecutive login days for each user in the month of January 2023, based on the tb_dau table. Let’s break it down step by step to understand how the query works.
1. datediff CTE (Common Table Expression)
sql复制代码WITH datediff AS (
SELECT user_id, fdate,
fdate - LAG(fdate) OVER (PARTITION BY user_id ORDER BY fdate) AS diff
FROM tb_dau
)
- Purpose: This part of the query calculates the difference between each login date and the previous login date for each user.
- How it works:LAG(fdate) OVER (PARTITION BY user_id ORDER BY fdate) is a window function that looks at the previous login date for each user (partitioned by user_id and ordered by fdate).fdate - LAG(fdate) computes the difference in days between the current fdate and the previous fdate. If the current login date and the previous one are consecutive, the difference will be 1.The result of this subtraction is aliased as diff, which indicates how many days are between two consecutive logins.
- Example:Suppose we have the following records for user_id 10000:100002023-01-01NULL100002023-01-021100002023-01-042For the first row, diff is NULL because there is no previous record.For the second row, diff is 1 because 2023-01-02 is exactly one day after 2023-01-01.For the third row, diff is 2 because there is a 2-day gap between 2023-01-02 and 2023-01-04.
2. consecutive CTE
sql复制代码consecutive AS (
SELECT user_id, fdate,
SUM(CASE WHEN diff = 1 THEN 0 ELSE 1 END) OVER (PARTITION BY user_id ORDER BY fdate) AS group_id
FROM datediff
)
- Purpose: This part groups consecutive login days into the same group. It detects breaks between consecutive logins, so a non-consecutive login will start a new group.
- How it works:CASE WHEN diff = 1 THEN 0 ELSE 1 END creates a condition that returns 0 if the diff is 1 (meaning consecutive days) and 1 otherwise (meaning a break between logins). This marks where the user is not logging in consecutively.SUM(...) OVER (PARTITION BY user_id ORDER BY fdate) calculates the cumulative sum of the 0s and 1s for each user, ordered by fdate. The idea is that consecutive days will accumulate the same group ID, while a break in consecutive days (i.e., diff != 1) will increment the group ID.The result is the group_id, which groups consecutive login dates together.
- Example:For user_id 10000, after running this part, the table could look like:100002023-01-01NULL1100002023-01-0211100002023-01-0422The group group_id = 1 contains 2023-01-01 and 2023-01-02 (because they are consecutive).The group group_id = 2 contains 2023-01-04 (because it’s not consecutive with 2023-01-02).
3. grouped CTE
sql复制代码grouped AS (
SELECT user_id, group_id,
MIN(fdate) AS start_date,
MAX(fdate) AS end_date
FROM consecutive
GROUP BY user_id, group_id
)
- Purpose: This step aggregates the consecutive login days for each user and group, calculating the start_date (the first day of the consecutive streak) and end_date (the last day of the consecutive streak) for each group.
- How it works:For each user_id and group_id, it calculates:MIN(fdate) AS start_date: the earliest date in that group.MAX(fdate) AS end_date: the latest date in that group.This effectively gives us the start and end dates for each streak of consecutive days.
- Example:Continuing with user_id 10000:1000012023-01-012023-01-021000022023-01-042023-01-04The first group is from 2023-01-01 to 2023-01-02.The second group is just 2023-01-04.
4. Final Query
sql复制代码SELECT user_id,
MAX(DATEDIFF(end_date, start_date) + 1) AS max_consec_days
FROM grouped
GROUP BY user_id;
- Purpose: This final part of the query calculates the longest consecutive login streak for each user.
- How it works:DATEDIFF(end_date, start_date) + 1 calculates the number of consecutive days in each group. The +1 accounts for the fact that we’re working with the start and end dates.MAX(...) is used because each user may have multiple consecutive streaks (e.g., group_id = 1 and group_id = 2 for user_id = 10000), so we take the maximum streak for each user.Finally, GROUP BY user_id groups the results by user_id, so that we get one row per user with their longest streak.
- Example:For user_id 10000:100002The longest consecutive login streak for user_id = 10000 is 2 days (from 2023-01-01 to 2023-01-02).
Conclusion
- This query effectively computes the longest consecutive login streak for each user by:Identifying consecutive login dates using the difference between consecutive dates (diff).Grouping consecutive dates together using the cumulative sum (group_id).Aggregating the streaks to get the start and end dates for each group.Calculating the longest streak for each user.
This approach efficiently handles gaps between login dates and correctly computes the longest sequence of consecutive days a user logged in.

京公网安备 11010502036488号