with t1 as ( select month(p.fdate) as month, p.song_id as song_id, count(p.song_id) as play_pv from play_log p join user_info u on p.user_id = u.user_id where year(p.fdate) ='2022' and u.age >= 18 and u.age <= 25 group by month(p.fdate), p.song_id), t2 as ( select t1.month as month, row_number() over (partition by t1.month order by t1.play_pv desc, t1.song_id) as ranking, s.song_name as song_name, t1.play_pv as play_pv from t1 join song_info s on t1.song_id = s.song_id where s.singer_name = '周杰伦' ) select month,ranking,song_name,play_pv from t2 where ranking in (1,2,3)
This SQL query is designed to find the top 3 songs played by users aged 18 to 25 in January and February 2022 based on the play count. Specifically, it looks at songs by the artist "周杰伦" (Jay Chou), as indicated in the song_info
table. Below is a step-by-step explanation of the code:
1. The t1
CTE (Common Table Expression)
WITH t1 AS ( SELECT month(p.fdate) AS month, p.song_id AS song_id, count(p.song_id) AS play_pv FROM play_log p JOIN user_info u ON p.user_id = u.user_id WHERE year(p.fdate) = '2022' AND u.age >= 18 AND u.age <= 25 GROUP BY month(p.fdate), p.song_id )
Purpose of t1
:
- This CTE retrieves the play count of each song by users aged between 18 and 25 in 2022.
- Breakdown:month(p.fdate): Extracts the month from the fdate (play date) column. This allows us to group the plays by month.count(p.song_id): Counts how many times each song_id was played by each user in a given month.JOIN user_info u ON p.user_id = u.user_id: Joins the play_log table with the user_info table to ensure the users' ages are considered.WHERE: Filters for data where:The fdate is in 2022.The user's age is between 18 and 25 years.GROUP BY: Groups the results by month(p.fdate) (for monthly analysis) and p.song_id (to count plays for each song).
Example Output of t1
:For each song, it calculates how many times that song was played by users in the age group 18-25 in each month:
1 | 0 | 4 |
1 | 1 | 4 |
1 | 3 | 2 |
1 | 2 | 1 |
2 | 0 | 2 |
2 | 1 | 1 |
2 | 3 | 1 |
2. The t2
CTE
t2 AS ( SELECT t1.month AS month, row_number() OVER (PARTITION BY t1.month ORDER BY t1.play_pv DESC, t1.song_id) AS ranking, s.song_name AS song_name, t1.play_pv AS play_pv FROM t1 JOIN song_info s ON t1.song_id = s.song_id WHERE s.singer_name = '周杰伦' )
Purpose of t2
:
- This CTE processes the results from
t1
and assigns a ranking to the songs based on their play counts (play_pv
) in each month. - Breakdown:ROW_NUMBER() OVER (PARTITION BY t1.month ORDER BY t1.play_pv DESC, t1.song_id): This window function ranks the songs within each month.PARTITION BY t1.month: This means the ranking starts over for each month.ORDER BY t1.play_pv DESC, t1.song_id: It ranks the songs first by the number of plays (play_pv) in descending order. If two songs have the same play count, the song_id is used to break the tie (this ensures a deterministic ranking order).JOIN song_info s ON t1.song_id = s.song_id: Joins the t1 result set with the song_info table to retrieve the song names.WHERE s.singer_name = '周杰伦': Filters to only include songs by Jay Chou (周杰伦).
Example Output of t2
:For each month and each song, this will include the song's play count (play_pv
), its name, and the rank:
1 | 1 | 明明就 | 4 |
1 | 2 | 说好的幸福呢 | 4 |
1 | 3 | 大笨钟 | 2 |
2 | 1 | 明明就 | 2 |
2 | 2 | 说好的幸福呢 | 1 |
2 | 3 | 大笨钟 | 1 |
3. Final SELECT Query
SELECT month, ranking, song_name, play_pv FROM t2 WHERE ranking IN (1, 2, 3)
Purpose:
- This final step retrieves the top 3 songs for each month based on their ranking.
- WHERE ranking IN (1, 2, 3): Filters the results to only include the top 3 ranked songs for each month.
Example Final Output:
1 | 1 | 明明就 | 4 |
1 | 2 | 说好的幸福呢 | 4 |
1 | 3 | 大笨钟 | 2 |
2 | 1 | 明明就 | 2 |
2 | 2 | 说好的幸福呢 | 1 |
2 | 3 | 大笨钟 | 1 |
Explanation of Key Concepts Used:
- ROW_NUMBER() Window Function:This assigns a unique ranking to each song within each month, based on its play count (play_pv), in descending order. If two songs have the same number of plays, the song_id is used to break the tie.
- PARTITION BY:In the ROW_NUMBER() function, the PARTITION BY clause ensures that the ranking starts anew for each month. Without it, the ranking would continue across all months.
- JOIN:The JOIN operations are used to combine data from the play_log, user_info, and song_info tables. This is necessary to:Filter for users aged 18-25.Ensure that we only look at songs by the artist "周杰伦".Retrieve song names.
- GROUP BY:In the t1 CTE, the GROUP BY clause is used to group the play logs by month and song_id. This allows the counting of plays for each song in each month.
- WHERE Clause:The final WHERE clause (WHERE ranking IN (1, 2, 3)) ensures that only the top 3 songs by play count are returned for each month.