with t1 as (
    select
        month(p.fdate) as month,
        p.song_id as song_id,
        count(p.song_id) as play_pv
    from
        play_log p
    join
        user_info u on p.user_id = u.user_id
    where year(p.fdate) ='2022' and u.age >= 18 and u.age <= 25
    group by
        month(p.fdate), p.song_id),
    
    t2 as (
    select
        t1.month as month,
        row_number() over (partition by t1.month order by t1.play_pv desc, t1.song_id) as ranking,
        s.song_name as song_name,
        t1.play_pv as play_pv
    from
        t1
    join
        song_info s
    on
        t1.song_id = s.song_id
    where s.singer_name = '周杰伦'
    )

    select
        month,ranking,song_name,play_pv
    from
        t2
    where ranking in (1,2,3)


This SQL query is designed to find the top 3 songs played by users aged 18 to 25 in January and February 2022 based on the play count. Specifically, it looks at songs by the artist "周杰伦" (Jay Chou), as indicated in the song_info table. Below is a step-by-step explanation of the code:

1. The t1 CTE (Common Table Expression)

WITH t1 AS (
    SELECT
        month(p.fdate) AS month, 
        p.song_id AS song_id, 
        count(p.song_id) AS play_pv
    FROM play_log p
    JOIN user_info u ON p.user_id = u.user_id
    WHERE year(p.fdate) = '2022' AND u.age >= 18 AND u.age <= 25
    GROUP BY month(p.fdate), p.song_id
)

Purpose of t1:

  • This CTE retrieves the play count of each song by users aged between 18 and 25 in 2022.
  • Breakdown:month(p.fdate): Extracts the month from the fdate (play date) column. This allows us to group the plays by month.count(p.song_id): Counts how many times each song_id was played by each user in a given month.JOIN user_info u ON p.user_id = u.user_id: Joins the play_log table with the user_info table to ensure the users' ages are considered.WHERE: Filters for data where:The fdate is in 2022.The user's age is between 18 and 25 years.GROUP BY: Groups the results by month(p.fdate) (for monthly analysis) and p.song_id (to count plays for each song).

Example Output of t1:For each song, it calculates how many times that song was played by users in the age group 18-25 in each month:

1

0

4

1

1

4

1

3

2

1

2

1

2

0

2

2

1

1

2

3

1

2. The t2 CTE

t2 AS (
    SELECT
        t1.month AS month,
        row_number() OVER (PARTITION BY t1.month ORDER BY t1.play_pv DESC, t1.song_id) AS ranking,
        s.song_name AS song_name,
        t1.play_pv AS play_pv
    FROM t1
    JOIN song_info s ON t1.song_id = s.song_id
    WHERE s.singer_name = '周杰伦'
)

Purpose of t2:

  • This CTE processes the results from t1 and assigns a ranking to the songs based on their play counts (play_pv) in each month.
  • Breakdown:ROW_NUMBER() OVER (PARTITION BY t1.month ORDER BY t1.play_pv DESC, t1.song_id): This window function ranks the songs within each month.PARTITION BY t1.month: This means the ranking starts over for each month.ORDER BY t1.play_pv DESC, t1.song_id: It ranks the songs first by the number of plays (play_pv) in descending order. If two songs have the same play count, the song_id is used to break the tie (this ensures a deterministic ranking order).JOIN song_info s ON t1.song_id = s.song_id: Joins the t1 result set with the song_info table to retrieve the song names.WHERE s.singer_name = '周杰伦': Filters to only include songs by Jay Chou (周杰伦).

Example Output of t2:For each month and each song, this will include the song's play count (play_pv), its name, and the rank:

1

1

明明就

4

1

2

说好的幸福呢

4

1

3

大笨钟

2

2

1

明明就

2

2

2

说好的幸福呢

1

2

3

大笨钟

1

3. Final SELECT Query

SELECT
    month,
    ranking,
    song_name,
    play_pv
FROM t2
WHERE ranking IN (1, 2, 3)

Purpose:

  • This final step retrieves the top 3 songs for each month based on their ranking.
  • WHERE ranking IN (1, 2, 3): Filters the results to only include the top 3 ranked songs for each month.

Example Final Output:

1

1

明明就

4

1

2

说好的幸福呢

4

1

3

大笨钟

2

2

1

明明就

2

2

2

说好的幸福呢

1

2

3

大笨钟

1

Explanation of Key Concepts Used:

  1. ROW_NUMBER() Window Function:This assigns a unique ranking to each song within each month, based on its play count (play_pv), in descending order. If two songs have the same number of plays, the song_id is used to break the tie.
  2. PARTITION BY:In the ROW_NUMBER() function, the PARTITION BY clause ensures that the ranking starts anew for each month. Without it, the ranking would continue across all months.
  3. JOIN:The JOIN operations are used to combine data from the play_log, user_info, and song_info tables. This is necessary to:Filter for users aged 18-25.Ensure that we only look at songs by the artist "周杰伦".Retrieve song names.
  4. GROUP BY:In the t1 CTE, the GROUP BY clause is used to group the play logs by month and song_id. This allows the counting of plays for each song in each month.
  5. WHERE Clause:The final WHERE clause (WHERE ranking IN (1, 2, 3)) ensures that only the top 3 songs by play count are returned for each month.