题解 | #小红书推荐系统#

题目链接

小红书推荐系统

题目描述

根据小红的一份搜索记录（一个由小写字母和空格组成的字符串），找出所有的“关键词”。一个单词被定义为“关键词”，当且仅当它在搜索记录中出现的次数不少于3次。

输出要求：

输出所有关键词，每行一个。
关键词需要按照出现频次从高到低排序。
如果频次相同，则按照字典序升序排序。

解题思路

本题的核心任务是进行词频统计，并根据多重条件对结果进行排序。整个过程可以分解为以下几个步骤：

词频统计：
- 首先，我们需要将输入的整个字符串分割成一个个独立的单词。
- 然后，统计每个单词出现的次数。使用哈希表（在C++中是 map，Java中是 HashMap，Python中是 dict）是实现这一步最有效的数据结构。表的键（key）是单词，值（value）是该单词出现的次数。
- 我们遍历分割后的所有单词，对于每个单词，在哈希表中更新其计数值。
筛选关键词：
- 遍历词频统计完成后的哈希表。
- 将所有出现次数大于或等于3次的单词（即关键词）及其频次提取出来，存入一个新的列表或数组中。这个列表中的每个元素都应包含单词和它的频次信息，方便后续排序。
自定义排序：
- 这是本题的重点。我们需要对筛选出的关键词列表进行自定义排序。
- 主排序键：频次，按降序排列。
- 次排序键：单词本身（字典序），按升序排列。
- 在实现排序比较逻辑时，应先比较两个关键词的频次。如果频次不同，频次高的排在前面。如果频次相同，再比较它们的字典序，字典序小的排在前面。
输出结果：
- 遍历排序完成的关键词列表，并逐行输出每个关键词。

代码

cpp
java
python

#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
#include <sstream>

using namespace std;

struct Keyword {
    string word;
    int count;
};

bool compareKeywords(const Keyword& a, const Keyword& b) {
    if (a.count != b.count) {
        return a.count > b.count;
    }
    return a.word < b.word;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(NULL);

    string line;
    getline(cin, line);

    stringstream ss(line);
    string word;
    map<string, int> counts;
    while (ss >> word) {
        counts[word]++;
    }

    vector<Keyword> keywords;
    for (auto const& [key, val] : counts) {
        if (val >= 3) {
            keywords.push_back({key, val});
        }
    }

    sort(keywords.begin(), keywords.end(), compareKeywords);

    for (const auto& kw : keywords) {
        cout << kw.word << endl;
    }

    return 0;
}

import java.util.Scanner;
import java.util.Map;
import java.util.HashMap;
import java.util.List;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;

class Keyword {
    String word;
    int count;

    Keyword(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

public class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        Map<String, Integer> counts = new HashMap<>();
        
        // 读取整行并按空格分割
        if (sc.hasNextLine()) {
            String[] words = sc.nextLine().split("\\s+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    counts.put(word, counts.getOrDefault(word, 0) + 1);
                }
            }
        }

        List<Keyword> keywords = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= 3) {
                keywords.add(new Keyword(entry.getKey(), entry.getValue()));
            }
        }

        Collections.sort(keywords, new Comparator<Keyword>() {
            @Override
            public int compare(Keyword a, Keyword b) {
                if (a.count != b.count) {
                    return Integer.compare(b.count, a.count); // 降序
                }
                return a.word.compareTo(b.word); // 升序
            }
        });

        for (Keyword kw : keywords) {
            System.out.println(kw.word);
        }
    }
}

import sys
from collections import Counter

# 读取整行输入
line = sys.stdin.readline().strip()

# 按空格分割单词并进行词频统计
words = line.split()
counts = Counter(words)

# 筛选出现次数不少于3次的关键词
keywords = []
for word, count in counts.items():
    if count >= 3:
        keywords.append((word, count))

# 排序：主键为频次（降序），次键为字典序（升序）
# -x[1] 表示按频次降序，x[0] 表示按字典序升序
keywords.sort(key=lambda x: (-x[1], x[0]))

# 输出结果
for word, count in keywords:
    print(word)

算法及复杂度

算法：哈希表 + 排序
时间复杂度：设输入字符串的总长为 $L$ ，单词总数为 $N$ ，独立单词数为 $K$ 。
- 词频统计： $\mathcal{O}(L)$ 或 $\mathcal{O}(N)$ ，取决于实现。
- 筛选： $\mathcal{O}(K)$ 。
- 排序： $\mathcal{O}(K \log K)$ 。
- 总体时间复杂度为 $\mathcal{O}(L + K \log K)$ 。
空间复杂度：需要一个哈希表存储所有独立单词及其频次，因此空间复杂度为 $\mathcal{O}(K)$ 。