小红书推荐系统

思路

拿到这道题，先读题：给你一行搜索记录（由空格分隔的单词），要你找出所有出现次数不少于 3 次的单词，按频次从高到低输出，频次相同则按字典序升序。

那怎么统计每个单词出现了多少次呢？最自然的想法就是用一个哈希表（map），以单词为 key、出现次数为 value。

具体步骤是什么？

读入一整行字符串，按空格拆分出每个单词
用哈希表统计每个单词的出现次数
筛出频次 $\geq 3$ 的单词
按频次降序排序，频次相同按字典序升序
逐行输出

整个流程就是：统计 -> 筛选 -> 排序 -> 输出，非常直白。

有什么需要注意的？

排序的比较函数要写对——先比频次（大的在前），频次一样再比字典序（小的在前）。另外输入可能有连续空格，拆分时注意跳过空串就行（C++ 的 istringstream 天然处理了这个问题）。

代码

C++
Java
Python3
JavaScript

#include <bits/stdc++.h>
using namespace std;

int main() {
    string line;
    getline(cin, line);
    istringstream iss(line);
    string word;
    map<string, int> cnt;
    while (iss >> word) {
        cnt[word]++;
    }
    vector<pair<string, int>> v;
    for (auto& [w, c] : cnt) {
        if (c >= 3) v.push_back({w, c});
    }
    sort(v.begin(), v.end(), [](const pair<string,int>& a, const pair<string,int>& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    });
    for (auto& [w, c] : v) {
        cout << w << "\n";
    }
    return 0;
}

import java.util.*;

public class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String line = sc.nextLine();
        String[] words = line.split(" ");
        Map<String, Integer> cnt = new HashMap<>();
        for (String w : words) {
            if (!w.isEmpty()) {
                cnt.put(w, cnt.getOrDefault(w, 0) + 1);
            }
        }
        List<Map.Entry<String, Integer>> list = new ArrayList<>();
        for (Map.Entry<String, Integer> e : cnt.entrySet()) {
            if (e.getValue() >= 3) list.add(e);
        }
        list.sort((a, b) -> {
            if (!a.getValue().equals(b.getValue())) return b.getValue() - a.getValue();
            return a.getKey().compareTo(b.getKey());
        });
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : list) {
            sb.append(e.getKey()).append("\n");
        }
        System.out.print(sb);
    }
}

from collections import Counter

line = input()
words = line.split()
cnt = Counter(words)

result = [(w, c) for w, c in cnt.items() if c >= 3]
result.sort(key=lambda x: (-x[1], x[0]))

for w, c in result:
    print(w)

const readline = require('readline');
const rl = readline.createInterface({ input: process.stdin });
const lines = [];
rl.on('line', line => lines.push(line));
rl.on('close', () => {
    const words = lines[0].split(' ').filter(w => w.length > 0);
    const cnt = {};
    for (const w of words) {
        cnt[w] = (cnt[w] || 0) + 1;
    }
    const result = [];
    for (const [w, c] of Object.entries(cnt)) {
        if (c >= 3) result.push([w, c]);
    }
    result.sort((a, b) => {
        if (a[1] !== b[1]) return b[1] - a[1];
        return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;
    });
    const out = [];
    for (const [w, c] of result) {
        out.push(w);
    }
    console.log(out.join('\n'));
});

复杂度分析

时间复杂度: $\mathcal{O}(n + m \log m)$ ，其中 $n$ 是字符串长度， $m$ 是不同单词数。遍历字符串统计 $\mathcal{O}(n)$ ，排序 $\mathcal{O}(m \log m)$
空间复杂度: $\mathcal{O}(n)$ ，哈希表存储所有单词及其计数

总结

这题考的就是字符串处理和排序的基本功。用哈希表统计词频，筛选出高频词，再按自定义规则排序输出。没有什么弯弯绕绕，把题意转化成代码的过程很顺畅。唯一要注意的就是排序的双关键字——频次降序 + 字典序升序，别写反了就行。

题解 | #小红书推荐系统#

小红书推荐系统

思路

具体步骤是什么？

有什么需要注意的？

代码

复杂度分析

总结