题解 | #哈希冲突#

题目链接

哈希冲突

题目描述

在密码学中，哈希碰撞是指找到两个不同的输入，使它们经过同一个哈希函数处理后得到相同的输出。

题目提供了一个哈希函数 H(s)，其过程如下：

接收输入字符串 s。
与一个全局的密钥 enc_key 右拼接，得到 enc_key + s。
对拼接后的字符串计算 SHA256 哈希值。
取哈希值（十六进制表示）的前 enc_len 个字符作为输出。

你的任务是：找到两个不同的、均由小写字母组成、长度均为 enc_len 的字符串 s1 和 s2，使它们满足 H(s1) == H(s2)。

这是一个 核心代码模式 的题目，你只需要补全 solve 函数。

输入描述:

enc_len: 待构造字符串的长度 ( $1 \le enc_len \le 8$ )。
enc_key: 拼接用的密钥字符串。

输出描述:

返回一个包含两个碰撞字符串 s1 和 s2 的值对。

示例: 输入:

2
a

输出:

mq ap

(注意：由于算法的随机性，你每次运行得到的输出可能会不同，但只要满足碰撞条件即可)

解题思路

这道题是 "生日问题" 在哈希领域的一个经典应用，通常被称为 "生日攻击"。

生日问题：一个屋子里需要有多少人，才能使得有两个人生日相同的概率达到50%？答案不是183，而是惊人的23人。 核心思想：随着我们测试的输入数量增加，新的输出与所有旧的输出发生碰撞的概率会迅速增长。

对于本题：

哈希函数的输出是一个长度为 enc_len 的十六进制字符串。
可能的输出总数是 $16^{enc\_len}$ 。
根据生日攻击的数学原理，我们大约只需要尝试 $k \approx \sqrt{16^{enc\_len}} = 4^{enc\_len}$ 个不同的输入，就有很高的概率找到一次碰撞。

当 enc_len 最大为 8 时，我们需要尝试的次数大约是 $4^8 = 65536$ 次。这个计算量对于现代计算机来说是微不足道的。

因此，我们可以采用一种随机的暴力搜索策略：

算法步骤:

创建一个哈希表 seen_hashes，用于存储已经计算出的哈希值及其对应的原始输入字符串。Map<String, String>: hash -> original_string。
进入一个无限循环，因为我们确信在可接受的时间内能找到碰撞。
在循环中，随机生成一个由小写字母组成、长度为 enc_len 的字符串 s。
计算其哈希值 h = H(s)。
检查 h 是否已存在于 seen_hashes 中：
- 如果存在: 我们找到了碰撞！设 s1 = seen_hashes.get(h)，当前字符串为 s2 = s。只要确保 s1 和 s2 不完全相同（随机生成时几乎不可能相同），我们就找到了一个有效的碰撞对，可以返回 (s1, s2)。
- 如果不存在: 这是我们第一次遇到这个哈希值。将其存入哈希表中：seen_hashes.put(h, s)。
继续循环，直到找到碰撞。

代码

cpp
java
python

pair<string, string> solve() {
    unordered_map<string, string> seen_hashes;
    
    random_device rd;
    mt19937 gen(rd());
    uniform_int_distribution<> distrib(0, 25);

    while (true) {
        string s(enc_len, ' ');
        for (int i = 0; i < enc_len; ++i) {
            s[i] = 'a' + distrib(gen);
        }

        string h = H(s);

        if (seen_hashes.count(h)) {
            string s1 = seen_hashes[h];
            if (s1 != s) {
                // 返回字典序较小的在前
                return (s1 < s) ? make_pair(s1, s) : make_pair(s, s1);
            }
        } else {
            seen_hashes[h] = s;
        }
    }
}

static Pair<String, String> solve() {
    Map<String, String> seenHashes = new HashMap<>();
    Random rand = new Random();
    
    while (true) {
        StringBuilder sb = new StringBuilder(encLen);
        for (int i = 0; i < encLen; i++) {
            sb.append((char)('a' + rand.nextInt(26)));
        }
        String s = sb.toString();
        String h = H(s);

        if (seenHashes.containsKey(h)) {
            String s1 = seenHashes.get(h);
            if (!s1.equals(s)) {
                // 返回字典序较小的在前，保持输出统一
                if (s1.compareTo(s) < 0) {
                    return new Pair<>(s1, s);
                } else {
                    return new Pair<>(s, s1);
                }
            }
        } else {
            seenHashes.put(h, s);
        }
    }
}

import random
import string
def solve():
    seen_hashes = {}
    chars = string.ascii_lowercase
    while True:
        s = "".join(random.choice(chars) for _ in range(enc_len))
        h = H(s)
        if h in seen_hashes:
            s1 = seen_hashes[h]
            if s1 != s:
                # 返回字典序较小的在前
                return (s1, s) if s1 < s else (s, s1)
        else:
            seen_hashes[h] = s

算法及复杂度

算法: 生日攻击 / 哈希表
时间复杂度: 平均情况下为 $\mathcal{O}(4^{enc\_len})$ 。由于 enc_len 最大为 8，这是一个非常小的常数时间。
空间复杂度: 平均情况下为 $\mathcal{O}(4^{enc\_len})$ 。用于存储见过的哈希值。