Solution | Implementing the Self-Attention Mechanism

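The self-attention mechanism captures relationships between the elements of a single sequence: it computes how relevant each element is to every other element and uses those relevances to combine information across the sequence. The basic idea is to project the input sequence into three matrices, the query (Query), key (Key), and value (Value); compute attention weights from the similarity between queries and keys; and then multiply those weights by the values to obtain the output. Self-attention is computed in the following steps: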

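Compute the queries, keys, and values: $Q = X \cdot W_Q, \quad K = X \cdot W_K, \quad V = X \cdot W_V$, where $X$ is the input sequence (one row per element) and $W_Q$ , $W_K$ , and $W_V$ are learnable weight matrices.
Compute the attention scores: $score = \frac{Q \cdot K^T}{\sqrt{d_k}}$, where $d_k$ is the dimension of the key vectors.
Compute the attention weights: $attention = \text{softmax}(score)$, where $\text{softmax}$ is applied to each row of $score$ and is defined as $\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$ .
Compute the output: $output = attention \cdot V$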
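The softmax step can be sketched in NumPy as follows. This is only an illustration: the helper name `softmax_rows` and the sample scores are assumptions made for this example. It subtracts each row's maximum before exponentiating, a common trick that leaves the result mathematically unchanged while avoiding overflow for large scores.

```python
import numpy as np

def softmax_rows(scores):
    """Row-wise softmax (hypothetical helper, for illustration only)."""
    # Shifting each row by its maximum does not change the softmax result
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)

# Made-up scores for a sequence of length 2; the second row would
# overflow np.exp without the shift
scores = np.array([[1.0, 2.0], [1000.0, 1000.0]])
weights = softmax_rows(scores)
print(weights)              # each row is a probability distribution
print(weights.sum(axis=1))  # each row sums to 1
```

Each row of `weights` says how strongly one element attends to every element of the sequence, which is why the normalization runs along the rows rather than over the whole matrix.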

The reference code is as follows:

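import numpy as np

def compute_qkv(X, W_q, W_k, W_v):
    # Project the input into query, key, and value matrices
    Q = np.dot(X, W_q)
    K = np.dot(X, W_k)
    V = np.dot(X, W_v)
    return Q, K, V

def self_attention(Q, K, V):
    d_k = Q.shape[1]  # dimension of the query/key vectors
    # Scaled dot-product scores, shape (seq_len, seq_len)
    scores = np.matmul(Q, K.T) / np.sqrt(d_k)
    # Row-wise softmax; subtracting the row maximum avoids overflow in np.exp
    exp_scores = np.exp(scores - np.max(scores, axis=1, keepdims=True))
    attention_weights = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Weighted sum of the value vectors
    attention_output = np.matmul(attention_weights, V)
    return attention_output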
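
To try the computation end to end, here is a small self-contained sketch. It is not part of the original solution: the function name `attention_with_weights`, the toy input, and the identity projection matrices are all assumptions chosen to keep the example easy to follow. Unlike the reference code, it also returns the attention weights so they can be inspected.

```python
import numpy as np

def attention_with_weights(X, W_q, W_k, W_v):
    """Hypothetical variant that returns both the output and the weights."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Numerically stable row-wise softmax
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Made-up input: 3 tokens with 2 features each; identity projections
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.eye(2)
output, weights = attention_with_weights(X, W, W, W)
print(output.shape)           # (3, 2): one output vector per token
print(weights.shape)          # (3, 3): one weight per pair of tokens
print(weights.sum(axis=-1))   # each row sums to 1
```

Because every row of the weights sums to 1, each output row is a weighted average of the rows of the value matrix.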