Linear Algebra: Least Squares

1. Least Squares

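$A x = b$ often has no solution. When there are more equations than unknowns, that is, when $m > n$, the column space of $A$ is not all of $R^{m}$, so $b$ may lie outside the column space and the system then has no solution. But we should not stop there.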

In other words, the error $e = b - A x$ cannot always be made zero. When $x$ is chosen so that the length of $e$ is as small as possible, we obtain the least-squares solution $\hat{x}$.

When $A x = b$ has no solution, we multiply both sides by $A^{T}$ and solve $A^{T} A x = A^{T} b$ instead.

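Suppose we want to find the line closest to the three points (0, 6), (1, 0) and (2, 0). No line $b = C + D t$ passes through all three points. The unknowns are the two constants $C$ and $D$, and a line through all three points would have to satisfy: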

$\begin{aligned} C + D \cdot 0 &= 6 \\ C + D \cdot 1 &= 0 \\ C + D \cdot 2 &= 0 \end{aligned}$

$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad x = \begin{bmatrix} C \\ D \end{bmatrix}, \quad b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}$

Since $b = (6, 0, 0)$ is not a linear combination of the columns of $A$, the system has no solution.

$A^{T} A x = A^{T} b \to \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix}$

$C = 5, \quad D = -3$

Therefore, the line closest to these three points, in the sense of the smallest sum of squared vertical errors, is $b = 5 - 3 t$.
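The worked example can be reproduced with a few lines of code. The sketch below is only an illustration under stated assumptions: it uses the Python standard library, the helper names `normal_equations` and `solve_2x2` are hypothetical, and Cramer's rule is used simply because the system is 2 by 2.

```python
from fractions import Fraction

# Data points (t, b) from the example: (0, 6), (1, 0), (2, 0).
points = [(0, 6), (1, 0), (2, 0)]

# Each row of A is (1, t); the unknown vector is x = (C, D).
A = [[Fraction(1), Fraction(t)] for t, _ in points]
b = [Fraction(value) for _, value in points]


def normal_equations(A, b):
    """Return (A^T A, A^T b) for a matrix A with two columns (hypothetical helper)."""
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]
    return AtA, Atb


def solve_2x2(M, v):
    """Solve M x = v by Cramer's rule; M must be invertible (hypothetical helper)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("A^T A is singular: the columns of A are dependent")
    x0 = (v[0] * M[1][1] - M[0][1] * v[1]) / det
    x1 = (M[0][0] * v[1] - v[0] * M[1][0]) / det
    return [x0, x1]


AtA, Atb = normal_equations(A, b)
C, D = solve_2x2(AtA, Atb)
print(C, D)  # 5 -3
```

Exact fractions are used here so that the result can be compared with the hand calculation without rounding; for larger problems a numerical library would be the more usual choice.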

2. Minimizing the Error

Geometric view

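Every $A x$ is a linear combination of the columns of $A$, so all of these vectors lie in the plane spanned by the columns of $A$ (the column space). What we are looking for is the vector in this plane that is closest to $b$, and that vector is the projection $p$ of $b$ onto the plane.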
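For the three-point example, the projection and the error can be computed directly. This is a minimal sketch that assumes the solution $C = 5$, $D = -3$ found in section 1; the variable names are illustrative only. It also checks that the error is perpendicular to both columns of $A$, which is what makes $p$ the closest point in the plane.

```python
# Matrix, right-hand side and least-squares solution from section 1.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
x_hat = [5, -3]  # assumed: (C, D) from the normal equations

# Projection p = A x_hat: the point of the column space closest to b.
p = [sum(a * x for a, x in zip(row, x_hat)) for row in A]

# Error e = b - p.
e = [bi - pi for bi, pi in zip(b, p)]

# A^T e: dot product of e with each column of A (zero means perpendicular).
At_e = [sum(row[j] * ei for row, ei in zip(A, e)) for j in range(2)]

print(p, e, At_e)
```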

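Algebraic view

$A x = b = p + e$ cannot be solved, but $A \hat{x} = p$ can. Because $A x - p$ lies in the column space and $e$ is perpendicular to it, the squared error that we need to minimize splits into two parts: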

$\|A x - b\|^{2} = \|A x - p\|^{2} + \|e\|^{2}$

Taking $x = \hat{x}$ gives $\|A x - p\|^{2} = 0$, so the smallest possible error is $\|e\|^{2}$.
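The splitting of the squared error can be checked numerically on the three-point example. The sketch below assumes $\hat{x} = (5, -3)$ from section 1; the trial vectors and helper names are arbitrary choices made for illustration.

```python
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
x_hat = [5, -3]  # assumed: least-squares solution from section 1


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sub(u, v):
    """Componentwise difference u - v."""
    return [ui - vi for ui, vi in zip(u, v)]


def sq_norm(v):
    """Squared length of v."""
    return sum(c * c for c in v)


p = matvec(A, x_hat)    # projection of b onto the column space
e = sub(b, p)           # error vector
min_error = sq_norm(e)  # smallest possible squared error

# Compare both sides of the identity for a few arbitrary trial vectors.
trial_points = [[0, 0], [1, 1], [-2, 4], x_hat]
errors = []
for x in trial_points:
    Ax = matvec(A, x)
    total = sq_norm(sub(Ax, b))
    split = sq_norm(sub(Ax, p)) + min_error
    errors.append((total, split))
    print(x, total, split)
```

In every row the two numbers agree, and the total is smallest for the last trial vector, which is $\hat{x}$ itself.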

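Calculus view

For the three-point example, the error function can be written as

$E = \|A x - b\|^{2} = (C + D \cdot 0 - 6)^{2} + (C + D \cdot 1)^{2} + (C + D \cdot 2)^{2}$

Two unknowns give two partial derivatives, and the error function reaches its minimum when both of them are zero.

$\begin{aligned} \partial E / \partial C &= 2 (C + D \cdot 0 - 6) + 2 (C + D \cdot 1) + 2 (C + D \cdot 2) = 0 \\ \partial E / \partial D &= 2 (C + D \cdot 0 - 6) \cdot 0 + 2 (C + D \cdot 1) \cdot 1 + 2 (C + D \cdot 2) \cdot 2 = 0 \end{aligned}$

After simplifying we get

$\begin{aligned} 3 C + 3 D &= 6 \\ 3 C + 5 D &= 0 \end{aligned}$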

As we can see, this is the same result that $A^{T} A x = A^{T} b$ gives. In other words, the partial derivatives of $\|A x - b\|^{2}$ are zero exactly when $A^{T} A x = A^{T} b$.
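The calculus view can also be checked by hand or in code. The following sketch writes the squared error of the three-point example as a function of $C$ and $D$, with partial derivatives worked out by hand; the names `E` and `grad_E` are illustrative. It confirms that both derivatives vanish at $(5, -3)$ and that a few nearby integer points give a larger error.

```python
def E(C, D):
    """Squared error of the line b = C + D t at the points (0, 6), (1, 0), (2, 0)."""
    return (C - 6) ** 2 + (C + D) ** 2 + (C + 2 * D) ** 2


def grad_E(C, D):
    """Partial derivatives of E with respect to C and D, derived by hand."""
    dC = 2 * (C - 6) + 2 * (C + D) + 2 * (C + 2 * D)
    dD = 2 * (C + D) * 1 + 2 * (C + 2 * D) * 2
    return dC, dD


best = (5, -3)  # solution of the normal equations from section 1
neighbors = [(6, -3), (4, -3), (5, -2), (5, -4)]

print(grad_E(*best), E(*best))
print([E(C, D) for C, D in neighbors])
```

Since $E$ is a sum of squares of linear functions, it is convex, so a point where both partial derivatives are zero is a minimum.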

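In terms of the four fundamental subspaces, this time we split $b$ into $b = p + e$, with $p$ in the column space of $A$ and $e$ in the nullspace of $A^{T}$. Because the columns of $A$ are independent, the only nullspace solution of $A^{T} A x = A^{T} b$ is the zero vector, so there is exactly one best solution $\hat{x}$, and it satisfies $A \hat{x} = p$.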

For more content, follow 「seniusen」!