Introduction

Linear regression solves regression problems. The idea is simple and easy to implement, it is the foundation of many powerful nonlinear models, its results are highly interpretable, and it embodies many important ideas in machine learning.

Simple Linear Regression

Assume the best-fit line is $y = ax + b$. Then for each sample $x^{(i)}$, the prediction is
$\hat y^{(i)} = a x^{(i)} + b$
where $\hat y^{(i)}$ is the predicted value for the $i$-th sample.

We want the gap between the true value $y^{(i)}$ and the prediction $\hat y^{(i)}$ to be as small as possible. Considering all samples:
$\sum\limits_{i=1}^{m}\left(y^{(i)} - \hat y^{(i)}\right)^2$
Goal: find $a$ and $b$ such that $\sum\limits_{i=1}^{m}\left(y^{(i)} - \hat y^{(i)}\right)^2$ is as small as possible.

This is a typical least squares problem: minimize the sum of squared errors. The closed-form solution is
$a = \dfrac{\sum\limits_{i=1}^{m}\left(x^{(i)}-\bar{x}\right)\left(y^{(i)}-\bar{y}\right)}{\sum\limits_{i=1}^{m}\left(x^{(i)}-\bar{x}\right)^2}$
$b = \bar{y} - a\bar{x}$

Vectorized Computation

<munderover> i = 1 m </munderover> w ( i ) v ( i ) w v \sum\limits_{i=1}^{m}w^{(i)}\centerdot v^{(i)} \Longrightarrow w\centerdot v i=1mw(i)v(i)wv
w = ( w ( 1 ) , w ( 2 ) , &ThinSpace; , w ( m ) ) w=(w^{(1)},w^{(2)},\cdots ,w^{(m)}) w=(w(1),w(2),,w(m))
v = ( v ( 1 ) , v ( 2 ) , &ThinSpace; , v ( m ) ) v=(v^{(1)},v^{(2)},\cdots ,v^{(m)}) v=(v(1),v(2),,v(m))
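
Applied to the formula for $a$, the loop becomes two dot products (again a sketch, reusing the assumed x_train / y_train arrays from above):

# vectorized computation of a and b
num = (x_train - x_mean).dot(y_train - y_mean)
den = (x_train - x_mean).dot(x_train - x_mean)
a = num / den
b = y_mean - a * x_mean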

Evaluating Regression Algorithms

Mean Squared Error (MSE)

$MSE = \frac{1}{m}\sum\limits_{i=1}^{m}\left(y_{test}^{(i)} - \hat y_{test}^{(i)}\right)^2$

Root Mean Squared Error (RMSE)

$RMSE = \sqrt{\frac{1}{m}\sum\limits_{i=1}^{m}\left(y_{test}^{(i)} - \hat y_{test}^{(i)}\right)^2}$

Mean Absolute Error (MAE)

$MAE = \frac{1}{m}\sum\limits_{i=1}^{m}\left|y_{test}^{(i)} - \hat y_{test}^{(i)}\right|$
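
All three metrics are one-liners in NumPy (a sketch; y_test and y_predict are assumed to be arrays of true and predicted values):

import numpy as np

mse = np.mean((y_test - y_predict) ** 2)       # Mean Squared Error
rmse = np.sqrt(mse)                            # Root Mean Squared Error
mae = np.mean(np.abs(y_test - y_predict))      # Mean Absolute Error

scikit-learn also provides ready-made helpers for MSE and MAE: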

from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

# MSE and MAE on the test set
msek = mean_squared_error(y_test, y_predict)
maek = mean_absolute_error(y_test, y_predict)

R Squared

The best single metric for evaluating linear regression: unlike MSE, RMSE, and MAE, it is dimensionless, so results are comparable across different problems.
$R^2 = 1 - \dfrac{\sum\limits_{i=1}^{m}\left(\hat y^{(i)} - y^{(i)}\right)^2}{\sum\limits_{i=1}^{m}\left(\bar y - y^{(i)}\right)^2} = 1 - \dfrac{\sum\limits_{i=1}^{m}\left(\hat y^{(i)} - y^{(i)}\right)^2 / m}{\sum\limits_{i=1}^{m}\left(\bar y - y^{(i)}\right)^2 / m} = 1 - \dfrac{MSE(\hat y, y)}{Var(y)}$
where the numerator is the error made by our model's predictions, and the denominator is the error made by the baseline model that always predicts $y = \bar y$.
$R^2 \le 1$
The larger $R^2$ is, the better. When the model makes no errors at all, $R^2$ reaches its maximum value of 1.
When our model does no better than the baseline model, $R^2 = 0$.
If $R^2 < 0$, the learned model is worse than the baseline model; in that case, the data most likely has no linear relationship at all.
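
Following the last form of the formula, $R^2$ can be computed directly from the MSE and the variance of the true values (a sketch reusing the assumed y_test and y_predict arrays):

import numpy as np

# R^2 = 1 - MSE(y_hat, y) / Var(y); np.var divides by m, matching the formula
r2 = 1 - np.mean((y_test - y_predict) ** 2) / np.var(y_test)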

from sklearn.metrics import r2_score
print(r2_score(y_test, y_predict))  # R^2 on the test set

Multiple Linear Regression

$x^{(i)} = \left(X_1^{(i)}, X_2^{(i)}, \cdots, X_n^{(i)}\right)$

$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$

$\hat y^{(i)} = \theta_0 + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)}$; writing $X_0^{(i)} = 1$, this becomes $\hat y^{(i)} = \theta_0 X_0^{(i)} + \theta_1 X_1^{(i)} + \cdots + \theta_n X_n^{(i)}$

$\theta = \left(\theta_0, \theta_1, \theta_2, \cdots, \theta_n\right)^T$

Goal: make $\sum\limits_{i=1}^{m}\left(y^{(i)} - \hat y^{(i)}\right)^2$ as small as possible.

$X^{(i)} = \left(X_0^{(i)}, X_1^{(i)}, X_2^{(i)}, \cdots, X_n^{(i)}\right)$
$X_b = \begin{pmatrix} 1 & X_1^{(1)} & X_2^{(1)} & \cdots & X_n^{(1)} \\ 1 & X_1^{(2)} & X_2^{(2)} & \cdots & X_n^{(2)} \\ \vdots & & & & \vdots \\ 1 & X_1^{(m)} & X_2^{(m)} & \cdots & X_n^{(m)} \end{pmatrix}$
$\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{pmatrix}$
$\hat y = X_b \cdot \theta$

The Normal Equation Solution for Multiple Linear Regression

$\theta = \left(X_b^T X_b\right)^{-1} X_b^T y$
High time complexity: $O(n^3)$ ($O(n^{2.4})$ with optimized algorithms).
Advantage: no normalization (feature scaling) of the data is needed.
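
A minimal sketch of the normal equation in NumPy (X_train and y_train below are assumed training arrays, not part of the original):

import numpy as np

# prepend a column of ones so that theta_0 plays the role of the intercept
X_b = np.hstack([np.ones((len(X_train), 1)), X_train])

# normal equation: theta = (X_b^T X_b)^(-1) X_b^T y
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)

intercept = theta[0]        # theta_0
coefficients = theta[1:]    # theta_1 ... theta_n
y_hat = X_b.dot(theta)      # predictions on the training set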

Regression in sklearn

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
print(lin_reg.coef_)       # coefficients theta_1 ... theta_n
print(lin_reg.intercept_)  # intercept theta_0

from sklearn.neighbors import KNeighborsRegressor

knn_reg = KNeighborsRegressor()
knn_reg.fit(X_train, y_train)
aa = knn_reg.score(X_test, y_test)  # R^2 on the test set
print(aa)

Summary of Linear Regression

1. Linear regression is a typical parametric learning algorithm, whereas kNN is non-parametric.
2. Linear regression only solves regression problems (although it underlies many classification methods), whereas kNN can solve both classification and regression problems.