4.1 Multiple features (variables)

notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$th training example
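
To make the indexing concrete, here is a minimal NumPy sketch with a made-up housing dataset (the values and column meanings are illustrative, not from the notes). Note that the notes use 1-based indices while NumPy is 0-based:

```python
import numpy as np

# Hypothetical training set: m = 3 examples, n = 2 features
# (columns: size in square feet, number of bedrooms -- made-up values)
X = np.array([[2104, 3],
              [1600, 3],
              [2400, 4]])

x_2 = X[1]       # x^{(2)}: feature vector of the 2nd training example -> [1600, 3]
x_1_2 = X[1, 0]  # x_1^{(2)}: value of feature 1 in the 2nd example -> 1600
print(x_2, x_1_2)
```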

hypothesis:

previously: $h_\theta(x) = \theta_0 + \theta_1 x$
Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^T x$
for convenience of notation, define $x_0 = 1$, so that

$$x = \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^{n+1} \qquad \theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{pmatrix} \in \mathbb{R}^{n+1}$$
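
The vectorized form $\theta^T x$ is a single dot product. A minimal NumPy sketch (parameter values are made up for illustration):

```python
import numpy as np

def hypothesis(theta, x):
    """h_theta(x) = theta^T x, assuming x already includes x_0 = 1."""
    return theta @ x

# Illustrative (made-up) parameters and one feature vector with n = 2 features
theta = np.array([80.0, 0.1, 25.0])  # theta_0, theta_1, theta_2
x = np.array([1.0, 2104.0, 3.0])     # x_0 = 1 prepended for convenience

print(hypothesis(theta, x))  # 80 + 0.1*2104 + 25*3 = 365.4
```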
conclusion: a hypothesis of this form, with multiple input features, is called multivariate linear regression.

4.2 Gradient descent for multiple variables

hypothesis: