1. Linear model complexity



The logistic model is defined as: y = X*W + b

The parameters W and b are determined by an optimization method such as gradient descent.

X is 1 by 784 (784 = 28*28, a flattened 28x28 image).

W is 784 by 10.

b is 1 by 10.

So the number of parameters is 784*10 + 10 = 7850.
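
Below is a minimal NumPy sketch of this model and its parameter count; the random input is only a stand-in for a real image.

```python
import numpy as np

# Logistic model y = X*W + b for a flattened 28x28 image and 10 classes.
X = np.random.rand(1, 784)           # placeholder for one flattened image
W = np.random.randn(784, 10) * 0.01  # weights, 784 by 10
b = np.zeros((1, 10))                # biases, 1 by 10

logits = X @ W + b                   # shape (1, 10)
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax -> class probabilities

print(W.size + b.size)               # 784*10 + 10 = 7850 parameters
```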

2. Rectified Linear Unit (ReLU) and neural networks

ReLU(x) = max(0, x) is another activation function; its response is closer to a biological neuron's activation signal than the sigmoid's.


The picture below shows a two-layer neural network.



1. The first layer effectively consists of the set of weights and biases applied to X and passed through ReLUs. The output of this layer is fed to the next one, but is not observable outside the network, hence it is known as a hidden layer.
2. The second layer consists of the weights and biases applied to these intermediate outputs, followed by the softmax function to generate probabilities (see the sketch after this list).
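
Here is a minimal NumPy sketch of that forward pass; the hidden-layer size (1024) is an assumption chosen for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

X  = np.random.rand(1, 784)              # placeholder input
W1 = np.random.randn(784, 1024) * 0.01   # first (hidden) layer weights
b1 = np.zeros((1, 1024))
W2 = np.random.randn(1024, 10) * 0.01    # second (output) layer weights
b2 = np.zeros((1, 10))

hidden = relu(X @ W1 + b1)         # layer 1: not observable outside -> "hidden"
probs = softmax(hidden @ W2 + b2)  # layer 2: probabilities over 10 classes
```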


3. Chain rule

The chain rule is a concept in calculus: it gives the derivative of a composite function, i.e., a function whose input is itself a function. If y = f(g(x)), then dy/dx = f'(g(x)) * g'(x).



Applied to a network's computation graph, it yields an efficient data pipeline with lots of data reuse: intermediate values such as g(x) are computed once in the forward pass and reused when computing derivatives.
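
As a small sanity check, here is a hypothetical composite function f(g(x)) = sin(x^2) whose chain-rule derivative we verify numerically.

```python
import numpy as np

# y = f(g(x)) with f(u) = sin(u), g(x) = x**2 (example functions).
# Chain rule: dy/dx = f'(g(x)) * g'(x) = cos(x**2) * 2*x.
x = 1.5
g = x**2                       # computed once, reused below (data reuse)
analytic = np.cos(g) * 2 * x

eps = 1e-6                     # finite-difference check
numeric = (np.sin((x + eps)**2) - np.sin((x - eps)**2)) / (2 * eps)
print(analytic, numeric)       # the two values agree closely
```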

4. Back propagation


Forward propagation computes the output y.

Back propagation computes the derivatives of the loss with respect to all the weight matrices.

Then we can update each weight by new_weight = weight - alpha * derivative_weight, where alpha is the learning rate.

Back propagation needs roughly twice the memory and computation of forward propagation.
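
Below is a minimal sketch of one forward/backward/update step for the two-layer network from section 2, using a softmax with cross-entropy loss; the batch, hidden size, targets, and alpha are illustrative assumptions.

```python
import numpy as np

alpha = 0.1
X = np.random.rand(4, 784)           # placeholder batch of 4 examples
labels = np.eye(10)[[0, 1, 2, 3]]    # one-hot targets (made up)

W1 = np.random.randn(784, 128) * 0.01; b1 = np.zeros((1, 128))
W2 = np.random.randn(128, 10) * 0.01;  b2 = np.zeros((1, 10))

# forward propagation: computes the output y (here the probabilities)
hidden = np.maximum(0.0, X @ W1 + b1)
logits = hidden @ W2 + b2
e = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)

# back propagation: derivatives flow from the loss back to every weight
dlogits = (probs - labels) / len(X)   # gradient of softmax + cross-entropy
dW2 = hidden.T @ dlogits
db2 = dlogits.sum(axis=0, keepdims=True)
dhidden = dlogits @ W2.T
dhidden[hidden <= 0] = 0              # ReLU passes gradient only where it was active
dW1 = X.T @ dhidden
db1 = dhidden.sum(axis=0, keepdims=True)

# update: new_weight = weight - alpha * derivative_weight
W1 -= alpha * dW1; b1 -= alpha * db1
W2 -= alpha * dW2; b2 -= alpha * db2
```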



5. Deep learning networks


In the hands-on post 实战(2), we implemented a neural network with only one hidden layer.

It is similar to the figure below.



Of course, we can also build deeper or wider neural networks, as sketched below.
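
A minimal sketch of a deeper network, stacking several ReLU hidden layers; the layer sizes here are arbitrary choices for illustration.

```python
import numpy as np

sizes = [784, 512, 256, 128, 10]     # input, three hidden layers, output
weights = [np.random.randn(m, n) * np.sqrt(2.0 / m)  # He-style init suits ReLU
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((1, n)) for n in sizes[1:]]

def forward(X):
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)         # hidden layers use ReLU
    logits = a @ weights[-1] + biases[-1]      # last layer feeds the softmax
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

probs = forward(np.random.rand(1, 784))
```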


6. Early termination

When accuracy on the validation data reaches its peak, stop training promptly to avoid overfitting; a sketch of such a loop follows.
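
Here is a sketch of the stopping logic; train_one_epoch and validation_accuracy are dummy placeholders standing in for a real model, and the patience of 5 epochs is an assumption.

```python
import random

def train_one_epoch():
    pass                      # placeholder: one epoch of gradient-descent training

def validation_accuracy():
    return random.random()    # placeholder: accuracy on the validation data

best_acc, patience, bad_epochs = 0.0, 5, 0
for epoch in range(100):
    train_one_epoch()
    acc = validation_accuracy()
    if acc > best_acc:
        best_acc, bad_epochs = acc, 0   # new peak: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # accuracy has peaked; stop training
            print(f"stopping at epoch {epoch}, best accuracy {best_acc:.3f}")
            break
```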


7. Regularization

Introduce the L2 norm of the weight vector (in practice usually its square) into the loss as a penalty term: loss' = loss + beta * ||W||^2.
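
A minimal sketch of this penalty; beta is an assumed regularization strength and the data-loss value is made up.

```python
import numpy as np

beta = 1e-3   # assumed regularization strength

def regularized_loss(data_loss, weight_matrices):
    penalty = sum(np.sum(W ** 2) for W in weight_matrices)  # squared L2 norms
    return data_loss + beta * penalty

W1 = np.random.randn(784, 128)
W2 = np.random.randn(128, 10)
print(regularized_loss(0.42, [W1, W2]))   # 0.42 is a made-up data loss
```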


8. Dropout

In a multi-layer neural network, the output of one layer serves as the input to the next.

Dropout means randomly choosing half (or some other fraction) of the nodes in the previous layer's output, discarding them, and feeding only the remaining nodes to the next layer.


If dropout does not help, we probably need a larger network.

There are a couple of tricks for using dropout, sketched in the code below:

(1) During training, apply dropout and scale the surviving outputs up by a factor of two (matching a drop rate of one half).

(2) During evaluation, do not apply dropout.
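
A minimal sketch of both tricks (often called inverted dropout); the keep probability of 0.5 follows the half-the-nodes setting above.

```python
import numpy as np

keep_prob = 0.5   # keep half the nodes, so survivors are scaled by 1/0.5 = 2

def dropout(activations, training):
    if not training:
        return activations                   # (2) no dropout at evaluation
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob    # (1) drop, then scale up by two

hidden = np.random.rand(1, 1024)             # placeholder layer output
train_out = dropout(hidden, training=True)   # roughly half the entries are zero
eval_out = dropout(hidden, training=False)   # unchanged
```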