These notes are based on Part II of the book Hands-On Machine Learning with Scikit-Learn and TensorFlow, covering TensorFlow 2.0.

10. Introduction to Artificial Neural Networks with Keras

10.1 The Perceptron

  • when all the neurons in a layer are connected to every neuron in the previous layer (i.e., its input neurons), the layer is called a fully connected layer, or a dense layer.
  • an example: a perceptron with two inputs and three outputs (figure omitted).
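For a concrete starting point, here is a minimal sketch using Scikit-Learn's Perceptron class (the iris petal features and the setosa target are illustrative choices, similar to the book's example):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]            # petal length, petal width
y = (iris.target == 0).astype(int)  # is it Iris setosa?

per_clf = Perceptron()
per_clf.fit(X, y)
y_pred = per_clf.predict([[2, 0.5]])  # predict for a new flower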

10.2 The Multilayer Perceptron (MLP)

10.2.1 MLP

  • an MLP can solve the XOR problem, which a single perceptron cannot (see the sketch after this list)
  • input layer --> hidden layers --> output layer
  • layers close to the input are called lower layers, and layers close to the output are called upper layers; every layer except the output layer has a bias neuron and is fully connected to the next layer.
  • an example (figure omitted):

    the signal flows only in one direction (from the inputs to the outputs), so this architecture is an example of a feedforward neural network.
    when an ANN contains a deep stack of hidden layers, it is called a **deep neural network (DNN)**.
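As a quick demonstration that an MLP can learn XOR, here is a minimal Keras sketch; the hidden-layer size, optimizer, learning rate, and epoch count are illustrative choices, and training may occasionally need a re-run to converge:

import numpy as np
from tensorflow import keras

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([0, 1, 1, 0], dtype=np.float32)

model = keras.models.Sequential([
    keras.layers.Dense(4, activation="tanh", input_shape=[2]),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=1.0))
model.fit(X, y, epochs=1000, verbose=0)
print(model.predict(X).round())  # expected: 0, 1, 1, 0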

10.2.2 The Backpropagation Algorithm

this algorithm is so important that it's worth summarizing it again: for each training instance, the backpropagation algorithm first makes a prediction (forward pass) and measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally tweaks the connection weights to reduce the error (Gradient Descent step).
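To make the three phases concrete, here is a minimal sketch of one such update on a single weight using TensorFlow's automatic differentiation (the toy values and the learning rate of 0.1 are arbitrary):

import tensorflow as tf

w = tf.Variable(0.5)
x, y_true = tf.constant(2.0), tf.constant(1.0)

with tf.GradientTape() as tape:
    y_pred = tf.sigmoid(w * x)      # forward pass: make a prediction
    loss = (y_pred - y_true) ** 2   # measure the error

grad = tape.gradient(loss, w)       # reverse pass: error contribution of w
w.assign_sub(0.1 * grad)            # Gradient Descent step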
in order for this algorithm to work properly, its authors made a key change to the MLP's architecture: they replaced the step function with the logistic function (sigmoid function):
$\sigma(z) = 1/(1 + e^{-z})$
besides the sigmoid function, the backpropagation algorithm works well with many other activation functions, such as:

  • the hyperbolic tangent function: $\tanh(z) = 2\sigma(2z) - 1$, with range $[-1, 1]$
  • the Rectified Linear Unit function: $\mathrm{ReLU}(z) = \max(0, z)$
  • the softplus function, a smooth variant of ReLU: $\mathrm{softplus}(z) = \log(1 + e^{z})$, which is close to 0 when $z$ is negative and close to $z$ when $z$ is positive

Why use an activation function at all? To introduce nonlinearity: without one, a stack of linear layers is equivalent to a single linear layer, so the network could only model linear relationships.
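A small NumPy sketch of the activation functions listed above (the assert checks the tanh identity from the list):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def softplus(z):
    return np.log(1 + np.exp(z))

z = np.linspace(-5, 5, 11)
assert np.allclose(2 * sigmoid(2 * z) - 1, np.tanh(z))  # the tanh identity
print(relu(z))      # 0 for negative z, z otherwise
print(softplus(z))  # smooth: near 0 for very negative z, near z for very positive z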

10.2.3 Regression MLPs
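For regression, the output layer needs one neuron per output dimension, with no activation function, and a typical loss is the mean squared error. A minimal sketch (the feature count of 8 and the layer sizes are illustrative assumptions):

from tensorflow import keras

reg_model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=[8]),
    keras.layers.Dense(1)  # no activation: the output can be any value
])
reg_model.compile(loss="mean_squared_error", optimizer="sgd")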

10.2.4 Classification MLPs

MLPs can handle binary classification, multilabel binary classification, and multiclass classification; the typical architecture diagram from the book is omitted here, and the output-layer choices for each task are sketched below:
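A sketch of the output layer for each task (the label and class counts are placeholders; hidden layers are omitted):

from tensorflow import keras

# binary classification: a single output neuron with the sigmoid activation
binary_output = keras.layers.Dense(1, activation="sigmoid")

# multilabel binary classification: one sigmoid neuron per label (here: 5)
multilabel_output = keras.layers.Dense(5, activation="sigmoid")

# multiclass classification: one neuron per class with softmax (here: 10)
multiclass_output = keras.layers.Dense(10, activation="softmax")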

10.3 Implementing MLPs with Keras

Keras is a high-level Deep Learning API that allows you to easily build, train, evaluate, and execute all sorts of neural networks.

10.3.1 Creating the Model Using the Sequential API

from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
# equivalent: activation=keras.activations.relu
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))

# you can also write your model as:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
  • Sequential model: the simplest kind of Keras model for neural networks that are just composed of a single stack of layers connected sequentially.
  • Flatten layer: converts each input image into a 1D array; this layer has no parameters and just does some simple preprocessing. As the first layer in the model, it should specify the input_shape, which doesn't include the batch size, only the shape of the instances. Alternatively, you could add a keras.layers.InputLayer as the first layer, setting input_shape=[28,28].
  • Dense hidden layer: each Dense layer manages its own weight matrix (the connection weights between its neurons and their inputs) and a vector of bias terms (one per neuron).
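You can inspect the model and its parameters; the shapes in the comments below assume the 784 → 300 → 100 → 10 architecture built above:

model.summary()  # per-layer output shapes and parameter counts

hidden1 = model.layers[1]                # the first Dense layer
weights, biases = hidden1.get_weights()
print(weights.shape)                     # (784, 300): one weight per connection
print(biases.shape)                      # (300,): one bias per neuron, initialized to zeros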

Visualizing the network architecture:

keras.utils.plot_model(model, "my_fashion_mnist_model.png", show_shapes=True)

Some readers report that this visualization call fails with the error:

Failed to import pydot. You must install pydot and graphviz for pydotprint to work.

If so, try the common fix for this Keras visualization error: install the pydot Python package (pip install pydot) along with the Graphviz system binaries.
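The compile and fit calls below assume the Fashion MNIST data has already been loaded and split; a minimal loading sketch (the 5,000-instance validation split and the scaling to [0, 1] follow the book's example):

from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

# hold out a validation set and scale pixel intensities to [0, 1]
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]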

# compile the model 
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

# fit the model
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))
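After training, the model can be evaluated and used for predictions; a short sketch, with X_test and y_test as loaded above and scaled the same way:

import numpy as np

# estimate the generalization error on the test set
model.evaluate(X_test / 255.0, y_test)

# predict class probabilities for a few new instances
X_new = X_test[:3] / 255.0
y_proba = model.predict(X_new)
print(np.argmax(y_proba, axis=1))  # most likely class per instance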

we will use many other losses, optimizers, and metrics in this book; for the full lists, see the Keras documentation.