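Paper title

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (DCGAN)
However well someone else explains a paper, I think you should still read the original yourself; that may be exactly where the gap between readers opens up.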

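Abstract

In recent years, supervised learning with CNNs has seen wide adoption in computer vision applications. By comparison, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), which have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and the discriminator. Additionally, we use the learned features for novel tasks, demonstrating their applicability as general image representations.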

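1. Introduction

Learning reusable feature representations from large unlabeled datasets has been an area of active research. In computer vision, one can leverage the practically unlimited supply of unlabeled images and videos to learn good intermediate representations, which can then be used on supervised learning tasks such as image classification. We propose that one way to build good image representations is by training GANs and later reusing parts of the generator and discriminator networks as feature extractors for supervised tasks. GANs provide an attractive alternative to maximum likelihood techniques. One can additionally argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-square error) are attractive to representation learning. GANs are also known to be unstable to train, however, which often results in generators that produce nonsensical outputs. There has been very limited published research on understanding and visualizing what GANs learn, and on the intermediate representations of multi-layer GANs. In this paper, we make the following contributions:

We propose and evaluate a set of constraints on the architecture of convolutional GANs that make them stable to train in most settings; we name this class of architectures DCGAN.
We use the trained discriminators for image classification tasks and compare them with other unsupervised learning algorithms.
We visualize the filters learned by GANs and show empirically that specific filters have learned to draw specific objects.
We show that the generators have interesting vector arithmetic properties that allow easy manipulation of many semantic qualities of generated samples.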

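2. Related work

2.1 Unsupervised representation learning

Unsupervised representation learning is a fairly well-studied problem in general machine learning research, as well as in the context of images. A classic approach is to cluster the data (for example with k-means) and use the clusters to improve classification scores. For images, one can cluster image patches hierarchically to learn powerful image representations. Another popular method is to train auto-encoders (convolutionally, stacked (Vincent et al., 2010), separating the what and where components of the code (Zhao et al., 2015), ladder structures (Rasmus et al., 2015)) that encode an image into a compact code, and decode the code to reconstruct the image as accurately as possible. These methods have also been shown to learn good feature representations from image pixels. Deep belief networks have likewise been shown to work well for learning hierarchical representations.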

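2.2 Generating natural images

Generative image models are well studied and fall into two categories: parametric and non-parametric. Non-parametric models often match against a database of existing images, frequently matching patches of images, and have been used in texture synthesis, super-resolution and in-painting. Parametric models for generating images have been explored extensively, for example on MNIST digits or for texture synthesis. Generating natural images of the real world, however, had not had much success until recently. A variational sampling approach to generating images has had some success, but the samples often suffer from being blurry. Images generated with GANs suffered from being noisy and incomprehensible. A Laplacian pyramid extension to this approach produced higher-quality images, but the objects still looked wobbly because of the noise introduced by chaining multiple models. A recurrent network approach and a deconvolution network approach have also recently had some success at generating natural images, but they have not leveraged the generators for supervised tasks.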

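2.3 Visualizing the internals of CNNs

One constant criticism of neural networks is that they are black-box methods, with little understanding of what the networks do in the form of a simple, human-consumable algorithm. In the context of CNNs, Zeiler et al. showed that by using deconvolutions and filtering the maximal activations, one can find the approximate purpose of each convolution filter in the network. Similarly, running gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters.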

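3 Approach and model architecture

Replace every pooling layer with strided convolutions in the discriminator and with fractionally-strided convolutions in the generator.
Use batch normalization (BN) in both the generator and the discriminator.
Remove fully connected hidden layers in deeper architectures.
Use ReLU activation in every generator layer except the output, which uses Tanh.
Use LeakyReLU activation in every layer of the discriminator.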
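To make the first guideline concrete, here is a minimal sketch of how spatial size changes when pooling is replaced by stride-2 convolutions. It assumes TensorFlow-style 'SAME' padding (output = ceil(input / stride) for a convolution, input * stride for a transposed convolution); the helper names are mine, not from the paper.

```python
import math

def conv_out_size(size, stride):
    """Spatial output size of a strided convolution with 'SAME' padding."""
    return math.ceil(size / stride)

def transposed_conv_out_size(size, stride):
    """Spatial output size of a fractionally-strided (transposed) convolution with 'SAME' padding."""
    return size * stride

def trace_sizes(start, stride, num_layers, step):
    """Apply the same size rule num_layers times and record every intermediate size."""
    sizes = [start]
    for _ in range(num_layers):
        sizes.append(step(sizes[-1], stride))
    return sizes

# Generator: a 4x4 feature map is upsampled four times to a 64x64 image.
generator_sizes = trace_sizes(4, 2, 4, transposed_conv_out_size)
# Discriminator: a 64x64 image is downsampled four times to a 4x4 feature map.
discriminator_sizes = trace_sizes(64, 2, 4, conv_out_size)

print(generator_sizes)      # [4, 8, 16, 32, 64]
print(discriminator_sizes)  # [64, 32, 16, 8, 4]
```

The two traces mirror each other, which is why the generator and discriminator in the script further down use the same number of stride-2 layers.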

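4 Details of adversarial training

We trained DCGANs on three datasets: LSUN, Imagenet-1k and a newly assembled Faces dataset. Details on the usage of each dataset are given below.
No pre-processing was applied to the training images besides scaling them to the range of the Tanh activation function, [-1, 1]. All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128. All weights were initialized from a zero-centered normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work used momentum to accelerate training, we used the Adam optimizer with tuned hyperparameters. We found the suggested learning rate of 0.001 too high and used 0.0002 instead. Additionally, leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability, while reducing it to 0.5 helped stabilize training.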
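The training recipe can be sketched in plain Python. The numeric values are the ones reported in the DCGAN paper (pixels scaled to [-1, 1], zero-centered normal initialization with standard deviation 0.02, leak slope 0.2, Adam with learning rate 0.0002 and β1 = 0.5, mini-batches of 128); the function and dictionary names are illustrative assumptions.

```python
import random

def scale_to_tanh_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to the Tanh range [-1, 1]."""
    return pixel / 127.5 - 1.0

def leaky_relu(x, slope=0.2):
    """LeakyReLU: identity for positive inputs, a small linear slope for negative ones."""
    return x if x > 0 else slope * x

def init_weights(n, stddev=0.02, seed=0):
    """Draw n weights from a zero-centered normal distribution (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, stddev) for _ in range(n)]

# Optimizer settings as reported in the paper; the dictionary itself is only a convenience.
adam_settings = {"learning_rate": 0.0002, "beta1": 0.5, "batch_size": 128}

weights = init_weights(1000)
print(scale_to_tanh_range(0), scale_to_tanh_range(255))  # -1.0 1.0
print(leaky_relu(-1.0), leaky_relu(3.0))                 # -0.2 3.0
```

Note that the script further down uses its own settings (batch size 64, learning rate 0.00005), so these numbers describe the paper rather than that script.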

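4.1 LSUN

The paper shows samples after one epoch of training, which mimics online learning, and after five epochs, once the model has converged.

Because reasonable samples already appear after a single pass over the data, this suggests that DCGAN does not produce high-quality samples simply by overfitting to or memorizing training examples.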

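4.1.1 Deduplication

To further decrease the likelihood of the generator memorizing input examples, we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising, dropout-regularized ReLU autoencoder on 32×32 downsampled center crops of the training examples. The resulting code-layer activations are then binarized by thresholding the ReLU activation, which has been shown to be an effective information-preserving technique and provides a convenient form of semantic hashing that allows de-duplication in linear time. Visual inspection of hash collisions showed high precision, with an estimated false positive rate of less than 1 in 100. Additionally, the technique detected and removed approximately 275,000 near-duplicates, suggesting high recall.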
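The hashing step can be illustrated with a small sketch. It assumes the autoencoder has already been trained and that its code-layer activations are available as plain lists; the threshold of 0.0, the 4-dimensional toy codes and the function names are my assumptions, not details from the paper.

```python
from collections import defaultdict

def binarize(code, threshold=0.0):
    """Turn a vector of ReLU activations into a hashable tuple of bits."""
    return tuple(1 if activation > threshold else 0 for activation in code)

def deduplicate(codes, threshold=0.0):
    """Group images by binary hash in a single pass and keep the first image of each group."""
    buckets = defaultdict(list)
    for index, code in enumerate(codes):
        buckets[binarize(code, threshold)].append(index)
    kept = sorted(indices[0] for indices in buckets.values())
    return kept, dict(buckets)

# Toy code-layer activations for four images (made-up numbers).
codes = [
    [0.9, 0.0, 0.3, 0.0],
    [0.8, 0.0, 0.4, 0.0],  # near-duplicate of image 0: same activation pattern
    [0.0, 0.7, 0.0, 0.2],
    [0.0, 0.0, 0.5, 0.0],
]
kept, buckets = deduplicate(codes)
print(kept)  # [0, 2, 3]
```

Each image is hashed once and looked up in a dictionary, which is where the linear-time behaviour comes from; images that land in the same bucket are treated as near-duplicates.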

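4.2 Faces

We scraped images containing human faces from random web image queries of people's names. The names were acquired from dbpedia, with the criterion that the people were born in the modern era. This dataset has 3 million images of 10,000 people. We ran an OpenCV face detector on these images, keeping the detections of sufficiently high resolution, which gave us approximately 350,000 face boxes. We trained on these face boxes. No data augmentation was applied to the images.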

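4.3 Imagenet-1k

We use Imagenet-1k as a source of natural images for unsupervised training. We train on 32×32 min-resized center crops. No data augmentation was applied to the images.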
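One plausible reading of a 32×32 center crop taken after resizing the shorter side to 32 pixels is sketched below. The paper does not spell out the exact procedure, so the rounding rule and the function name are assumptions.

```python
def min_resize_center_crop_box(width, height, target=32):
    """Return the resized (width, height) and the centered crop box (left, top, right, bottom).

    The shorter side is resized to `target` pixels while keeping the aspect ratio,
    then a target x target square is cut from the middle of the resized image.
    """
    shorter = min(width, height)
    new_width = max(target, round(width * target / shorter))
    new_height = max(target, round(height * target / shorter))
    left = (new_width - target) // 2
    top = (new_height - target) // 2
    return (new_width, new_height), (left, top, left + target, top + target)

resized, box = min_resize_center_crop_box(500, 375)
print(resized)  # (43, 32)
print(box)      # (5, 0, 37, 32)
```

With PIL, the two returned values could be passed to `Image.resize` and `Image.crop` respectively.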

5 Empirical validation of DCGAN capabilities

5.1 Classifying CIFAR-10 using GANs as a feature extractor

5.2 Classifying SVHN digits using GANs as a feature extractor

6. Investigating and visualizing the internals of the networks

6.1 Walking in the latent space

6.2 Visualizing the discriminator features

6.3 Manipulating the generator representation

6.3.1 Forgetting to draw certain objects

6.3.2 Vector arithmetic on face samples

As with word2vec, do images have a similar property, so that adding and subtracting in the latent space yields new images? Experiments show that working with the representation of a single image per concept is unstable, while averaging the representations of three images gives stable results.
With three averaged samples, the arithmetic captures attributes such as a smile or sunglasses. Beyond that, one can find a stable vector that performs a specific transformation, such as a change of pose.
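The arithmetic itself is simple enough to sketch with plain lists. The 3-dimensional latent codes below are made-up numbers (real DCGAN codes have 100 dimensions), and feeding the result to a trained generator is left out; only the averaging and the vector arithmetic are shown.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def scale(a, factor):
    return [x * factor for x in a]

# Average three latent codes per concept, since a single code per concept is unstable.
smiling_woman = mean_vector([[1.0, 2.0, 0.0], [1.0, 4.0, 0.0], [1.0, 3.0, 0.0]])
neutral_woman = mean_vector([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, -1.0, 0.0]])
neutral_man = mean_vector([[-1.0, 0.0, 1.0], [-1.0, 0.0, 2.0], [-1.0, 0.0, 0.0]])

# smiling woman - neutral woman + neutral man: the code one would feed to the generator.
smiling_man = add(subtract(smiling_woman, neutral_woman), neutral_man)
print(smiling_man)  # [-1.0, 3.0, 1.0]

# A pose direction: the difference between the mean codes of two poses.
looking_left = mean_vector([[0.0, 1.0, -2.0], [0.0, 1.0, -4.0]])
looking_right = mean_vector([[0.0, 1.0, 2.0], [0.0, 1.0, 4.0]])
turn = subtract(looking_right, looking_left)
halfway = add(looking_left, scale(turn, 0.5))
print(halfway)  # [0.0, 1.0, 0.0]
```

Moving a code along `turn` in small steps and decoding each step is how a gradual change of pose would be rendered.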

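7. Conclusion

The idea behind the paper looks simple, but the authors ran a large number of experiments. They demonstrate the strength of DCGAN by exploring the latent space, analyzing the network, and comparing the quality of the learned features.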

################################################################################

Generating faces with DCGAN

DCGAN.py

# coding=utf-8
# Written against the TensorFlow 1.x API (tf.layers, tf.placeholder, tf.Session).

from PIL import Image

import tensorflow as tf
import matplotlib
matplotlib.use('Agg')  # headless backend: figures are only written to disk
import matplotlib.pyplot as plt
import numpy as np
import os
import time

# set parameters
is_training = True
input_dir = "./face/"  # dataset root directory

# set hyper parameters
batch_size = 64
image_width = 64
image_height = 64
image_channel = 3
data_shape = [64, 64, 3]
data_length = 64 * 64 * 3

z_dim = 100
learning_rate = 0.00005
beta1 = 0.5
epoch = 5000

# Load the training images
def prepare_data(input_dir, folder):
    '''
    Read the training images from disk.
    :param input_dir: root directory of the images, e.g. "./face/"
    :param folder: sub-folder under input_dir that holds the images, e.g. "zxy"
    :return: float32 array of shape (num_images, data_length), scaled to [-1, 1]
    '''
    # List the image files and count them
    image_dir = os.path.join(input_dir, folder)
    images = os.listdir(image_dir)
    image_len = len(images)
    # Pre-allocate the array that will hold the pixel data
    data = np.empty((image_len, image_height, image_width, image_channel), dtype="float32")
    # Read the images one by one
    for i in range(image_len):
        img = Image.open(os.path.join(image_dir, images[i]))
        # Force 3 channels so grayscale or RGBA files do not break the assignment below
        img = img.convert("RGB").resize((image_width, image_height))
        data[i, :, :, :] = np.asarray(img, dtype="float32")
    # Scale pixels from [0, 255] to [-1, 1] (the Tanh range) and flatten each image;
    # plain NumPy is enough here, no TensorFlow session is needed
    train_set = (data / 127.5 - 1.0).reshape(-1, data_length)
    return train_set

# Define the generator
def Generator(z, is_training, reuse):
    '''
    Map a noise vector z to a generated image gen_img.
    :param z: input tensor, usually noise
    :param is_training: whether the network is being trained
    :param reuse: whether to reuse existing variables
    :return: the generated image gen_img
    '''
    # Channel depths: z (z_dim) -> 1024 -> 512 -> 256 -> 128 -> 3
    depths = [1024, 512, 256, 128] + [data_shape[2]]
    with tf.variable_scope("Generator", reuse=reuse):
        # Layer 1: fully connected projection, reshaped to 4x4x1024
        with tf.variable_scope("g_fc1", reuse=reuse):
            output = tf.layers.dense(z, depths[0]*4*4, trainable=is_training)
            # -1 lets the batch dimension follow the input instead of hard-coding batch_size
            output = tf.reshape(output, [-1, 4, 4, depths[0]])
            output = tf.nn.relu(tf.layers.batch_normalization(output, training=is_training))
        # Layer 2: transposed convolution, 1024 -> 512 channels, 4x4 -> 8x8
        with tf.variable_scope("g_dc1", reuse=reuse):
            output = tf.layers.conv2d_transpose(output, depths[1], [5, 5], strides=(2, 2), padding='SAME', trainable=is_training)
            output = tf.nn.relu(tf.layers.batch_normalization(output, training=is_training))
        # Layer 3: transposed convolution, 512 -> 256 channels, 8x8 -> 16x16
        with tf.variable_scope("g_dc2", reuse=reuse):
            output = tf.layers.conv2d_transpose(output, depths[2], [5, 5], strides=(2, 2), padding='SAME', trainable=is_training)
            output = tf.nn.relu(tf.layers.batch_normalization(output, training=is_training))
        # Layer 4: transposed convolution, 256 -> 128 channels, 16x16 -> 32x32
        with tf.variable_scope("g_dc3", reuse=reuse):
            output = tf.layers.conv2d_transpose(output, depths[3], [5, 5], strides=(2, 2), padding='SAME', trainable=is_training)
            output = tf.nn.relu(tf.layers.batch_normalization(output, training=is_training))
        # Layer 5: transposed convolution, 128 -> 3 channels, 32x32 -> 64x64, Tanh output without batch norm
        with tf.variable_scope("g_dc4", reuse=reuse):
            output = tf.layers.conv2d_transpose(output, depths[4], [5, 5], strides=(2, 2), padding='SAME', trainable=is_training)
            gen_img = tf.nn.tanh(output)
    return gen_img


def Discriminator(x, is_training, reuse):
    '''
    Decide whether the input image is real or generated.
    :param x: input images
    :param is_training: whether the network is being trained
    :param reuse: whether to reuse existing variables
    :return: one unnormalized logit per image (no sigmoid applied)
    '''
    # Discriminator channel depths: 3 -> 64 -> 128 -> 256 -> 512
    depths = [data_shape[2]] + [64, 128, 256, 512]
    with tf.variable_scope("Discriminator", reuse=reuse):
        # Layer 1: strided convolution, 3 -> 64 channels, 64x64 -> 32x32, LeakyReLU activation
        with tf.variable_scope("d_cv1", reuse=reuse):
            output = tf.layers.conv2d(x, depths[1], [5, 5], strides=(2, 2), padding="SAME", trainable=is_training)
            output = tf.nn.leaky_relu(tf.layers.batch_normalization(output, training=is_training))
        # Layers 2-4: strided convolutions that halve the spatial size and double the channels
        with tf.variable_scope("d_cv2", reuse=reuse):
            output = tf.layers.conv2d(output, depths[2], [5, 5], strides=(2, 2), padding='SAME', trainable=is_training)
            output = tf.nn.leaky_relu(tf.layers.batch_normalization(output, training=is_training))
        with tf.variable_scope("d_cv3", reuse=reuse):
            output = tf.layers.conv2d(output, depths[3], [5, 5], strides=(2, 2), padding='SAME', trainable=is_training)
            output = tf.nn.leaky_relu(tf.layers.batch_normalization(output, training=is_training))
        with tf.variable_scope("d_cv4", reuse=reuse):
            output = tf.layers.conv2d(output, depths[4], [5, 5], strides=(2, 2), padding='SAME', trainable=is_training)
            output = tf.nn.leaky_relu(tf.layers.batch_normalization(output, training=is_training))
        # Layer 5: flatten and project to a single logit
        with tf.variable_scope("d_fc1", reuse=reuse):
            output = tf.layers.flatten(output)
            disc_img = tf.layers.dense(output, 1, trainable=is_training)
    return disc_img

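def plot_and_save(order, images):
    '''
    Plot a batch of generator outputs as a grid and save it.
    '''
    # Lay the batch out on the largest square grid that fits
    num_images = len(images)
    n = int(np.sqrt(num_images))  # np.int was removed from recent NumPy releases
    # Read the image size and allocate the canvas
    image_size = np.shape(images)[2]
    n_channel = np.shape(images)[3]
    images = np.reshape(images, [-1, image_size, image_size, n_channel])
    # Map the Tanh output from [-1, 1] back to [0, 1] so imshow does not clip it
    images = (images + 1.0) / 2.0
    canvas = np.empty((n * image_size, n * image_size, n_channel))
    # Copy each image into its grid cell
    for i in range(n):
        for j in range(n):
            canvas[i*image_size:(i+1)*image_size, j*image_size:(j+1)*image_size, :] = images[n*i+j]
    # Draw the grid
    plt.figure(figsize=(8, 8))
    plt.imshow(canvas)
    # Choose the label and file name; order is either a file name or an epoch index
    if type(order) is str:
        label = order
        file_name = order
    else:
        label = "Epoch: {0}".format(order+1)
        file_name = "./dst/face_gen" + str(order)
    plt.xlabel(label)
    # Make sure the output directory exists, then save the figure
    os.makedirs(os.path.dirname(file_name) or ".", exist_ok=True)
    plt.savefig(file_name)
    print(os.getcwd())  # print the current working directory
    print("Image saved in file: ", file_name)
    plt.close()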

# Define the training procedure
def training():
    '''
    Run the DCGAN training loop.
    :return:
    '''
    # Prepare the data
    data = prepare_data(input_dir, "trump")
    # Build the network
    x = tf.placeholder(tf.float32, shape=[None, data_length], name="Input_data")
    x_img = tf.reshape(x, [-1] + data_shape)
    z = tf.placeholder(tf.float32, shape=[None, z_dim], name="latent_var")
    G = Generator(z, is_training=True, reuse=False)
    D_fake_logits = Discriminator(G, is_training=True, reuse=False)
    D_true_logits = Discriminator(x_img, is_training=True, reuse=True)
    # Generator loss G_loss: the generator wants its images to be classified as real
    G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake_logits, labels=tf.ones_like(D_fake_logits)))
    # Discriminator loss D_loss: real images are labeled 1, generated images 0
    D_loss_1 = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_true_logits, labels=tf.ones_like(D_true_logits)))
    D_loss_2 = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake_logits, labels=tf.zeros_like(D_fake_logits)))
    D_loss = D_loss_1 + D_loss_2
    # Collect the trainable variables of each network by top-level scope.
    # A substring test such as "d_" in var.name also matches the generator's
    # "conv2d_transpose" variables, so the discriminator step would update them too.
    total_vars = tf.trainable_variables()
    d_vars = [var for var in total_vars if var.name.startswith("Discriminator/")]
    g_vars = [var for var in total_vars if var.name.startswith("Generator/")]
    # Define the optimizers; UPDATE_OPS carries the batch-norm moving-average updates
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        g_optimization = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta1).minimize(G_loss, var_list=g_vars)
        d_optimization = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta1).minimize(D_loss, var_list=d_vars)
    print("Network Build Success!")
    # Create the session and initialize the variables
    start_time = time.time()
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    # Train epoch by epoch
    for i in range(epoch):
        total_batch = int(len(data) / batch_size)
        d_value = 0
        g_value = 0
        # Train batch by batch
        for j in range(total_batch):
            batch_xs = data[j*batch_size : j*batch_size + batch_size]
            # Train the discriminator
            z_sampled1 = np.random.uniform(low=-1.0, high=1.0, size=[batch_size, z_dim])
            Op_d, d_ = sess.run([d_optimization, D_loss], feed_dict={x: batch_xs, z:z_sampled1})
            # Train the generator
            z_sampled2 = np.random.uniform(low=-1.0, high=1.0, size=[batch_size, z_dim])
            Op_g, g_ = sess.run([g_optimization, G_loss], feed_dict={x: batch_xs, z: z_sampled2})
            # Accumulate the mean losses over the epoch
            d_value += d_ / total_batch
            g_value += g_ / total_batch
        # Once per epoch: generate a batch of images and save it
        z_sample = np.random.uniform(low=-1.0, high=1.0, size=[batch_size, z_dim])
        images_generated = sess.run(G, feed_dict={z: z_sample})
        plot_and_save(i, images_generated)
        # Report elapsed time and losses
        elapsed = time.time() - start_time
        hour = int(elapsed / 3600)
        minute = int((elapsed - 3600*hour) / 60)
        sec = int(elapsed - 3600*hour - 60*minute)
        print("Time: ", hour, "h: ", minute, "min", sec, "sec", "   Epoch: ", i, "G_loss: ", g_value, "D_loss: ", d_value)

if __name__ == '__main__':
    training()

To run it, change the face image directory to your own and run python3 DCGAN.py.
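The G_loss and D_loss terms in training() are both sigmoid cross-entropies on the discriminator's logits. The sketch below recomputes them in plain Python using the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)) for logit x and label z; the helper names and the toy logits are illustrative assumptions.

```python
import math

def sigmoid_cross_entropy_with_logits(logit, label):
    """Numerically stable cross-entropy between sigmoid(logit) and a 0/1 label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def gan_losses(real_logits, fake_logits):
    """Return (d_loss, g_loss) for one batch of discriminator logits."""
    # Discriminator: real images should be scored 1, generated images 0.
    d_loss_real = mean(sigmoid_cross_entropy_with_logits(l, 1.0) for l in real_logits)
    d_loss_fake = mean(sigmoid_cross_entropy_with_logits(l, 0.0) for l in fake_logits)
    # Generator: it wants the discriminator to score generated images as 1.
    g_loss = mean(sigmoid_cross_entropy_with_logits(l, 1.0) for l in fake_logits)
    return d_loss_real + d_loss_fake, g_loss

# An undecided discriminator (all logits 0) gives d_loss = 2*ln(2) and g_loss = ln(2).
d_undecided, g_undecided = gan_losses([0.0, 0.0], [0.0, 0.0])
# A confident, correct discriminator has a small loss and leaves the generator a large one.
d_confident, g_confident = gan_losses([10.0], [-10.0])
print(d_undecided, g_undecided)
print(d_confident, g_confident)
```

When reading the training log, a D_loss near 2·ln 2 ≈ 1.39 together with a G_loss near ln 2 ≈ 0.69 therefore means the discriminator cannot tell real from generated images.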

References

https://blog.csdn.net/stdcoutzyx/article/details/53872121
https://blog.csdn.net/xiening0618/article/details/79417734
https://blog.csdn.net/z704630835/article/details/82254193
