Introduction
Google Inception Net first appeared in the ILSVRC 2014 competition (the same year as VGGNet), where it took first place by a clear margin. The network entered that year is usually called Inception V1. Its defining feature is that it achieved excellent classification performance — a top-5 error rate of 6.67%, less than half that of AlexNet — while keeping both the amount of computation and the number of parameters under control. Inception V1 has 22 layers, deeper than AlexNet's 8 or VGGNet's 19, yet it needs only about 1.5 billion floating-point operations and has only about 5 million parameters, roughly 1/12 of AlexNet's 60 million, while reaching far higher accuracy — an outstanding and very practical model. There are two main reasons for reducing the parameter count: first, the more parameters a model has, the larger it is and the more training data it needs, and high-quality data is expensive; second, more parameters mean more computation. Besides being deeper and therefore more expressive, Inception V1 owes its "few parameters, strong results" character to two further ideas. The first is removing the final fully connected layers and replacing them with global average pooling: the fully connected layers account for roughly 90% of the parameters in AlexNet or VGGNet and tend to overfit, so removing them makes training faster and reduces overfitting; this replacement was borrowed from the Network In Network (NIN) paper. The second is the carefully designed Inception Module, which greatly improves parameter efficiency.
Let us look at the basic structure of the Inception Module. It has four branches. The first branch applies a 1×1 convolution to the input — an important structure originally proposed in NIN. The 1×1 convolution is an excellent building block: it mixes information across channels, increases the network's expressive power, and can raise or lower the number of output channels. Note that all four branches of the Inception Module use 1×1 convolutions to perform cheap cross-channel feature transformations (far less computation than 3×3). The second branch applies a 1×1 convolution followed by a 3×3 convolution, i.e. two successive feature transformations. The third branch is similar: a 1×1 convolution followed by a 5×5 convolution. The last branch applies 3×3 max pooling followed by a 1×1 convolution. In other words, some branches use only a 1×1 convolution, and the branches with larger kernels also include a 1×1 convolution, because the 1×1 convolution is extremely cost-effective: with very little computation it adds one more layer of feature transformation and non-linearity. The outputs of the four branches are finally merged by a concatenation along the output-channel dimension. The Inception Module combines convolutions of three different sizes with one max pooling, which improves the network's adaptability to different scales — an idea similar to Multi-Scale processing. In early computer-vision research, inspired by the primate visual system, Serre used Gabor filters of different sizes to process images at different scales; Inception V1 borrows this idea. The Inception V1 paper points out that the Inception Module lets the depth and width of the network grow efficiently, improving accuracy without overfitting.
Within an Inception Module, the 1×1 convolutions usually account for the largest share of output channels, with the 3×3 and 5×5 convolutions slightly lower. Across the whole network, many Inception Modules are stacked, and we want the later modules to capture more abstract, higher-order features. The spatial concentration of the convolutions in later modules should therefore gradually decrease so that features covering larger areas can be captured: in the later Inception Modules, the share of output channels devoted to the larger 3×3 and 5×5 kernels should increase.
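To make the cost argument for the 1×1 convolution concrete, here is a small illustrative calculation (not from the original text). It counts multiply-accumulates on a 35×35 feature map for a direct 5×5 convolution versus a 1×1 "bottleneck" followed by the same 5×5 convolution; the channel numbers happen to match Branch_1 of the Mixed_5b module defined later (192 input channels, a 48-channel 1×1, then a 64-channel 5×5):
#Illustrative only: multiply-accumulate count with and without a 1x1 bottleneck
h = w = 35
c_in, c_mid, c_out = 192, 48, 64
direct = h * w * c_out * (5 * 5 * c_in)           # 5x5 conv applied directly to 192 channels
bottleneck = (h * w * c_mid * (1 * 1 * c_in)      # 1x1 reduction to 48 channels first
              + h * w * c_out * (5 * 5 * c_mid))  # then the 5x5 conv on fewer channels
print(direct, bottleneck, direct / bottleneck)    # roughly a 3-4x saving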
Inception V2 learned from VGGNet and replaced the large 5×5 kernel with two 3×3 convolutions (reducing parameters and overfitting), and it introduced the famous Batch Normalization (BN) method. BN is a very effective regularization technique that can speed up the training of large convolutional networks many times over while also substantially improving the classification accuracy at convergence. When applied to a layer, BN standardizes the activations within each mini-batch so that the layer's output follows an approximately N(0,1) normal distribution, reducing Internal Covariate Shift (the shifting distribution of internal neuron activations). The BN paper points out that in a traditional deep convolutional network the input distribution of every layer keeps changing during training, which makes training difficult and forces a very small learning rate. With BN applied to every layer this problem is effectively solved: the learning rate can be increased many times over, the number of iterations needed to reach the previous accuracy drops to 1/14, and training time shrinks dramatically. Training further beyond that point eventually yields performance far better than Inception V1 — a top-5 error rate of 4.8%, already better than human-level. Because BN also acts as a regularizer to some extent, Dropout can be reduced or removed, simplifying the network.
Of course, simply adding BN does not deliver the full benefit by itself; several accompanying adjustments are needed: increase the learning rate and accelerate its decay to suit the BN-normalized data; remove Dropout and reduce L2 regularization (BN already regularizes); remove LRN; shuffle the training samples more thoroughly; and reduce the photometric distortions used in data augmentation (because BN trains faster, each sample is seen fewer times, so more realistic samples help more). With these measures, Inception V2 reached Inception V1's accuracy 14 times faster, and its accuracy at convergence was also higher.
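As a rough illustration of what BN computes, here is a minimal NumPy sketch of the training-time transform only (the slim implementation used later additionally maintains moving averages of the mean and variance for inference):
import numpy as np

def batch_norm_sketch(x, gamma, beta, epsilon=0.001):
    # x: mini-batch activations of shape (batch, features)
    mean = x.mean(axis=0)                        # per-feature mini-batch mean
    var = x.var(axis=0)                          # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + epsilon)  # standardize to roughly N(0, 1)
    return gamma * x_hat + beta                  # learnable scale and shift

x = np.random.randn(32, 8) * 5.0 + 3.0           # a shifted, widely spread mini-batch
y = batch_norm_sketch(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # roughly 0 and 1 per feature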
Inception V3 introduces two main changes. The first is the idea of Factorization into small convolutions: a larger two-dimensional convolution is split into two smaller one-dimensional convolutions, e.g. a 7×7 convolution becomes a 1×7 convolution followed by a 7×1 convolution. This saves even more parameters than splitting it into three 3×3 convolutions, while adding an extra layer of non-linearity that extends the model's expressive power. The paper notes that this asymmetric factorization works noticeably better than the symmetric split into several identical small kernels: it can handle more and richer spatial features and increases feature diversity.
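A quick back-of-the-envelope check of the parameter savings (weights needed to connect one input channel to one output channel, ignoring biases and the channel changes that the stacked versions also introduce):
#Weights per input-output channel pair for each factorization
k_7x7 = 7 * 7                  # 49: a single 7x7 convolution
k_3x3_x3 = 3 * (3 * 3)         # 27: three stacked 3x3 convolutions
k_1x7_7x1 = 1 * 7 + 7 * 1      # 14: a 1x7 followed by a 7x1 convolution
print(k_7x7, k_3x3_x3, k_1x7_7x1)
print(k_1x7_7x1 / k_7x7)       # 2/7, the ratio mentioned later for Mixed_6b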
The second change is that Inception V3 optimizes the structure of the Inception Module: there are now three kinds of modules, operating on 35×35, 17×17 and 8×8 feature maps. These Inception Modules appear only in the later part of the network; the front of the network still consists of ordinary convolutional layers. In addition, Inception V3 not only uses branches inside an Inception Module but also uses branches within branches (in the 8×8 modules), which can be seen as a Network In Network style design.
Inception V4, compared with V3, mainly incorporates Microsoft's ResNet, which is covered separately in Section 6.4 and not discussed further here. Next we implement Inception V3; its network structure is described step by step below.
Because Google Inception Net V3 is relatively complex, we use tf.contrib.slim to help design the network. The functions and components in contrib.slim greatly reduce the amount of code needed; with only a small amount of code we can construct the full 42-layer-deep Inception V3.
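To give a feel for how much boilerplate slim removes, here is a rough sketch (TensorFlow 1.x API assumed, not part of the tutorial code) of what a single slim.conv2d call replaces when written with low-level ops — variable creation, the convolution itself and the activation; slim would additionally apply the bias or Batch Normalization and the regularizer configured through arg_scope:
import tensorflow as tf
images = tf.placeholder(tf.float32, [None, 299, 299, 3])
#With slim, one line:  net = slim.conv2d(images, 32, [3, 3], stride=2)
#Roughly equivalent low-level sketch (bias / BN / regularizer omitted for brevity):
with tf.variable_scope('Conv2d_sketch'):
    kernel = tf.get_variable('weights', [3, 3, 3, 32],
                             initializer=tf.truncated_normal_initializer(stddev=0.1))
    conv = tf.nn.conv2d(images, kernel, strides=[1, 2, 2, 1], padding='SAME')
    net = tf.nn.relu(conv)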
- First we define a simple helper, trunc_normal, which produces a truncated-normal initializer.
#coding=utf-8
#Inception-V3.py
import tensorflow as tf
from datetime import datetime
import math
import time
slim = tf.contrib.slim
#Truncated normal distribution initializer
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)
- Next we define the function inception_v3_arg_scope, which generates default parameters for functions used throughout the network, such as the convolution activation function, weight initialization and normalizer. The L2-regularization weight_decay defaults to 0.00004, the standard deviation stddev defaults to 0.1, and batch_norm_var_collection defaults to 'moving_vars'. We then define the parameter dictionary for Batch Normalization: the decay coefficient is 0.9997, epsilon is 0.001, updates_collections is tf.GraphKeys.UPDATE_OPS, and in the variables_collections dictionary both beta and gamma are set to None while moving_mean and moving_variance are set to the batch_norm_var_collection defined above.
Next we use slim.arg_scope, a very handy tool that automatically assigns default values to function arguments. For example, with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)) sets defaults for the two functions slim.conv2d and slim.fully_connected, giving their weights_regularizer argument the default value slim.l2_regularizer(weight_decay). With slim.arg_scope we no longer need to set these arguments on every call; we only set them when we want to override a default. We then nest a second slim.arg_scope that sets defaults for several arguments of the convolution-layer generator slim.conv2d: the weight initializer is a truncated normal distribution with standard deviation stddev, the activation function is ReLU, the normalizer is slim.batch_norm, and the normalizer parameters are the batch_norm_params defined above. Finally the scope defined in this way is returned.
Because the various defaults for slim.conv2d, including its activation function and normalizer, are defined in advance, creating a convolutional layer later becomes very convenient: one line of code defines one layer, the overall code stays clean and readable, and the effort of designing the network drops considerably. A short usage sketch follows the function below.
#Define inception_v3_arg_scope, which generates default arguments for functions used throughout the network
def inception_v3_arg_scope(weight_decay=0.00004, stddev=0.1, batch_norm_var_collection='moving_vars'):
batch_norm_params = {
'decay': 0.9997,
'epsilon': 0.001,
'updates_collections': tf.GraphKeys.UPDATE_OPS,
'variables_collections':{
'beta' : None,
'gamma' : None,
'moving_mean': [batch_norm_var_collection],
'moving_variance': [batch_norm_var_collection]
}
}
with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)):
with slim.arg_scope([slim.conv2d], weights_initializer=tf.truncated_normal_initializer(stddev=stddev),
activation_fn=tf.nn.relu,
normalizer_fn=slim.batch_norm,
normalizer_params=batch_norm_params) as sc:
return sc
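As a quick illustration of how this scope will be consumed later (the same pattern as the benchmarking code at the end of this article): every slim.conv2d call issued inside the with block automatically picks up the L2 regularizer, truncated-normal initializer, ReLU and Batch Normalization defined above, and any of them can still be overridden per call.
#Sketch: any conv defined inside the scope inherits the defaults (assumes the definitions above are in scope)
images = tf.placeholder(tf.float32, [None, 299, 299, 3])
with slim.arg_scope(inception_v3_arg_scope()):
    net = slim.conv2d(images, 32, [3, 3], stride=2)          # regularizer / BN / ReLU applied implicitly
    net = slim.conv2d(net, 64, [3, 3], activation_fn=None)   # a default overridden for this one call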
- Next we define the function inception_v3_base, which generates the convolutional part of the Inception V3 network. The argument inputs is the tensor of input images, and scope is an environment containing the function's default parameters. We create a dictionary end_points to store important nodes for later use. We then use slim.arg_scope to set defaults for the three functions slim.conv2d, slim.max_pool2d and slim.avg_pool2d: stride 1 and padding 'VALID'. Now we start defining the Inception V3 structure proper, beginning with the ordinary (non-Inception-Module) convolutional layers at the front. These are created directly with slim.conv2d, whose first argument is the input tensor, second the number of output channels, third the kernel size, fourth the stride, and fifth the padding mode. Our first convolutional layer has 32 output channels, a 3×3 kernel, stride 2 and the default padding 'VALID'. The following layers take the same form and are defined layer by layer according to the paper. Thanks to slim and slim.arg_scope, one line of code defines one convolutional layer — far more convenient than the AlexNet implementation, which needed several lines per layer, or VGGNet, where we wrote a dedicated function to define convolutional layers.
Notice that these ordinary, non-Inception-Module layers mainly use small 3×3 kernels, borrowing heavily from VGGNet. The Inception V3 paper also proposes Factorization into small convolutions, using two one-dimensional convolutions to emulate a large two-dimensional convolution, reducing parameters while adding non-linearity. Among these first layers there is also one 1×1 convolution, the structure frequently used inside the Inception Module mentioned earlier, which combines features across channels at low cost. Note also that apart from the first convolution (stride 2), all convolutions use stride 1, while the pooling layers are 3×3 overlapping max pooling with stride 2, the structure used in AlexNet. The input images are 299×299×3; after three layers with stride 2 the feature map shrinks to 35×35×192 — the spatial size drops sharply while the number of channels grows substantially (the short sketch after the code below traces these sizes). This part of the code consists of 5 convolutional layers and 2 pooling layers; it compresses the spatial size of the input and abstracts the image features.
def inception_v3_base(inputs, scope=None):
end_points = {}
with tf.variable_scope(scope, 'InceptionV3', [inputs]):
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='VALID'):
net = slim.conv2d(inputs, 32, [3, 3], stride=2, scope='Conv2d_1a_3x3')
net = slim.conv2d(net, 32, [3, 3], scope='Conv2d_2a_3x3')
net = slim.conv2d(net, 64, [3, 3], padding='SAME', scope='Conv2d_2b_3x3')
net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_3a_3x3')
net = slim.conv2d(net, 80, [1, 1], scope='Conv2d_3b_1x1')
net = slim.conv2d(net, 192, [3, 3], scope='Conv2d_4a_3x3')
net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_5a_3x3')
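To double-check the 299×299 → 35×35 claim above, here is a small standalone sketch (plain Python, separate from the network code) that applies the usual VALID/SAME output-size formulas to the seven stem layers:
def out_size(size, kernel, stride, padding):
    # Output spatial size of a conv/pool layer in TensorFlow
    if padding == 'VALID':
        return (size - kernel) // stride + 1
    return (size + stride - 1) // stride       # 'SAME': ceil(size / stride)

s = 299
for name, k, stride, pad in [('Conv2d_1a_3x3', 3, 2, 'VALID'),
                             ('Conv2d_2a_3x3', 3, 1, 'VALID'),
                             ('Conv2d_2b_3x3', 3, 1, 'SAME'),
                             ('MaxPool_3a_3x3', 3, 2, 'VALID'),
                             ('Conv2d_3b_1x1', 1, 1, 'VALID'),
                             ('Conv2d_4a_3x3', 3, 1, 'VALID'),
                             ('MaxPool_5a_3x3', 3, 2, 'VALID')]:
    s = out_size(s, k, stride, pad)
    print(name, s)                             # ends at 35, i.e. 35 x 35 x 192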
- Next come three consecutive Inception module groups. Each group contains several Inception Modules, and this part of the network is the essence of Inception V3. The Inception Modules within a group have very similar structures, differing only in some details.
The first Inception module group contains three structurally similar Inception Modules; the first one is named Mixed_5b. We first use slim.arg_scope to set default parameters for all the Inception module groups: stride 1 and padding 'SAME' for all convolution, max-pooling and average-pooling layers. We then open a variable_scope named Mixed_5b for this module. It has four branches, Branch_0 to Branch_3. The first branch is a 1×1 convolution with 64 output channels; the second is a 1×1 convolution with 48 output channels followed by a 5×5 convolution with 64 output channels; the third is a 1×1 convolution with 64 output channels followed by two consecutive 3×3 convolutions with 96 output channels each; the fourth is a 3×3 average pooling followed by a 1×1 convolution with 32 output channels. Finally we use tf.concat to merge the outputs of the four branches along the third dimension, i.e. the output channels, producing the module's final output. Because every layer here uses stride 1 and padding 'SAME', the spatial size does not shrink and remains 35×35; the output is 35×35×256 (64+64+96+32 channels). Note that all Inception Modules in the first group keep the 35×35 spatial size, but the channel counts of the later two modules will change.
#Inception Module
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='SAME'):
# First Inception module group, module 1 (Mixed_5b)
with tf.variable_scope('Mixed_5b'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
- Next is the second Inception Module of the first group, Mixed_5c, which still uses the defaults set earlier: stride 1 and padding 'SAME'. It also has four branches; the only functional difference is that the fourth branch ends with a 1×1 convolution with 64 output channels instead of the previous 32. The final output tensor is therefore 35×35×288, 32 more output channels than before.
# First Inception module group, module 2 (Mixed_5c)
with tf.variable_scope('Mixed_5c'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0b_1x1')
branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0c_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
- The third Inception Module of the first group, Mixed_5d, is identical to the previous one: the structure and parameters of its four branches are exactly the same, and so is the size of the output tensor.
#First Inception module group, module 3 (Mixed_5d)
with tf.variable_scope('Mixed_5d'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
- The second Inception module group is very large, containing five Inception Modules, of which the second through the fifth are structurally very similar. The first one is named Mixed_6a and has three branches. The first branch is a 3×3 convolution with 384 output channels — by itself more channels than the whole previous module — but with stride 2 and padding 'VALID', so the spatial size shrinks to 17×17. The second branch has three layers: a 1×1 convolution with 64 output channels and two 3×3 convolutions with 96 output channels each; note that the last layer has stride 2 and padding 'VALID', so the spatial size is also compressed and this branch outputs a 17×17×96 tensor. The third branch is a 3×3 max pooling with stride 2 and padding 'VALID'; pooling keeps the 288 input channels, so it outputs a 17×17×288 tensor. Finally tf.concat merges the three branches along the output channels, giving 17×17×(384+96+288) = 17×17×768. Throughout the second Inception module group, every module's output stays at 17×17×768: neither the spatial size nor the channel count changes.
#Second Inception module group, module 1 (Mixed_6a)
with tf.variable_scope('Mixed_6a'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 384, [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 96, [3, 3], scope='Conv2d_1a_1x1')
branch_1 = slim.conv2d(branch_1, 96, [3, 3], stride=2, padding='VALID', scope='Conv2d_1b_1x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
net = tf.concat([branch_0, branch_1, branch_2], 3)
- Next is the second Inception Module of the second group, Mixed_6b, which has four branches. The first branch is a simple 1×1 convolution with 192 output channels. The second branch consists of three convolutional layers: a 1×1 convolution with 128 output channels, a 1×7 convolution with 128 output channels, and a 7×1 convolution with 192 output channels. This is the Factorization into small convolutions idea mentioned earlier: the chained 1×7 and 7×1 convolutions are equivalent to synthesizing a 7×7 convolution, but with far fewer parameters — only 2/7 as many — plus an extra activation that strengthens the non-linear feature transformation. The third branch has no fewer than five convolutional layers: a 1×1 convolution with 128 output channels, a 7×1 convolution with 128 output channels, a 1×7 convolution with 128 output channels, a 7×1 convolution with 128 output channels, and a 1×7 convolution with 192 output channels. This branch is a textbook use of Factorization into small convolutions, repeatedly splitting up the 7×7 convolution (a short parameter count follows the code below). The fourth branch is a 3×3 average pooling followed by a 1×1 convolution with 192 output channels. Finally the four branches are merged, and the output tensor is 17×17×(192+192+192+192) = 17×17×768.
#Second Inception module group, module 2 (Mixed_6b)
with tf.variable_scope('Mixed_6b'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 128, [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, 128, [1, 7], scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
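To see how much the factorization saves here, a small illustrative count of the convolution weights in Branch_2 (channel numbers taken from the code above, biases ignored), compared with a hypothetical direct 7×7 convolution mapping 768 channels to 192:
#Branch_2 of Mixed_6b: 1x1, 7x1, 1x7, 7x1, 1x7 with the channel counts used above
factorized = (1*1*768*128 + 7*1*128*128 + 1*7*128*128 +
              7*1*128*128 + 1*7*128*192)
direct_7x7 = 7 * 7 * 768 * 192        # hypothetical unfactorized alternative
print(factorized, direct_7x7, factorized / direct_7x7)   # roughly a 12x reduction in weights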
- Next comes the third Inception Module of the second group, Mixed_6c. It is very similar to the previous module; the only difference is that the output channels of the first few convolutions in the second and third branches change from 128 to 160, while both branches still end with 192 output channels. Everything else is identical. Note that every time the network passes through an Inception Module, even though the output tensor size is unchanged, the features are effectively refined once more, and this rich mix of convolutions and non-linearities contributes a great deal to the network.
#Second Inception module group, module 3 (Mixed_6c)
with tf.variable_scope('Mixed_6c'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
- Mixed_6d is identical to the preceding Mixed_6c; its purpose is likewise to use the carefully designed Inception Module structure to add more convolutions and non-linearity and to further refine the features.
#Second Inception module group, module 4 (Mixed_6d)
with tf.variable_scope('Mixed_6d'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
- Mixed_6e is again identical to the previous two Inception Modules and is the last module of the second Inception module group. We store the output of Mixed_6e in end_points so that it can feed the Auxiliary Classifier, the auxiliary classification head.
#Second Inception module group, module 5 (Mixed_6e); its output is stored in end_points for the auxiliary classifier
with tf.variable_scope('Mixed_6e'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
end_points['Mixed_6e'] = net
- The third Inception module group contains three Inception Modules, the last two of which are structurally very similar. The first one is named Mixed_7a and has three branches. The first branch is a 1×1 convolution with 192 output channels followed by a 3×3 convolution with 320 output channels, but with stride 2 and padding 'VALID', so the spatial size shrinks to 8×8. The second branch has four convolutional layers: a 1×1 convolution with 192 output channels, a 1×7 convolution with 192 output channels, a 7×1 convolution with 192 output channels, and a 3×3 convolution with 192 output channels; the last one again has stride 2 and padding 'VALID', so this branch outputs an 8×8×192 tensor. The third branch is a 3×3 max pooling with stride 2 and padding 'VALID'; pooling does not change the number of channels, so its output is 8×8×768. Finally the three branches are concatenated along the output channels, giving 8×8×(320+192+768) = 8×8×1280. From this Inception Module onward the spatial size shrinks again while the channel count grows, and the total size of the tensor keeps decreasing.
#Third Inception module group, module 1 (Mixed_7a)
with tf.variable_scope('Mixed_7a'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
branch_0 = slim.conv2d(branch_0, 320, [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 192, [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
branch_1 = slim.conv2d(branch_1, 192, [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
net = tf.concat([branch_0, branch_1, branch_2], 3)
- Next is the second Inception Module of the third group, Mixed_7b, which has four branches. The first branch is a simple 1×1 convolution with 320 output channels. The second branch starts with a 1×1 convolution with 384 output channels and then splits into two sub-branches — a 1×3 convolution with 384 output channels and a 3×1 convolution with 384 output channels — which are merged with tf.concat into an 8×8×(384+384) = 8×8×768 output. The third branch is more elaborate: a 1×1 convolution with 448 output channels, then a 3×3 convolution with 384 output channels, and then again a split into two sub-branches, a 1×3 convolution with 384 output channels and a 3×1 convolution with 384 output channels, merged into an 8×8×768 output. The fourth branch is a 3×3 average pooling followed by a 1×1 convolution with 192 output channels. Finally the four branches of this very elaborate Inception Module are concatenated, giving an output tensor of 8×8×(320+768+768+192) = 8×8×2048. With this module the number of output channels rises from 1280 to 2048.
#Third Inception module group, module 2 (Mixed_7b)
with tf.variable_scope('Mixed_7b'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = tf.concat([slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_0b_1x3'),
slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_0b_3x1')], 3)
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 384, [3, 3], scope='Conv2d_0b_3x3')
branch_2 = tf.concat([slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_0c_1x3'),
slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_0d_3x1')], 3)
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
- Mixed_7c is the last Inception Module of the third group, and it is identical to the preceding Mixed_7b; its output tensor is also 8×8×2048. Finally we return this result as the output of the inception_v3_base function (a quick shape check follows the code below).
#Third Inception module group, module 3 (Mixed_7c)
with tf.variable_scope('Mixed_7c'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = tf.concat([slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_0b_1x3'),
slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_0b_3x1')], 3)
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 384, [3, 3], scope='Conv2d_0b_3x3')
branch_2 = tf.concat([slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_0c_1x3'),
slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_0d_3x1')], 3)
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
return net, end_points
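A quick way to sanity-check the sizes derived above is to build the convolutional part on a dummy input and print the shapes of the final output and of the Mixed_6e node (a standalone check, assuming the functions defined in this article are already in scope):
#Sketch: inspect the shapes produced by inception_v3_base
images = tf.placeholder(tf.float32, [None, 299, 299, 3])
with slim.arg_scope(inception_v3_arg_scope()):
    net, end_points = inception_v3_base(images)
print(net.get_shape())                        # expected (?, 8, 8, 2048)
print(end_points['Mixed_6e'].get_shape())     # expected (?, 17, 17, 768)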
- This completes the core of Inception V3, its convolutional part. Next we implement the final part of the network: global average pooling, Softmax and the Auxiliary Logits. Look first at the arguments of the function inception_v3. num_classes is the number of classes to predict; the default of 1000 is the number of categories in the ILSVRC dataset. is_training indicates whether we are training; it affects Batch Normalization and Dropout, which are only enabled during training. dropout_keep_prob is the fraction of units kept by Dropout during training, defaulting to 0.8. prediction_fn is the function used for the final classification, by default slim.softmax. spatial_squeeze indicates whether to squeeze the output, i.e. remove the dimensions of size 1. reuse indicates whether the network and its Variables are to be reused, and scope is an environment containing the function's default parameters. We first use tf.variable_scope to set the default name and reuse behaviour of the network, then use slim.arg_scope to set the default value of the is_training flag for Batch Normalization and Dropout. Finally we build the convolutional part with the inception_v3_base defined above, obtaining the last layer's output net and the dictionary of important nodes end_points.
#Define inception_v3, which adds global average pooling, Softmax and the Auxiliary Logits
def inception_v3(inputs, num_classes=1000, is_training=True, dropout_keep_prob=0.8, prediction_fn=slim.softmax,
spatial_squeeze=True, reuse=None, scope='InceptionV3'):
with tf.variable_scope(scope, 'InceptionV3', [inputs, num_classes], reuse=reuse) as scope:
with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
net, end_points = inception_v3_base(inputs, scope=scope)
- Next we handle the Auxiliary Logits. They act as an auxiliary classification node and help the final prediction considerably. We first use slim.arg_scope to set the default stride of convolutions, max pooling and average pooling to 1 and the default padding to 'SAME'. We then fetch Mixed_6e from end_points and append a 5×5 average pooling with stride 3 and padding 'VALID', which reduces the size from 17×17×768 to 5×5×768. Next come a 1×1 convolution with 128 output channels and a 5×5 convolution with 768 output channels; for the latter the weight initializer is reset to a truncated normal distribution with standard deviation 0.01 and padding is 'VALID', so the output becomes 1×1×768. Then a 1×1 convolution with num_classes output channels follows, with no activation function or normalizer and a weight initializer with standard deviation 0.001, making the output 1×1×1000. We then use tf.squeeze to remove the two dimensions of size 1, and finally store the auxiliary classifier's output aux_logits in the end_points dictionary. The spatial sizes are traced in the short check after the code below.
#Auxiliary Logits branch
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='SAME'):
aux_logits = end_points['Mixed_6e']
with tf.variable_scope('AuxLogits'):
aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3, padding='VALID', scope='AvgPool_1a_5x5')
aux_logits = slim.conv2d(aux_logits, 128, [1, 1], scope='Conv2d_1b_1x1')
aux_logits = slim.conv2d(aux_logits, 768, [5, 5], weights_initializer=trunc_normal(0.01),
padding='VALID', scope='Conv2d_2a_5x5')
aux_logits = slim.conv2d(aux_logits, num_classes, [1, 1], activation_fn=None, normalizer_fn=None,
weights_initializer=trunc_normal(0.001), scope='Conv2d_2b_1x1')
if spatial_squeeze:
aux_logits = tf.squeeze(aux_logits, [1, 2], name='SpatialSqueeze')
end_points['AuxLogits'] = aux_logits
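The spatial sizes in this auxiliary head can be traced with the same VALID formula used earlier (a quick standalone check):
#17x17 input -> 5x5 avg pool, stride 3, VALID; then 5x5 conv, stride 1, VALID
print((17 - 5) // 3 + 1)   # 5:  17x17x768 -> 5x5x768
print((5 - 5) // 1 + 1)    # 1:  5x5x768  -> 1x1x768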
- Now for the regular classification logic. We apply an 8×8 global average pooling with padding 'VALID' directly to the output of the last convolutional layer, Mixed_7c, so the tensor becomes 1×1×2048. We then attach a Dropout layer with keep probability dropout_keep_prob, followed by a 1×1 convolution with num_classes (1000 by default) output channels, with the activation function and normalizer set to None. We use tf.squeeze to drop the dimensions of size 1 (illustrated briefly after the code below), attach a Softmax to produce the class predictions, and finally return the logits and the end_points dictionary, which also contains the auxiliary node.
#Regular classification logits
with tf.variable_scope('Logits'):
net = slim.avg_pool2d(net, [8, 8], padding='VALID', scope='AvgPool_1a_8x8')
net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
end_points['PreLogits'] = net
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='Conv2d_1c_1x1')
if spatial_squeeze:
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
end_points['Logits'] = logits
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
return logits, end_points
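tf.squeeze with axes [1, 2] simply drops the two singleton spatial dimensions left by the 8×8 pooling and the 1×1 convolution, turning the logits from shape (batch, 1, 1, num_classes) into (batch, num_classes). A minimal standalone illustration:
#Sketch: the effect of the spatial squeeze on the logits shape
dummy = tf.zeros([32, 1, 1, 1000])
print(tf.squeeze(dummy, [1, 2]).get_shape())   # (32, 1000)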
- With that, the construction of the Inception V3 network is complete. Inception V3 is a very complex and finely tuned model that embodies a great deal of accumulated experience and many tricks for designing large convolutional networks. Although the Inception V3 paper gives several principles for designing convolutional networks, many of its hyperparameter choices — the number of layers, kernel sizes, pooling positions, strides, when to apply factorization, and the design of the branches — are hard to explain one by one.
Next we benchmark the computational performance of Inception V3 using the time_tensorflow_run function. Because the Inception V3 network is large, batch_size is set to 32 to avoid running out of GPU memory. The image size is 299×299, and tf.random_uniform generates random images as the input. We then use slim.arg_scope to load the inception_v3_arg_scope() defined earlier, which contains the default Batch Normalization parameters as well as the default activation function and parameter initializers. Inside this arg_scope we call inception_v3 on the inputs to obtain logits and end_points. We then create a Session and initialize all model parameters. Finally we set the number of test batches to 100 and use time_tensorflow_run to measure the forward performance of Inception V3.
#Time-per-batch evaluation function (the same one used for AlexNet earlier)
def time_tensorflow_run(session, target, info_string):
num_steps_burn_in = 10 #warm-up iterations, excluded from the statistics
total_durations = 0.0
total_duration_squared = 0.0
for i in range(num_batches + num_steps_burn_in):
start_time = time.time()
_ = session.run(target)
duration = time.time() - start_time
if i >= num_steps_burn_in:
if not i % 10:
print('%s: step %d, duration = %.3f'%(datetime.now(), i - num_steps_burn_in, duration))
total_durations += duration
total_duration_squared += duration * duration
#Compute the mean and standard deviation of the per-batch time and print the result
mn = total_durations / num_batches
vr = total_duration_squared / num_batches - mn * mn
sd = math.sqrt(vr)
print('%s: %s across %d steps, %.3f +/- %.3f sec / batch'%(datetime.now(), info_string, num_batches, mn, sd))
#Benchmark the forward pass of Inception V3
batch_size = 32
height, width = 299, 299
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(inception_v3_arg_scope()):
logits, end_points = inception_v3(inputs, is_training=False)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
num_batches = 100
time_tensorflow_run(sess, logits, "Foward")