Main references: Li Mu et al., *Dive into Deep Learning* (the Berkeley textbook); the GluonCV website
Abstract: an MXNet exercise in transfer learning — fine-tuning ResNet to classify 12 cat breeds.
Transfer Learning
Although deep networks are trained on different datasets, the image features a model extracts — edges, textures, shapes, object parts — are fairly general, and often remain effective for other scenes and targets. Transfer learning migrates knowledge learned on a source dataset to a target dataset. Fine-tuning is a common transfer-learning technique: assuming the knowledge learned on the source dataset also applies to the target dataset, and noting that the source model's output layer is tied closely to the source labels, the output layer is trained from scratch while the remaining layers are fine-tuned from the source model's parameters. When the target dataset is much smaller than the source dataset, fine-tuning helps the model generalize. The fine-tuning procedure:
- Pre-train a source model on a source dataset (e.g. ImageNet)
- Create the target model by copying the source model's architecture and parameters, except for the output layer
- Add an output layer sized to the target classes and randomly initialize its parameters
- Train the target model on the target dataset
GluonCV
Recommended reading on the origins and scope of GluonCV: Li Mu, "GluonCV — a deep learning toolkit for computer vision".
- GluonCV project site: https://gluon-cv.mxnet.io/; GitHub repository: https://github.com/dmlc/gluon-cv
- Supported applications:
  - Image Classification
  - Object Detection
  - Semantic Segmentation
  - Instance Segmentation
  - Pose Estimation
  - Video Action Recognition
- A pretrained source-model instance exposes two member attributes: `features`, which holds every layer except the output layer, and `output`, the output layer itself. The split exists mainly to make it convenient to fine-tune all parameters outside the output layer.
- Because the parameters in `features` are initialized from the source-dataset pre-training and are already good, a small learning rate 𝜂 suffices to fine-tune them; the parameters in `output` are randomly initialized and are trained from scratch, typically with a larger learning rate such as 10𝜂.
Hands-on: classifying an image directly with a pretrained model
from mxnet import nd, image
from gluoncv.data.transforms.presets.imagenet import transform_eval
from gluoncv.model_zoo import get_model
from matplotlib import pyplot as plt

# Read the image
src = image.imread("data/dog.png")
# Apply the standard ImageNet evaluation transform
img = transform_eval(src)
# Show the original and transformed images side by side
plt.subplot(121), plt.imshow(src.asnumpy()), plt.axis("off"), plt.title(src.shape)
plt.subplot(122), plt.imshow(nd.transpose(img[0], (1, 2, 0)).asnumpy()), plt.axis("off"), plt.title(img.shape)
plt.show()
# Load the pretrained model
net = get_model("ResNet50_v2", pretrained=True)
# Predict; the raw outputs are logits, so apply softmax to get probabilities
pred = nd.softmax(net(img))
assert pred.shape == (1, 1000)
# Print the top-5 predictions
top_k = 5
print("The input picture is classified to be")
for i in range(len(img)):
    pred_ = pred[i]
    index = nd.topk(pred_, k=top_k).asnumpy().astype("int").flatten()
    for j in range(top_k):
        class_ = net.classes[index[j]]
        prob = pred_[index[j]].asscalar()
        print(f"{class_:25} probability = {prob:.3f}")
The top five predictions are all dog breeds: Welsh springer spaniel, Brittany spaniel, cocker spaniel, Blenheim spaniel, and clumber.
Hands-on: fine-tuning a pretrained model to classify cats
Dataset
Without serious compute I could not work with a large dataset. The GluonCV tutorial uses MINC-2500, but at 2.6 GB it was out of reach without a GPU. After a long search I found the Baidu AI practice project "cat-12 classification". The competition has ended, but the dataset is still downloadable; I took only the training set, because the test set has no labels. It contains 12 cat breeds with 180 images per breed, 184 MB in total — quite suitable for CPU training. The script below splits it in two for transfer learning.
import os, shutil
import pandas as pd
import numpy as np

np.random.seed(42)
# Current directory
root = os.path.abspath('.')
src_path = os.path.join(root, 'images')
# Create the split directories
dir_names = ["train", "test"]
for dir_name in dir_names:
    path = os.path.join(root, dir_name)
    if not os.path.isdir(path):
        os.mkdir(path)
        print(f"Created directory: {dir_name}")
# Read the label file
table = pd.read_csv("labels.txt", sep="\t", header=None, names=["filename", "label"])
print(table.label.value_counts())
# One subdirectory per class
classes = 12
for dir_name in dir_names:
    for i in range(classes):
        path = os.path.join(root, dir_name, str(i))
        if not os.path.isdir(path):
            os.mkdir(path)
            print(f"Created directory: {path}")
# Assign each image to train or test (roughly 70/30)
labels = np.array(table.label)
description = []
for label in labels:
    stat = "train" if np.random.rand() < 0.7 else "test"
    description.append(stat)
description = pd.Series(description)
table["description"] = description
print(table.head(10))
# Move each image into <split>/<label>/
table = np.array(table)
for data in table:
    file_name = data[0].split("/")[1]
    file_path = os.path.join(root, file_name)
    dst_path = os.path.join(root, data[2], str(data[1]))
    if os.path.isfile(os.path.join(dst_path, file_name)):  # destination already has the file
        os.remove(file_path)
    if os.path.isfile(file_path):
        shutil.move(file_path, dst_path)
print("Dataset split complete!")
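The per-row Bernoulli draw above gives roughly a 70/30 split overall, but the proportions within each class can drift a little. A stratified variant keeps every class at ~70% train; this is a sketch, and the helper name `stratified_split` is mine:

```python
import numpy as np
import pandas as pd

def stratified_split(table, frac=0.7, seed=42):
    """Return a 'train'/'test' Series aligned with `table`, keeping
    roughly `frac` of each label's rows in the train split."""
    rng = np.random.default_rng(seed)
    desc = pd.Series("test", index=table.index)
    for _, positions in table.groupby("label").indices.items():
        idx = table.index[rng.permutation(positions)]
        n_train = int(round(frac * len(idx)))
        desc.loc[idx[:n_train]] = "train"
    return desc

# Hypothetical usage on the label table read above:
# table["description"] = stratified_split(table)
```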
Compare the file tree before and after with `tree -L 2` in the download directory: the images are simply sorted into per-label subfolders (the right-hand tree is the original).
Fine-tuning example: ResNet50_v2
import os, numpy as np
import mxnet as mx
from mxnet import image, init, nd, autograd, gluon
from mxnet.gluon import nn
from mxnet.gluon.data.vision import ImageFolderDataset, transforms
from gluoncv.model_zoo import get_model
from time import time
from matplotlib import pyplot as plt

# Hyperparameters
classes = 12
wd = 0.0001
epochs = 20
lr = 0.001
lr_period = 5
lr_decay = 0.5
ctx = mx.gpu()  # use mx.cpu() when no GPU is available
batch_size = 64
num_workers = 8

# Load the datasets
dataset_dir = "./datasets/cat12"
trainset = ImageFolderDataset(os.path.join(dataset_dir, 'train'))
testset = ImageFolderDataset(os.path.join(dataset_dir, 'test'))

# Image augmentation
jitter_param = 0.4
lighting_param = 0.1
transform_train = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomFlipLeftRight(),
    transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param,
                                 saturation=jitter_param),
    transforms.RandomLighting(lighting_param),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
transform_test = transforms.Compose([
    transforms.Resize(256),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
trainset = trainset.transform_first(transform_train)
testset = testset.transform_first(transform_test)

# Mini-batch iterators
train_iter = gluon.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_iter = gluon.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=num_workers)

# Initialize from the pretrained model
model_name = 'ResNet50_v2'
finetune_net = get_model(model_name, pretrained=True, root="/content/drive/My Drive/Colab/models/mxnet")
with finetune_net.name_scope():
    finetune_net.output = nn.Dense(classes)
finetune_net.output.initialize(init.Xavier(), ctx=ctx)
finetune_net.collect_params().reset_ctx(ctx)
finetune_net.hybridize()
# Evaluation metric
def evaluate_accuracy(data_iter, net, ctx):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        X, y = X.as_in_context(ctx), y.as_in_context(ctx)
        y = y.astype('float32')
        acc_sum += (net(X).argmax(axis=1) == y).sum().asscalar()
        n += y.size
    return acc_sum / n

# Training loop
run_train = time()
trainer = gluon.Trainer(finetune_net.collect_params(), 'adam', {'learning_rate': lr, 'wd': wd})
loss = gluon.loss.SoftmaxCrossEntropyLoss()
losses, train_acc, test_acc = [], [], []
for epoch in range(epochs):
    if epoch > 0 and epoch % lr_period == 0:
        trainer.set_learning_rate(trainer.learning_rate * lr_decay)
    loss_sum, accu_sum, n, start = 0.0, 0.0, 0, time()
    for X, y in train_iter:
        X, y = X.as_in_context(ctx), y.as_in_context(ctx)
        with autograd.record():
            y = y.astype("float32")
            output = finetune_net(X)
            l = loss(output, y).sum()
        l.backward()
        trainer.step(batch_size)
        loss_sum += l.asscalar()
        accu_sum += (y == output.argmax(axis=1)).sum().asscalar()
        n += y.size
    losses.append(loss_sum / n)
    train_acc.append(accu_sum / n)
    test_acc.append(evaluate_accuracy(test_iter, finetune_net, ctx))
    print(f"epoch {epoch+1:2d}", end=" ")
    print("lr = %f, loss = %.3f, train acc = %.3f, test acc = %.3f, %.1f sec"
          % (trainer.learning_rate, losses[epoch], train_acc[epoch], test_acc[epoch], time() - start))
print(f"total train time {(time()-run_train)/60:.1f} min")
# Visualize the training curves
idx = range(1,epochs+1)
plt.figure(figsize=(12, 4))
plt.subplot(121) # loss
plt.xlabel("epoch")
plt.ylabel("loss")
plt.plot(idx, losses, 'o', linestyle='-')
plt.xticks(range(min(idx), max(idx)+1, 1))
plt.grid()
plt.subplot(122) # accuracy
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.plot(idx, train_acc, 'o', linestyle='-', color="r", label="train accuracy")
plt.plot(idx, test_acc, 'o', linestyle='-', color="b", label="test accuracy")
plt.legend(loc="best")
plt.xticks(range(min(idx), max(idx)+1, 1))
plt.yticks(np.arange(0., 1.1, 0.1))
plt.grid()
plt.ylim([0,1.1])
plt.show()
Problems encountered: the dataset
- A dataset can normally be fetched with `wget url`, but Baidu evidently blocks this: `wget` fails with `ERROR 403: Forbidden`. Several workarounds from the web could not get past the restriction, so I simply downloaded it with a Windows browser, processed it, and uploaded it to the remote server afterwards.
- For lack of experience I compressed the folder as `rar` and uploaded it to Linux, where it could not be extracted. The fix: install `rar` as root (installation tutorials are easy to find). Then disaster struck: `unrar e data.rar` dumped every image out of its subfolders straight into the extraction directory, and I had to write yet another script to move them back. Lesson: compress to a Linux-friendly format such as `*.tar`, `*.gz`, or `*.tar.gz` (see any guide to Linux archive formats); the correct rar command, which preserves directory structure, is `rar x fileName.rar`.
- There are two ways to get files onto a remote server: ① the upload button in the remote Jupyter UI, or ② the `rz` / `sz` commands (see guides on uploading files to a Linux server with XShell).
- I also tried extracting the archive from Python packages, but none worked without installing extra dependencies. More Linux practice is clearly needed.
- As a bonus, a script for batch-deleting files of a given extension:

import os
for root, dirs, files in os.walk("./data", topdown=True):
    for name in files:
        file = os.path.join(root, name)
        if ".jpg" in name:
            os.remove(file)
ERROR: Decoding failed. Invalid image file
- Symptom: the key part of the message is `!res.empty()` — the decoded image failed the non-empty check.
- Fix: run a script that reads every sample image with OpenCV; any file for which `cv.imread()` returns `None` is a bad image — just delete it.
- Debugging: first locate the offending image. Set the batch size to 1, keep the order unshuffled, iterate through the loader, catch the exception, and print the index of the image that failed:
# Load the dataset
dataset_dir = "data/cat12"
trainset = ImageFolderDataset(os.path.join(dataset_dir, 'train'))
train_iter = gluon.data.DataLoader(trainset, batch_size=1, shuffle=False)
g = -1
try:
    for g, data in enumerate(train_iter):
        X, y = data[0], data[1]
        # plt.imshow(nd.transpose(X[0], (1, 2, 0)).asnumpy()), plt.axis("off")
        # plt.show()
except:
    # g is the last image that decoded, so the bad one is number g+2 (1-based)
    print(f"Image {g+2} failed to decode")
- Validation script: one caveat — Windows and Linux sort file names differently, since Linux is case-sensitive. If file names mix upper and lower case, normalize them all to one case so that both systems sort identically; that makes it easy to locate the offending image back on Windows.
import cv2 as cv
import os

top = "."
for root, dirs, files in os.walk(top, topdown=True):
    for filename in files:
        postfix = filename.split(".")[-1]
        if postfix not in ["jpg", "png", "jpeg"]:
            continue
        # Normalize to lowercase file names
        oldname = os.path.join(root, filename)
        filename = os.path.join(root, filename.lower())
        os.rename(oldname, filename)
        # Verify the image decodes
        img = cv.imread(filename)
        if img is None:
            print(filename)
            # os.remove(filename)
print("done.")
Comparing the located files showed that the problem images were 8-bit rather than 24-bit. OpenCV, it turns out, cannot read such files, yet it does not raise an error — `imread` silently returns `None`. Anyone doing image processing knows this quirk; I still do not understand why it stays silent, since half a debugging session can pass before you realize the path was merely wrong. MXNet faithfully inherits this behavior. So the fix is as above: read every image once and delete the ones that fail — for a dataset of this size, losing one or two images is harmless.
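As a complementary check, Pillow can report the pixel mode up front, so 8-bit files can be flagged before the decoder ever sees them. A minimal sketch, assuming Pillow is installed; the helper name `image_mode` is mine:

```python
from PIL import Image

def image_mode(path):
    """Return Pillow's mode string for an image file:
    'RGB' is 24-bit color, 'P' is 8-bit palette, 'L' is 8-bit grayscale."""
    with Image.open(path) as im:
        return im.mode

# Hypothetical usage: flag everything that is not plain 24-bit RGB
# suspects = [p for p in paths if image_mode(p) != "RGB"]
```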
Training results
Hyperparameters for the runs below; well-chosen hyperparameters reach the same accuracy with less training.
classes = 12
wd = 0.0001
epochs = 20
lr = 0.001
lr_period = 5
lr_decay = 0.5
ctx = mx.gpu()
batch_size = 64
num_workers = 4
CPU 2.2GHz
epoch 1 lr = 0.001000, loss = 1.392, train acc = 0.549, test acc = 0.459, 791.7 sec
epoch 2 lr = 0.001000, loss = 0.886, train acc = 0.698, test acc = 0.515, 784.5 sec
epoch 3 lr = 0.001000, loss = 0.833, train acc = 0.713, test acc = 0.496, 759.8 sec
epoch 4 lr = 0.001000, loss = 0.625, train acc = 0.779, test acc = 0.580, 743.5 sec
epoch 5 lr = 0.001000, loss = 0.512, train acc = 0.829, test acc = 0.755, 740.5 sec
epoch 6 lr = 0.001000, loss = 0.525, train acc = 0.831, test acc = 0.696, 747.1 sec
epoch 7 lr = 0.001000, loss = 0.406, train acc = 0.861, test acc = 0.670, 750.7 sec
epoch 8 lr = 0.001000, loss = 0.394, train acc = 0.868, test acc = 0.588, 747.8 sec
epoch 9 lr = 0.001000, loss = 0.402, train acc = 0.860, test acc = 0.718, 742.5 sec
epoch 10 lr = 0.001000, loss = 0.277, train acc = 0.903, test acc = 0.744, 740.5 sec
epoch 11 lr = 0.000100, loss = 0.165, train acc = 0.948, test acc = 0.857, 735.5 sec
epoch 12 lr = 0.000100, loss = 0.068, train acc = 0.985, test acc = 0.862, 735.1 sec
epoch 13 lr = 0.000100, loss = 0.064, train acc = 0.983, test acc = 0.859, 731.5 sec
epoch 14 lr = 0.000100, loss = 0.063, train acc = 0.986, test acc = 0.867, 735.3 sec
epoch 15 lr = 0.000100, loss = 0.040, train acc = 0.991, test acc = 0.867, 738.4 sec
epoch 16 lr = 0.000100, loss = 0.026, train acc = 0.999, test acc = 0.864, 734.2 sec
epoch 17 lr = 0.000100, loss = 0.028, train acc = 0.996, test acc = 0.871, 744.2 sec
epoch 18 lr = 0.000100, loss = 0.033, train acc = 0.991, test acc = 0.867, 775.9 sec
epoch 19 lr = 0.000100, loss = 0.029, train acc = 0.996, test acc = 0.867, 824.8 sec
epoch 20 lr = 0.000100, loss = 0.024, train acc = 0.995, test acc = 0.859, 898.8 sec
Colab Tesla P100
epoch 1 lr = 0.001000, loss = 1.204, train acc = 0.608, test acc = 0.558, 16.1 sec
epoch 2 lr = 0.001000, loss = 0.660, train acc = 0.775, test acc = 0.399, 15.6 sec
epoch 3 lr = 0.001000, loss = 0.540, train acc = 0.813, test acc = 0.730, 15.4 sec
epoch 4 lr = 0.001000, loss = 0.379, train acc = 0.872, test acc = 0.793, 15.5 sec
epoch 5 lr = 0.001000, loss = 0.301, train acc = 0.896, test acc = 0.682, 15.5 sec
epoch 6 lr = 0.000500, loss = 0.174, train acc = 0.948, test acc = 0.842, 15.4 sec
epoch 7 lr = 0.000500, loss = 0.079, train acc = 0.979, test acc = 0.835, 15.5 sec
epoch 8 lr = 0.000500, loss = 0.045, train acc = 0.987, test acc = 0.842, 15.5 sec
epoch 9 lr = 0.000500, loss = 0.031, train acc = 0.991, test acc = 0.869, 15.4 sec
epoch 10 lr = 0.000500, loss = 0.021, train acc = 0.996, test acc = 0.866, 15.4 sec
epoch 11 lr = 0.000250, loss = 0.023, train acc = 0.995, test acc = 0.879, 15.3 sec
epoch 12 lr = 0.000250, loss = 0.014, train acc = 0.995, test acc = 0.892, 15.6 sec
epoch 13 lr = 0.000250, loss = 0.008, train acc = 0.999, test acc = 0.890, 15.5 sec
epoch 14 lr = 0.000250, loss = 0.011, train acc = 0.997, test acc = 0.893, 15.5 sec
epoch 15 lr = 0.000250, loss = 0.007, train acc = 0.999, test acc = 0.903, 15.6 sec
epoch 16 lr = 0.000125, loss = 0.005, train acc = 1.000, test acc = 0.900, 15.4 sec
epoch 17 lr = 0.000125, loss = 0.007, train acc = 0.999, test acc = 0.898, 15.4 sec
epoch 18 lr = 0.000125, loss = 0.003, train acc = 0.999, test acc = 0.876, 15.5 sec
epoch 19 lr = 0.000125, loss = 0.002, train acc = 1.000, test acc = 0.892, 15.6 sec
epoch 20 lr = 0.000125, loss = 0.003, train acc = 1.000, test acc = 0.893, 15.6 sec
Colab Tesla K80
epoch 1 lr = 0.001000, loss = 1.130, train acc = 0.626, test acc = 0.289, 110.0 sec
epoch 2 lr = 0.001000, loss = 0.742, train acc = 0.754, test acc = 0.611, 48.4 sec
epoch 3 lr = 0.001000, loss = 0.497, train acc = 0.835, test acc = 0.672, 48.6 sec
epoch 4 lr = 0.001000, loss = 0.337, train acc = 0.886, test acc = 0.750, 48.7 sec
epoch 5 lr = 0.001000, loss = 0.331, train acc = 0.891, test acc = 0.730, 48.5 sec
epoch 6 lr = 0.000500, loss = 0.187, train acc = 0.941, test acc = 0.870, 48.9 sec
epoch 7 lr = 0.000500, loss = 0.089, train acc = 0.968, test acc = 0.836, 48.7 sec
epoch 8 lr = 0.000500, loss = 0.070, train acc = 0.976, test acc = 0.853, 48.7 sec
epoch 9 lr = 0.000500, loss = 0.048, train acc = 0.987, test acc = 0.850, 49.0 sec
epoch 10 lr = 0.000500, loss = 0.039, train acc = 0.989, test acc = 0.847, 48.9 sec
epoch 11 lr = 0.000250, loss = 0.027, train acc = 0.992, test acc = 0.887, 48.2 sec
epoch 12 lr = 0.000250, loss = 0.012, train acc = 0.998, test acc = 0.901, 48.4 sec
epoch 13 lr = 0.000250, loss = 0.011, train acc = 0.997, test acc = 0.886, 48.5 sec
epoch 14 lr = 0.000250, loss = 0.006, train acc = 1.000, test acc = 0.898, 48.4 sec
epoch 15 lr = 0.000250, loss = 0.005, train acc = 0.999, test acc = 0.898, 48.4 sec
epoch 16 lr = 0.000125, loss = 0.006, train acc = 1.000, test acc = 0.898, 48.6 sec
epoch 17 lr = 0.000125, loss = 0.004, train acc = 0.999, test acc = 0.898, 48.2 sec
epoch 18 lr = 0.000125, loss = 0.002, train acc = 1.000, test acc = 0.904, 48.5 sec
epoch 19 lr = 0.000125, loss = 0.002, train acc = 1.000, test acc = 0.901, 48.4 sec
epoch 20 lr = 0.000125, loss = 0.003, train acc = 1.000, test acc = 0.898, 48.6 sec
total train time 17.2 min