Stage three is the hands-on stage: unlike the fill-in-the-blank exercises of the first two stages, it is a real prediction task.
The problem provides 8,000 photos: 6,000 form the training set and the remaining 2,000 form the test set. We train on the 6,000 images, predict labels for the 2,000 test images, write the results to a csv file, and submit it for scoring.
We had studied TensorFlow for a while and had a rough idea of how to build a neural network; we also easily found a matching template online, so we planned to adapt that template to this problem.
Working together with classmates, we finished a first draft of the code, then revised and polished it into its final form (see the code below).

import os
import json

import numpy as np
from PIL import Image
from tensorflow.keras import layers, models

js_path = '/home/kesci/input/weather_image1552/train.json'
test_path = '/home/kesci/input/weather_image1552/测试集/'
train_path = '/home/kesci/input/weather_image1552/训练集/'

testdata = 400  # number of labelled images actually used for training

# train.json maps each training file name to its label ('sunny' / 'cloudy')
with open(js_path, 'r') as f:
    label = json.load(f)


def read_image(paths):
    """Collect the paths of all .jpg files under the given directory."""
    filelist = []
    for root, dirs, files in os.walk(paths):
        for file in files:
            if os.path.splitext(file)[1] == ".jpg":
                filelist.append(os.path.join(root, file))
    return filelist


def im_resize(paths):
    """Resize every image to 128x128 in place (overwrites the files)."""
    for filename in paths:
        with Image.open(filename) as im:
            newim = im.resize((128, 128))
            newim.save(filename)


def im_array(paths):
    """Load images as flattened, normalised greyscale pixel lists."""
    M = []
    for filename in paths:
        im = Image.open(filename)
        im_L = im.convert("L")  # greyscale
        im_L = im_L.resize((128, 128))
        Core = im_L.getdata()
        arr1 = np.array(Core, dtype='float32') / 255.0
        M.extend(arr1.tolist())
    return M




# Label encodings: mp maps a class name to its training label, and
# dict_label maps a predicted index back to the submission label.
dict_label = {0: '1', 1: '0'}
mp = {'sunny': 0, 'cloudy': 1}

js_pic = []  # file names of the training images we use
js_lab = []  # their numeric labels
cnt = 0
for key in label:
    if cnt < testdata:
        js_pic.append(key)
        js_lab.append(mp[label[key]])
    cnt += 1
train_labels = np.array(js_lab)

tot = []  # first 2000 file names from train.json, reused as test file names
cnt = 0
for key in label:
    if cnt < 2000:
        tot.append(key)
    cnt += 1

trainfilelist = [train_path + name for name in js_pic]  # full paths of the training images
M = im_array(trainfilelist)
train_images = np.array(M).reshape(len(trainfilelist), 128, 128)
train_images = train_images[..., np.newaxis]  # add a channel dimension for Conv2D


# Convolutional neural network
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(2, activation='softmax'))  # two output classes: sunny / cloudy
model.summary()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=10)

a = []  # predicted labels for the 2000 test images
filelist = read_image(test_path)
im_resize(filelist)  # resize all test images to 128x128 in place
for i in range(2000):
    im = Image.open(test_path + tot[i])
    im_L = im.convert("L")
    Core = im_L.getdata()
    arr1 = np.array(Core, dtype='float32') / 255.0
    images = np.array(arr1.tolist()).reshape(-1, 128, 128, 1)
    predictions_single = model.predict(images)
    a.append(np.argmax(predictions_single))

# fmt='%d' writes the labels as integers (the default would be
# scientific-notation floats, which a grader may reject)
np.savetxt('/home/kesci/input/new.csv', a, fmt='%d', delimiter=',')
print(a)
"""
for filename in filelist:
    im = Image.open(filename)
    #print(filename)
    im_L = im.convert("L")
    Core = im_L.getdata()
    arr1 = np.array(Core, dtype='float32') / 255.0
    list_img = arr1.tolist()
    images = np.array(list_img).reshape(-1, 128, 128, 1)
    predictions_single = model.predict(images)
    print("预测结果为:", np.argmax(predictions_single))
    print(predictions_single)
"""



But one big headache remained: how to write the prediction results to a csv file. This genuinely stumped me. I consulted a lot of material, but the results never satisfied me. In the end I came up with a workaround: collect the answers in an array, save it to a txt file, then write a second program that reads the txt file and writes it out as a csv file, reaching the goal by a detour.
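The first half of that detour, dumping the array to a txt file, might look like the following minimal sketch; the file name ceshi.txt mirrors the one read by the conversion program below, but the exact path is an assumption, and the label values here are placeholders.

import numpy as np

# a is the list of predicted labels built by the training script above;
# placeholder values are used here for illustration.
a = [1, 0, 1]

# Write one integer label per line; 'ceshi.txt' is the file the
# conversion program below reads (the exact path is an assumption).
np.savetxt('ceshi.txt', a, fmt='%d')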
The conversion program is as follows:

#!/usr/bin/python
# coding=utf-8
import pandas as pd

# header=None: the txt file has no column names, so the first line
# must not be treated as a header
data = pd.read_csv(r'C:\Users\DELL\Desktop\活动\人工智能\图像\ceshi.txt',
                   header=None)
data.to_csv(r'C:\Users\DELL\Desktop\活动\人工智能\图像\data1.csv',
            index=False, header=False)
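Looking back, the detour through a txt file is probably avoidable: pandas can write a Python list straight to csv. A minimal sketch, assuming a is the label list from the training script and 'submission.csv' is a hypothetical output name:

import pandas as pd

# a is the predicted label list from the training script;
# placeholder values are used here for illustration.
a = [1, 0, 1]

# One column of integer labels, with no header row and no index column.
pd.DataFrame(a).to_csv('submission.csv', index=False, header=False)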

But after submitting the final result, the score was only 0.5, which left us completely baffled. After all that hard work, the accuracy was only one half, the same as pure guessing. (At the very start I had submitted an answer with every prediction set to 1, a blind guess just to see what score it would get, and unexpectedly it was also 0.5.)
There is another puzzle: in theory, the larger the training set, the higher the accuracy, but in practice training on 6,000 images gave an accuracy of only 50-something percent, while training on just 400 images reached over 80 percent, sometimes even over 90. We could not figure out why.
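One way to probe this would be a validation split: the accuracy Keras reports during fit is measured on the training data itself, so a high number on 400 images may just mean the small set is being memorised rather than generalised from. A minimal sketch, reusing the train_images and train_labels arrays built above:

# Hold out 20% of the labelled data for validation; val_accuracy then
# estimates generalisation, unlike the plain training accuracy.
history = model.fit(train_images, train_labels,
                    epochs=10,
                    validation_split=0.2)
print(history.history['val_accuracy'])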

The final submitted csv file is shown in the figure below.


We'll keep at it. Sigh, there's still a long road ahead~