2018.4.18 Python Machine Learning Notes

I. Installing NumPy on Ubuntu 14.04

1. Reference URL

2. Installation commands:

Before installing, it is a good idea to refresh the package index:

sudo apt-get update

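If Python 2.7 is working properly, move on to the next step.
Now install the packages for numerical computing and plotting, plus scikit-learn: numpy, scipy, matplotlib, pandas, and sklearn.
The apt-get commands are: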

sudo apt-get install python-numpy
sudo apt-get install python-scipy
sudo apt-get install python-matplotlib
sudo apt-get install python-pandas
sudo apt-get install python-sklearn

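3. Testing

To check that everything installed correctly, open the Python interpreter and enter the following commands. If none of them raises an error, the installation succeeded.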

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets,linear_model
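
The same check can be scripted so that one missing package does not stop the others from being reported. This is a minimal sketch: the helper name check_packages is made up for illustration, and it assumes a failed import raises ImportError.

```python
import importlib

def check_packages(names):
    """Return {name: version string, or None if the import fails}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder.
            report[name] = getattr(module, "__version__", "unknown")
    return report

if __name__ == "__main__":
    wanted = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]
    for name, version in sorted(check_packages(wanted).items()):
        print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Any package listed as NOT INSTALLED can be installed again with the matching apt-get command above.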

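4. Writing and running Python programs on Ubuntu

(1) Create hello.py with vim and write the code: print 'Hello,Welcome to linux python'
(2) Change into the directory that contains the program: cd python
(3) Run the program: python hello.py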
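
The print statement above is Python 2 syntax. If the same file might later be run under Python 3, one possible variant of hello.py is sketched below. The function name greeting is only an illustration and is not part of the original notes.

```python
def greeting():
    # Same message as in step (1).
    return "Hello,Welcome to linux python"

if __name__ == "__main__":
    # print(...) with a single argument behaves the same under Python 2.7 and Python 3.
    print(greeting())
```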

II. NumPy library basics

In the Python shell, enter the following commands:

>>> from numpy import *  # bring all of numpy's names into the current namespace
>>> random.rand(4,4)   # build a 4x4 array of random numbers
array([[ 0.97134166, 0.69816709, 0.35251331, 0.32252662], [ 0.40798608, 0.48113781, 0.67629943, 0.12288183], [ 0.96055063, 0.85824686, 0.95458472, 0.40213735], [ 0.28604852, 0.43380204, 0.2558164 , 0.07954809]])

>>> randMat=mat(random.rand(4,4)) # mat() converts the array to a matrix
>>> randMat.I # .I computes the matrix inverse
matrix([[ 1.12580852, -0.43470821, 2.71229992, -2.16829781], [-1.4600302 , 1.65644197, -1.3742097 , 1.6297217 ], [ 3.379582 , 0.40573689, 0.84634018, -2.72232677], [-3.35086377, -2.64978047, -1.39459215, 4.68277082]])

>>> invaRandMat=randMat.I # store the inverse matrix
>>> randMat*invaRandMat # matrix multiplication; yields the identity matrix, up to floating-point rounding
matrix([[ 1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.22044605e-16], [ -2.22044605e-16, 1.00000000e+00, 1.24900090e-16, 2.49800181e-16], [ -2.22044605e-16, -1.11022302e-16, 1.00000000e+00, 2.22044605e-16], [ -4.44089210e-16, -2.22044605e-16, -2.22044605e-16, 1.00000000e+00]])

>>> myEye=randMat*invaRandMat  # store the product

>>> myEye-eye(4)  # compute the error; eye(4) creates a 4x4 identity matrix
matrix([[ -4.44089210e-16, 0.00000000e+00, 0.00000000e+00, 2.22044605e-16], [ -2.22044605e-16, -1.11022302e-16, 1.24900090e-16, 2.49800181e-16], [ -2.22044605e-16, -1.11022302e-16, 0.00000000e+00, 2.22044605e-16], [ -4.44089210e-16, -2.22044605e-16, -2.22044605e-16, 4.44089210e-16]])
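
The same experiment can also be done with plain arrays instead of the matrix type, which the NumPy documentation no longer recommends for new code. The sketch below assumes a fixed seed (chosen only so the run is repeatable) and uses np.linalg.inv in place of .I.

```python
import numpy as np

rng = np.random.RandomState(0)      # fixed seed: an assumption, for repeatability only
rand_arr = rng.rand(4, 4)           # 4x4 array of uniform random numbers
inv_arr = np.linalg.inv(rand_arr)   # matrix inverse, the ndarray counterpart of .I
product = rand_arr.dot(inv_arr)     # matrix product; * would be element-wise on ndarrays

# Largest absolute deviation from the exact identity matrix.
max_error = np.abs(product - np.eye(4)).max()
print(max_error)
# allclose compares within a small tolerance instead of requiring exact equality.
print(np.allclose(product, np.eye(4)))
```

Comparing with a tolerance is the usual way to handle the tiny rounding errors seen in the output above.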

III. k-nearest neighbors in practice

1. Preparation: importing data with Python

Create kNN.py with vim:

from numpy import *  # import the numerical computing module
import operator   

def createDataSet():  
    group=array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels=['A','A','B','B']
    return group,labels

2. Testing in the Python shell

>>> import kNN
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named kNN
Fix: change into the directory where kNN.py is saved, then start Python from the terminal.
(1)cy@pc:~$ cd python # the directory where I saved the file
cy@pc:~/python$ python
>>> import kNN  
>>> group,labels=kNN.createDataSet()
>>> group
array([[ 1. ,  1.1],
       [ 1. ,  1. ],
       [ 0. ,  0. ],
       [ 0. ,  0.1]])
>>> labels
['A', 'A', 'B', 'B']
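
The import fails because Python only looks for modules in the directories listed in sys.path, and the current directory is one of them when the interpreter is started interactively. An alternative to changing directory is to add the module's directory to sys.path. The sketch below is only an illustration: it writes a tiny stand-in module named knn_demo.py (a hypothetical name) into a temporary directory rather than touching the real kNN.py.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the directory that holds kNN.py (~/python in these notes).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "knn_demo.py"), "w") as handle:
    handle.write("def createDataSet():\n    return [[1.0, 1.1], [0, 0.1]], ['A', 'B']\n")

# Python searches the directories in sys.path when it resolves an import.
if module_dir not in sys.path:
    sys.path.insert(0, module_dir)

knn_demo = importlib.import_module("knn_demo")
group, labels = knn_demo.createDataSet()
print(labels)
```

For the real file, the equivalent would be to insert the directory containing kNN.py into sys.path before running import kNN.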

3. Implementing the kNN classification algorithm

Building on the code above, add the function classify0():

from numpy import *
import operator   

def createDataSet():
    group=array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels=['A','A','B','B']
    return group,labels
# inx: input vector to classify; dataSet: training samples; labels: label vector; k: number of nearest neighbors
def classify0(inx,dataSet,labels,k): 
    # shape[0] gives the number of rows in dataSet
    dataSetSize =dataSet.shape[0]
    # tile repeats inx dataSetSize times down the rows (vertically) and once across the columns (horizontally)
    diffMat=tile(inx,(dataSetSize,1))-dataSet  # note: dataSet, not dataset (a typo I made at first)
    sqDiffMat=diffMat**2
    sqDistances =sqDiffMat.sum(axis=1)
    distances=sqDistances**0.5
    sortedDistIndicies=distances.argsort()
    classCount={}
    for i in range(k):
        voteIlabel=labels[sortedDistIndicies[i]]
        classCount[voteIlabel]=classCount.get(voteIlabel,0)+1
    sortedClassCount=sorted(classCount.iteritems(),
    key=operator.itemgetter(1),reverse=True)   # note: True, not true (a typo I made at first); iteritems() is Python 2 only
    return sortedClassCount[0][0]
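
The listing above targets Python 2.7, where dict.iteritems() is available. For readers on Python 3, one possible equivalent is sketched below. The name classify_knn and the use of collections.Counter for the vote are my own choices, not part of the original code. When labels are tied, the result depends on how Counter orders equal counts.

```python
from collections import Counter

import numpy as np

def classify_knn(inx, data_set, labels, k):
    """Majority vote among the k training rows closest to inx (Euclidean distance)."""
    data_set = np.asarray(data_set, dtype=float)
    # Subtract inx from every row; the sign does not matter because the difference is squared.
    diff = data_set - np.asarray(inx, dtype=float)
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = distances.argsort()[:k]
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns [(label, count)] for the label with the most votes.
    return votes.most_common(1)[0][0]

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_knn([0, 0], group, labels, 3))
```

For the query [0,0] with k=3, the three nearest rows carry the labels B, B and A, so the vote goes to B, the same answer classify0 gives.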

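Additional notes on the code

numpy.tile()
For example, with a = np.array([0,1,2]), np.tile(a,(2,1)) first repeats a once along the x axis (as we will call it here), which means no extra copy, so the result is still [0,1,2]. That result is then repeated twice along the y axis, giving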
 array([[0,1,2], [0,1,2]])

Similarly:

>>> b = np.array([[1, 2], [3, 4]])
>>> np.tile(b, 2) # repeat 2 times along the x axis
array([[1, 2, 1, 2], [3, 4, 3, 4]])
>>> np.tile(b, (2, 1)) # repeat once along the x axis (no extra copy), then 2 times along the y axis
array([[1, 2], [3, 4], [1, 2], [3, 4]])
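
To tie this back to the classifier: tile is used there only to give the query point the same shape as the training set before subtracting. A small sketch, reusing the four sample points from createDataSet, shows that NumPy broadcasting produces the same differences without an explicit tile call. The variable names are illustrative.

```python
import numpy as np

inx = np.array([0, 0])
data = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])

# Stack inx once per row of data, giving an array of shape (4, 2).
tiled = np.tile(inx, (data.shape[0], 1))
diff_tiled = tiled - data

# With broadcasting, NumPy stretches inx across the rows on its own.
diff_broadcast = inx - data

print(tiled.shape)
print(np.array_equal(diff_tiled, diff_broadcast))
```

Both differences come out identical, so the choice between the two is mostly a matter of readability.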

Test:

>>> import kNN
>>> group,labels=kNN.createDataSet()
>>> kNN.classify0([0,0],group,labels,3)
'B'