1. Linear regression on the Boston housing prices
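from sklearn import datasets
from sklearn.linear_model import LinearRegression

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example runs only on older versions.
boston = datasets.load_boston()
data_X = boston.data
y = boston.target

# Fit an ordinary least-squares model on the full dataset
model = LinearRegression()
model.fit(data_X, y)

# Predict the first two houses (both are training samples)
pred = model.predict(data_X[:2, :])
actu = y[:2]
print('Predicted prices of the first two houses:', pred)
print('Actual prices of the first two houses:', actu)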
output:
Predicted prices of the first two houses: [ 30.00821269 25.0298606 ]
Actual prices of the first two houses: [ 24. 21.6]
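The predictions are off by about 6.0 and 3.4 (the target is in thousands of dollars), even though both houses were part of the training data, so the model still needs a good deal of work.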
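The comparison above uses two training samples, which says little about how the model generalizes. A minimal sketch of a held-out evaluation follows. Because `load_boston` is no longer shipped with recent scikit-learn releases, it swaps in the bundled `load_diabetes` dataset as a stand-in. The 30% test fraction and `random_state=0` are illustrative choices, not taken from the original.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): the bundled diabetes regression data
X, y = datasets.load_diabetes(return_X_y=True)

# Hold out 30% of the samples so the error is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Root mean squared error is in the same units as the target
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)
print('Test RMSE:', rmse)
print('Test R^2:', r2)
```

RMSE and R^2 summarize the error over the whole test set, which is usually more informative than inspecting two individual predictions.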
2. Dimensionality reduction on the iris dataset
from sklearn import datasets
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Project the 4 original features onto the first 2 principal components
pca = PCA(n_components=2)
reduced_X = pca.fit_transform(X)

# Plot the first component against the second
plt.scatter(reduced_X[:, 0], reduced_X[:, 1])
plt.show()
The iris dataset has three classes. Here the original 4 features are reduced to 2, and the points still fall roughly into three groups.
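One way to check how much information survives the reduction is to look at the variance each component explains. The sketch below is an illustration rather than part of the original notes. It prints `explained_variance_ratio_` and the centroid of each class in the reduced space.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

X, y = datasets.load_iris(return_X_y=True)

pca = PCA(n_components=2)
reduced_X = pca.fit_transform(X)

# Fraction of the total variance captured by each principal component
ratio = pca.explained_variance_ratio_
print('Explained variance ratio per component:', ratio)
print('Total variance retained:', ratio.sum())

# Mean position of each class in the 2-D space
for label in range(3):
    print('Class', label, 'centroid:', reduced_X[y == label].mean(axis=0))
```

To make the classes visible in the scatter plot itself, the labels can be passed as colors, for example `plt.scatter(reduced_X[:, 0], reduced_X[:, 1], c=y)`.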
3. Classifying iris with KNN
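from sklearn import datasets
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_X, iris_y = datasets.load_iris(return_X_y=True)
knn = KNeighborsClassifier()  # n_neighbors defaults to 5

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(iris_X, iris_y, test_size=0.3)
knn.fit(X_train, y_train)
print('KNN predicted classes on the test set:', knn.predict(X_test))
print('Actual classes of the test set:', y_test)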
output:
KNN predicted classes on the test set: [1 2 1 0 0 2 0 1 2 0 2 0 2 0 2 0 0 2 1 0 1 2 1 0 1 2 0 1 0 0 0 1 1 0 0 0 1 2 2 0 2 1 1 0 1]
Actual classes of the test set: [1 2 1 0 0 2 0 1 2 0 1 0 2 0 2 0 0 2 1 0 1 2 1 0 1 2 0 1 0 0 0 1 1 0 0 0 1 2 2 0 2 1 1 0 2]
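By default KNN uses the 5 nearest neighbors. Note that train_test_split performs a single random hold-out split rather than cross-validation; here the test set is 30% of the data (45 of 150 samples). In this run, 43 of the 45 test predictions match the actual classes.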
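Comparing the two printed arrays by eye is error-prone, so it can help to compute the accuracy directly. The following is a minimal sketch. The fixed `random_state=0` is an assumption added here so the split is reproducible, and the numbers it prints will not necessarily match the run shown above.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_X, iris_y = datasets.load_iris(return_X_y=True)

# Fixed seed (assumption) so every run draws the same 70/30 split
X_train, X_test, y_train, y_test = train_test_split(
    iris_X, iris_y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
pred = knn.predict(X_test)

# Fraction of test samples classified correctly
acc = accuracy_score(y_test, pred)
n_wrong = int((pred != y_test).sum())
print('Test accuracy:', acc)
print('Misclassified samples:', n_wrong, 'of', len(y_test))
```

`knn.score(X_test, y_test)` returns the same accuracy without a separate call to `accuracy_score`.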
4. Standardizing data
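import numpy as np
from sklearn import preprocessing

a = np.array([[10, 2.7, 3.6], [-100, 5, -2], [120, 20, 40]])
print(a)                                                   # 1: raw data
print(preprocessing.scale(a))                              # 2: zero mean, unit variance per column
print(preprocessing.minmax_scale(a))                       # 3: each column rescaled to [0, 1]
print(preprocessing.minmax_scale(a, feature_range=(-1, 1)))  # 4: each column rescaled to [-1, 1]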
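The second output standardizes each column to zero mean and unit variance. The third output rescales each column to the default range of 0 to 1, and the fourth uses a custom range of -1 to 1.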
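To make the transformations concrete, the sketch below reproduces them by hand with NumPy and checks the results against scikit-learn. The variable names `z`, `unit`, `custom`, `lo` and `hi` are introduced here for illustration only.

```python
import numpy as np
from sklearn import preprocessing

a = np.array([[10, 2.7, 3.6], [-100, 5, -2], [120, 20, 40]])

# Standardization: subtract the column mean, divide by the column
# standard deviation (population std, ddof=0)
z = (a - a.mean(axis=0)) / a.std(axis=0)

# Min-max scaling to [0, 1]: shift by the column minimum, divide by the range
unit = (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

# Custom range [lo, hi]: stretch the [0, 1] result and shift it
lo, hi = -1, 1
custom = unit * (hi - lo) + lo

print(np.allclose(z, preprocessing.scale(a)))
print(np.allclose(unit, preprocessing.minmax_scale(a)))
print(np.allclose(custom, preprocessing.minmax_scale(a, feature_range=(-1, 1))))
```

All three operations work column by column, so each feature is scaled independently of the others.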
5. Classifying iris with a support vector machine
from sklearn import datasets
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris_X, iris_y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(iris_X, iris_y, test_size=0.3)

clf = SVC()
clf.fit(X_train, y_train)
print(clf.predict(X_test))         # predicted classes
print(y_test)                      # actual classes
print(clf.score(X_test, y_test))   # accuracy on the test set
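The result differs on every run because train_test_split shuffles the data randomly before taking the 70% training portion. This is a single hold-out split, not cross-validation. Passing a fixed random_state makes the split, and therefore the score, reproducible.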
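The run-to-run variation can be demonstrated directly by controlling the seed of the split. In the sketch below, `holdout_score` is a hypothetical helper written for this illustration, and the seed values are arbitrary.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris_X, iris_y = datasets.load_iris(return_X_y=True)


def holdout_score(seed):
    """Train an SVC on a 70/30 split drawn with the given seed (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        iris_X, iris_y, test_size=0.3, random_state=seed)
    return SVC().fit(X_train, y_train).score(X_test, y_test)


# The same seed gives the same split, hence the same score
first = holdout_score(42)
second = holdout_score(42)
print(first, second)

# Different seeds give different splits, so the score may vary
other = [holdout_score(seed) for seed in range(5)]
print(other)
```

Averaging over several splits, as cross-validation does, gives a more stable estimate than any single hold-out score.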
6. Using cross-validation to see how the number of neighbors affects KNN accuracy
from sklearn import datasets
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

iris_X, iris_y = datasets.load_iris(return_X_y=True)
krange = range(1, 31)
kscores = []
for k in krange:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 10-fold cross-validated accuracy for this value of k
    scores = cross_val_score(knn, iris_X, iris_y, cv=10, scoring='accuracy')
    kscores.append(scores.mean())

# Plot mean accuracy against k
plt.plot(krange, kscores)
plt.xlabel('value of k for knn')
plt.ylabel('cross_validated accuracy')
plt.show()
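Besides reading the best value off the plot, the same scores can be used to pick k programmatically. This is a minimal sketch under the same settings (k from 1 to 30, 10-fold cross-validation). The names `best_index` and `best_k` are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris_X, iris_y = datasets.load_iris(return_X_y=True)
krange = range(1, 31)

# Mean 10-fold cross-validated accuracy for each k
kscores = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k),
                    iris_X, iris_y, cv=10, scoring='accuracy').mean()
    for k in krange
]

# np.argmax returns the first maximum, i.e. the smallest k among ties
best_index = int(np.argmax(kscores))
best_k = krange[best_index]
print('Best k:', best_k)
print('Mean accuracy at best k:', kscores[best_index])
```

Several values of k often tie or come very close on a dataset this small, so the selected k should be read as one reasonable choice rather than a uniquely best one.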