Calculation steps of the entropy method (熵值法)
- Select the data
Select $m$ indicators and $n$ samples. Let $X_{ij}$ denote the value of the $j$-th indicator for the $i$-th sample, where $i=1,2,\dots,n$ and $j=1,2,\dots,m$.
- Standardize the data
When the indicators differ in unit or direction, the data must be standardized first. Min-max standardization maps the smallest value of each indicator to $0$, and the logarithm in the entropy step is undefined at $0$, so a small real number such as $0.01$ can be added to every zero value.
(1) Positive indicators (larger is better):
$$X'_{ij}=\frac{X_{ij}-\min(X_{ij})}{\max(X_{ij})-\min(X_{ij})}$$
(2) Negative indicators (smaller is better):
$$X'_{ij}=\frac{\max(X_{ij})-X_{ij}}{\max(X_{ij})-\min(X_{ij})}$$
In both cases the minimum and maximum are taken over all samples of the $j$-th indicator.
- Compute the proportion of the $i$-th sample under the $j$-th indicator
Compute the proportion (sample weight), where $X'_{ij}$ is the standardized value:
$$P_{ij}=\frac{X'_{ij}}{\sum_{i=1}^{n}X'_{ij}}$$
- Compute the entropy of the $j$-th indicator
Compute the entropy of each indicator:
$$e_j=-K\sum_{i=1}^{n}P_{ij}\ln(P_{ij})$$
where $K=\frac{1}{\ln(n)}$ and $n$ is the number of samples.
- Compute the difference coefficient of the $j$-th indicator
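The information utility of an indicator is the difference between $1$ and its information entropy $e_j$, and it directly determines the size of the weight. The larger the information utility, the more important the indicator is to the evaluation, and the larger its weight.
$$d_j=1-e_j$$
- Compute the indicator weights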
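The entropy method derives each indicator's weight from its difference coefficient: the higher the difference coefficient, the more important the indicator is to the evaluation, that is, the larger its weight and its contribution to the result.
Weight of the $j$-th indicator:
$$w_j=\frac{d_j}{\sum_{j=1}^{m}d_j}$$
- Compute the composite score of each sample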
$$z_i=\sum_{j=1}^{m}w_jX'_{ij}$$
where $X'_{ij}$ is the standardized value of the $j$-th indicator for the $i$-th sample.
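The steps above can be sketched end to end in NumPy. This is only a minimal illustration under stated assumptions: the function name `entropy_weight_scores`, the `positive` mask, the `eps` parameter, and the sample matrix are all hypothetical, every indicator is assumed to take at least two distinct values (so the min-max denominator is nonzero), and zeros are replaced by `eps` as one reading of the small offset described in the standardization step.

```python
import numpy as np

def entropy_weight_scores(X, positive, eps=0.01):
    """Hypothetical helper: X has shape (n samples, m indicators);
    positive is a boolean mask of length m (True = larger is better)."""
    X = np.asarray(X, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    n = X.shape[0]

    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min  # assumed > 0 for every indicator

    # Direction-aware min-max standardization
    Xs = np.where(positive, (X - col_min) / span, (col_max - X) / span)
    # Replace exact zeros with a small value so that ln is defined
    Xs = np.where(Xs == 0, eps, Xs)

    P = Xs / Xs.sum(axis=0)               # proportion P_ij
    K = 1.0 / np.log(n)
    e = -K * (P * np.log(P)).sum(axis=0)  # entropy e_j
    d = 1.0 - e                           # difference coefficient d_j
    w = d / d.sum()                       # weight w_j
    z = Xs @ w                            # composite score z_i
    return w, z

# Made-up data: 4 samples, 3 indicators; the last one is "smaller is better"
X = [[10.0, 0.2, 300.0],
     [25.0, 0.9, 120.0],
     [40.0, 0.4, 250.0],
     [15.0, 0.7, 180.0]]
w, z = entropy_weight_scores(X, positive=[True, True, False])
print("weights:", w)
print("scores:", z)
```

Because the weights sum to 1 and every standardized value lies between `eps` and 1, each composite score also lies in that range, which makes the scores directly comparable across samples.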
Example
import pandas as pd
import numpy as np
# Load the data; the first column is used as the row index
data = pd.read_excel("./temp.xlsx", index_col=[0])
data.head()
# Positive indicators (larger is better): min-max standardization
pos_cols = ["cured_rate", "StringencyIndex", "GovernmentResponseIndex",
            "ContainmentHealthIndex", "EconomicSupportIndex"]
pos = data[pos_cols]
data[pos_cols] = (pos - pos.min()) / (pos.max() - pos.min())
# Negative indicators (smaller is better): reversed min-max standardization
neg_cols = ["confirmed", "confirmed_rate", "dead_rate"]
neg = data[neg_cols]
data[neg_cols] = (neg.max() - neg) / (neg.max() - neg.min())
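# Proportion of each sample under each indicator; data is now standardized
p = data / data.sum()
# Entropy of each indicator, summed over samples (axis=0).
# Zeros in p give 0 * ln(0) = NaN, which DataFrame.sum skips by default,
# so those terms are effectively counted as 0.
K = 1 / np.log(len(p))
e = -K * (p * np.log(p)).sum(axis=0)
# Difference coefficient
d = 1 - e
# Indicator weights
w = d / d.sum()
# Composite score of each sample
score = (w * data).sum(axis=1)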
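The zero values mentioned in the standardization step deserve a closer look, since min-max scaling always produces at least one zero per indicator. The sketch below uses a made-up three-value column (not the data from `temp.xlsx`) to show what happens to the entropy term at zero: pandas skips the resulting NaN by default, plain NumPy propagates it, and the small-offset approach avoids it altogether. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical standardized column; min-max scaling always yields a 0
col = pd.Series([0.0, 0.5, 1.0])
p = col / col.sum()
K = 1 / np.log(len(p))

# ln(0) is -inf and 0 * (-inf) is NaN; silence the NumPy warnings here
with np.errstate(divide="ignore", invalid="ignore"):
    terms = p * np.log(p)

nan_count = int(terms.isna().sum())   # number of undefined terms
e_pandas = -K * terms.sum()           # pandas skips NaN (skipna=True)
e_numpy = -K * terms.to_numpy().sum() # plain NumPy propagates NaN

# Offset approach from the text: replace each zero with a small value
col_eps = col.where(col != 0, 0.01)
p_eps = col_eps / col_eps.sum()
e_eps = -K * (p_eps * np.log(p_eps)).sum()

print(nan_count, e_pandas, e_numpy, e_eps)
```

Skipping the NaN amounts to treating $0\cdot\ln 0$ as $0$, while the offset changes the proportions slightly, so the two entropies are close but not identical. Whichever convention is chosen, it is worth making it explicit instead of relying on a library default.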
My knowledge is limited; if you spot any mistakes, please point them out.