Analysis of make_anchor_list.py

Introduction to K-means

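K-means clustering is an unsupervised learning algorithm. Its goal is to partition a dataset into k groups, called k clusters, and to compute the center of each one.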

The K-means algorithm

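1. Randomly select k points from the dataset as the initial cluster centers.
2. Compute the distance from every point to each cluster center, and assign the point to the cluster whose center is nearest.
3. For the clusters reassigned in step 2, recompute each cluster's center (roughly, take the mean of the x and y coordinates of its points as the new center).
4. Repeat steps 2 and 3 until the recomputed cluster centers no longer change.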
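
The four steps above can be sketched in NumPy as follows. This is a minimal illustration with Euclidean distance, not the `runkMeans` function that the script calls; the function name `kmeans`, its parameters, and the toy data are assumptions made for the example.

```python
import numpy as np


def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-means with Euclidean distance (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its assigned points.
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


# Two well-separated groups of two points each (toy data).
pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = kmeans(pts, k=2)
```

On this toy data the two centers settle at the midpoints of the two groups, whichever two points are drawn as the initial centers.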

Applying K-means to anchor box computation

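Anchor boxes are used to predict bounding boxes, and the closer the anchor boxes are to the ground-truth widths and heights, the better the model performs. K-means is applied here to find k width-height pairs that best match the ground-truth box sizes. Unlike the K-means described above, Euclidean distance is not used for the assignment; instead, IoU (intersection over union) decides which cluster each width-height pair belongs to. IoU is a good measure of how close two width-height pairs are: it takes values in [0, 1], and a larger IoU means the two pairs are closer. In K-means, the distance is therefore defined as 1 - IoU.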
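
The 1 - IoU distance between width-height pairs can be sketched as below. The sketch assumes the common formulation in which every box and every centroid is aligned at the same corner, so only widths and heights matter. The script's own `runkMeans` is not shown here and may differ; the name `wh_iou` and the sample values are made up for the example.

```python
import numpy as np


def wh_iou(boxes, centroids):
    """IoU between (w, h) pairs, with all boxes aligned at a common corner.

    boxes: (N, 2), centroids: (K, 2); returns an (N, K) matrix.
    """
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)


boxes = np.array([[0.2, 0.4], [0.5, 0.3]])      # normalized (w, h) of two boxes
centroids = np.array([[0.2, 0.4], [0.4, 0.2]])  # two candidate anchors
dist = 1.0 - wh_iou(boxes, centroids)           # K-means distance
nearest = dist.argmin(axis=1)                   # cluster index for each box
```

The first box has the same shape as the first centroid, so its distance to that centroid is 0. The second box is wider than it is tall, so it is assigned to the second centroid.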

anchors:
	python ./make_anchor_list.py \
			${DATASET} \
			--max_iters 10 \
			--is_random True \
			--in_hw ${IMGSIZE} \
			--out_hw ${OUTSIZE} \
			--anchor_num ${ANCNUM} \
			--low ${LOW} \
			--high ${HIGH}

make anchors DATASET=voc ANCNUM=2 LOW="0.0 0.0" HIGH="1.0 1.0"

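import argparse

import numpy as np

# NOTE, ERROR and runkMeans come from elsewhere in the project and are not shown here.


def main(train_set: str, max_iters: int, in_hw: tuple, out_hw: tuple,
         anchor_num: int, is_random: str, is_plot: str, low: list, high: list):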
    X = np.load(f'data/{train_set}_img_ann.npy', allow_pickle=True)
    in_wh = np.array(in_hw[::-1])
    low = np.array(low)
    high = np.array(high)
    # NOTE correct boxes
    for i in range(len(X)):
        # X[i, 1]: box annotations, X[i, 2]: image size as (h, w)
        img_wh = X[i, 2][::-1]

        """ calculate the affine transform factor """
        scale = in_wh / img_wh  # NOTE affine transform scale is [w,h]
        scale[:] = np.min(scale)
        # NOTE translation is [w offset,h offset]
        translation = ((in_wh - img_wh * scale) / 2).astype(int)

        """ calculate the box transform matrix """
        X[i, 1][:, 1:3] = (X[i, 1][:, 1:3] * img_wh * scale + translation) / in_wh
        X[i, 1][:, 3:5] = (X[i, 1][:, 3:5] * img_wh * scale) / in_wh

    x = np.vstack(X[:, 1])
    x = x[:, 3:]
    layers = len(out_hw) // 2
    if is_random == 'True':
        initial_centroids = np.hstack((np.random.uniform(low[0], high[0], (layers * anchor_num, 1)),
                                       np.random.uniform(low[1], high[1], (layers * anchor_num, 1))))
    else:
        initial_centroids = np.vstack((np.linspace(0.05, 0.3, num=layers * anchor_num), np.linspace(0.05, 0.5, num=layers * anchor_num)))
        initial_centroids = initial_centroids.T
    # Pass the --max_iters value through instead of a hard-coded 10.
    centroids, idx = runkMeans(x, initial_centroids, max_iters, is_plot)
    # NOTE: sort by width in descending order, so the largest anchors go to layer 0.
    centroids = np.array(sorted(centroids, key=lambda x: (-x[0])))
    centroids = np.reshape(centroids, (layers, anchor_num, 2))
    for l in range(layers):
        centroids[l] = centroids[l]  # grid_wh[l]  # NOTE centroids are relative to the whole image, in the 0-1 range
    if np.any(np.isnan(centroids)):
        print(ERROR, 'Result has NaN values, please rerun!')
    else:
        print(NOTE, f'Now anchors are :\n{centroids}')
        np.save(f'data/{train_set}_anchor.npy', centroids)
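
The box correction in the loop above keeps the aspect ratio of the image, centers it inside the network input, and re-normalizes the boxes to the padded input. A worked sketch for a single image follows. It assumes each annotation row is `[class, x, y, w, h]` with coordinates normalized to the original image, which is inferred from the `1:3` and `3:5` slices; the helper name `letterbox_boxes` and the sample sizes are illustrative.

```python
import numpy as np


def letterbox_boxes(ann, img_hw, in_hw):
    """Map normalized [class, x, y, w, h] rows from image space to network-input space."""
    in_wh = np.array(in_hw[::-1], dtype=float)
    img_wh = np.array(img_hw[::-1], dtype=float)
    # Uniform scale that fits the whole image inside the input (aspect ratio kept).
    scale = np.min(in_wh / img_wh)
    # Padding on the left/top after centering, truncated to whole pixels.
    translation = ((in_wh - img_wh * scale) / 2).astype(int)
    out = ann.astype(float).copy()
    out[:, 1:3] = (ann[:, 1:3] * img_wh * scale + translation) / in_wh
    out[:, 3:5] = (ann[:, 3:5] * img_wh * scale) / in_wh
    return out


# One centered box covering half of a 640x480 image, mapped to a 320x224 input.
ann = np.array([[0, 0.5, 0.5, 0.5, 0.5]])
out = letterbox_boxes(ann, img_hw=(480, 640), in_hw=(224, 320))
```

Here the height is the limiting side, so the scale is 224 / 480 and the image is padded by 10 pixels on the left. The box height stays at 0.5 of the input, while the box width shrinks to roughly 0.467 of the input width because of the horizontal padding.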

def parse_arguments(argv):
    parser = argparse.ArgumentParser()

    parser.add_argument('train_set', type=str, help=NOTE + 'this is train dataset name , the output *.npy file will be data/{train_set}_anchor.npy')
    parser.add_argument('--max_iters', type=int, help='kmeans max iters', default=10)
    parser.add_argument('--is_random', type=str, help='whether to randomly generate the centers', choices=['True', 'False'], default='True')
    parser.add_argument('--is_plot', type=str, help='whether to show the figure', choices=['True', 'False'], default='True')
    parser.add_argument('--in_hw', type=int, help='network input image size', default=(224, 320), nargs='+')
    parser.add_argument('--out_hw', type=int, help='network output image size', default=(7, 10, 14, 20), nargs='+')
    parser.add_argument('--low', type=float, help='Lower bound of random anchor, (w, h)', default=(0.0, 0.0), nargs='+')
    parser.add_argument('--high', type=float, help='Upper bound of random anchor, (w, h)', default=(1.0, 1.0), nargs='+')
    parser.add_argument('--anchor_num', type=int, help='single layer anchor nums', default=3)

    return parser.parse_args(argv)
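
Because `--is_random` and `--is_plot` are declared with `type=str` and `choices=['True', 'False']`, they reach `main` as strings, which is why `main` compares `is_random == 'True'` instead of testing the value directly. The reduced parser below illustrates this; `parse_flags` is a hypothetical cut-down version of `parse_arguments`, and converting the flag to a real boolean is one possible way to handle it, not something the script does.

```python
import argparse


def parse_flags(argv):
    """Cut-down parser with only two of the script's options (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--is_random', type=str, choices=['True', 'False'], default='True')
    parser.add_argument('--in_hw', type=int, default=(224, 320), nargs='+')
    return parser.parse_args(argv)


args = parse_flags(['--is_random', 'False', '--in_hw', '224', '320'])
# The flag is the non-empty string 'False', which is truthy, so compare explicitly.
is_random = args.is_random == 'True'
```

With `nargs='+'`, values given on the command line arrive as a list (`[224, 320]`), while an omitted option keeps its tuple default.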