引言

遗传算法是最优化问题的常用方法，尤其是组合优化，常常可以得到近似最优解。

从物种的进化历史来看，物种的进化正是自我优化的过程。

从种群上来看，进化这一过程中，目标函数正是物种对环境的适应能力，限制条件就是严酷的自然环境，而状态表征量正是物种的遗传物质，基因，DNA，RNA等。

假设环境不变的话，每一代物种都比上一代物种更加适应这个环境，因为不能适应环境的基因已经被淘汰。这一代代的淘汰与繁衍，正是一代代的优化过程。很可惜，自然环境是变化的，是与物种相互作用的，所以恐龙灭绝了。:（

但是，作为数学假设，遗传算法的限制条件一般是不变的，当然这也可以使变化的。

遗传算法的求解步骤

1.设定求解框架

种群数量是有限的，所以要设置种群的数量 population
进化过程用种群繁殖的代数表示，同样，种群的代数也是有限值 generation number
作为基因，在繁殖的时候要交换染色体，一般设定交换概率P = 1，可以保证充分进化，较快得到优化的结果。

繁殖过程中存在变异。将变异的概率设定为P = 0.1，符合实际中变异发生率低的事实。

2.状态变量

遗传算法一般用来求解组合问题的优化结果。
用随机序列 w1,w2,...wn 表示基因排布。其中， 0<=wi<=1 。
比如， n = 9时，随机序列为 [0.23, 0.82, 0.45, 0.74,0.87,0.11, 0.56, 0.69, 0.78] ，解码方法一般是排序操作，对随机序列进行升序排列，得到序列由小到大的元素的索引值 [5, 0, 2, 6, 7, 3, 8, 1, 4]。

3. 确定目标函数

map(target_function, state_variable)

4.初始化种群

一般采用改良圈法得到。对随机序列

w1,w2,...,wu−1,wu,wu+1,...,wv−1,wv,wv+1,...wn

交换 u 与 v 之间的顺序， (0<u<v<n) 得到：

w1,w2,...,wu−1,wv,wv−1,...,wu+1,wu,wv+1,...wn

如果新的序列使目标函数减小，那么就覆盖原序列。直到，找到一个局部最优解。

5. 基因交叉

对种群随机两两匹配。交叉的方式有很多，这里只提及一种。
假设其中一组的两个序列分别为：
f1=[w1,w2,...wn]
f2=[w′1,w′2,...w′n]
随机选取 t(0<t<=n) 作为交换节点。
那么子代 child1 的基因的前 t 个与 f1 的前 t 个相同，后 n−t 个与 f2 的后 n−t 个相同。
同样，子代基因 child2 的基因的前 t 个与 f2 的前 t 个相同，后 n−t 个与 f1 的后 n−t 个相同。

6. 基因变异

基因变异的方式有很多，记得生物学上DNA于RNA的变异不同。DNA的变异相对RNA的变异而言，与本体相比，差异小。而RNA变异则是大段基因的重组，所以RNA作为遗传物质的病毒，如HIV病毒，变异很快，适应环境的速度快。
这里为了快速得到最优解，可以选取类似RNA变异的算法。当然，还有许多其他方法。

已知，基因 [w1,w2,...wn] 随机选取三个节点
u,v,w.(0<u<v<w<n)
将 u,v 之间的基因段插入到 w 的后面

7.进化

在得到的父代以及子代和变异的的种群后，按状态序列对应的目标函数值进行排序，最后选取最大的或是最小的固定种群数个基因，进行迭代，直到达到最大迭代次数，也就是generation number。

举例

一架飞机的起飞地点的经纬度分别是(70,40)。飞机速度为1000km/h。飞机从基地出发侦查完所有目标后返回基地。侦查时间不计，最短要花费多长时间回到基地。
目标的经纬度在文件”sj.txt”中。
这个问题需要求解起点和终点之间的序列。
应用上面的步骤可以利用numpy实现近似最优化解。
注意：遗传算法每次运行得到的结果可能并不相同，应多次运行后，再根据局部最优解得到近似解。本例中，并未多次运行，实现算法只为了学习、理解。

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import math

def load_txt(filename):
    """ return the [[x1,y1],[x2,y2]....] """
    data = []
    with open(filename) as f:
        for line in f:
            print(line)
            line = line.strip()

            first, second = line.split(' ')
            data.append([float(first),float(second)])
    return data

def init_pop(population_num, distance_mat):
    """ return array shape (population_num, 102) """
    population = np.zeros((population_num, 102))
    for row in range(population_num):
        init_one = list(range(1,101))
        np.random.shuffle(init_one)
        init_one = [0] + init_one + [101]
        for t in range(102):
            quit_flag = 0
            for m in range(100):
                for n in range(m+2, 101):
                    if distance_mat[init_one[m], init_one[n]] + distance_mat[init_one[m+1],init_one[n+1]] \
                        < distance_mat[init_one[m], init_one[m+1]] + distance_mat[init_one[n],init_one[n+1]]:
                        temp = init_one[:]
                        init_one = temp[:m+1] + temp[m+1:n+1][::-1] + temp[n+1:]
                        quit_flag = 1
            if quit_flag == 0:
                for idx, val in enumerate(init_one):
                    population[row,val] = idx
                break
    population /= 101.0 # encoding 
    return population


def product(parent, distance_mat):
    """ parent numpy array (population_num, 102) distance_mat numpy array (102,102) return new population """
    child = np.copy(parent)
    row = parent.shape[0]
    c = list(range(row))
    np.random.shuffle(c)

    # exchange gene
    for i in range(0,row,2):
        position = math.floor(np.random.random() * 102)
        temp = np.copy(child[c[i], position:]) # ###### copy 
        child[c[i], position:] = child[c[i+1], position:]
        child[c[i+1], position:] = temp

    # mutation gene
    change_row = np.array(range(row))[np.random.random(row)<0.1]
    B = np.copy(child[change_row,:])
    for i in range(len(change_row)):
        mutation_pos = np.sort(np.floor(np.random.random(3)*102))
        temp = np.copy(B[i,:])
        B[i,:] = np.hstack((temp[:mutation_pos[0]], temp[mutation_pos[1]+1:mutation_pos[2]+1],\
                           temp[mutation_pos[0]:mutation_pos[1]+1],temp[mutation_pos[2]+1:]))

    family = np.vstack((parent, child, B))
    family_idx = np.argsort(family, axis = 1) # decoding
    row_num = family.shape[0]
    length = np.zeros(row_num)

    for j in range(row_num):
        for i in range(101):
            length[j] += distance_mat[family_idx[j,i],family_idx[j,i+1]]

    idx = np.argsort(length)
    population = family[idx[:row],:]
    return population




if __name__ == '__main__':

    sj = load_txt('sj.txt') #length is 100
    start_point = [[70,40]]
    sj = start_point + sj + start_point # the path is circular
    sj = np.array(sj) # change to array
    sj = sj * np.pi/180 # change to rads
    distance_mat = np.zeros((102,102)) # initialize the distance matrix

    for i in range(101):
        for j in range(i+1,102):
            point1 = np.array([np.cos(sj[i,0])*np.cos(sj[i,1]), np.sin(sj[i,0])*np.cos(sj[i,1]), np.sin(sj[i,1])])
            point2 = np.array([np.cos(sj[j,0])*np.cos(sj[j,1]), np.sin(sj[j,0])*np.cos(sj[j,1]), np.sin(sj[j,1])])
            distance_mat[i,j] = 6370 * math.acos(np.dot(point1, point2)\
                /np.sqrt(np.dot(point1,point1) * np.dot(point2,point2)))

    distance_mat = distance_mat + distance_mat.T # build the systematic matrix

# set the algorithm parameter
    population_num = 50
    generation_num = 100

    population = init_pop(population_num, distance_mat) # find the sectional optimize value

# iterations
    for _ in range(generation_num):
        population = product(population, distance_mat)

# population is the final group of genes
# find the best gene
    num = population.shape[0]
    population_idx = np.argsort(population, axis=1)
    length = np.zeros(num)

    for j in range(num):
        for i in range(101):
            length[j] += distance_mat[population_idx[j,i],population_idx[j,i+1]]

    idx = np.argsort(length)[0]
    path = population_idx[idx]
    print(length[idx])

[笔记]遗传算法 genetic algorithm

引言