Virus synthesis

回文自动机好题!

题意:初始有一个空串,然后通过两种操作得到目的串t,求最少操作数。两种操作分别为:

  1. 在字符串前面或者后面任意加一个字符
  2. 在字符串前面或者后面加上当前字符串的镜像

思路:

  1. 思考一下,花费(操作数)要最小(少),那肯定操作2越多越好呀!
  2. 同时,若过程中经过了操作2,那目的串一定是由某一次操作2以及在字符串前后使用操作1得来的,因此我们只需要求出这样的最优解就行了;否则输出初始化的答案—t的长度
  3. 由于操作2只能得到偶数长度的串,因此若我们建好回文自动机,那么只有0节点后面的节点才有用(思考一下)
  4. 同时,在由父节点更新子节点过程中,若直接前后增加一个字符,则花费为2;但如果在形成父节点前就先在前(或后)增加那个字符,再进行一次操作2,则花费可以减少1,也就是额外花费仅为1
  5. 因此,从父节点到子节点的更新方程为: d p [ s o n ] = d p [ f a ] + 1 dp[son]=dp[fa]+1 dp[son]=dp[fa]+1
  6. 但由于子节点的最优花费还可能是由某个 f a i l fail fail进行操作而来的,因此还需要考虑 f a i l fail fail
  7. 我们先预处理好不超过当前回文长度一半的最长回文后缀(即 h a l f half half数组),然后先通过操作1到达当前回文串的一半,最后通过操作2直接得到当前回文串;
  8. 以上两种转移方式应该是所有转移里面的最优策略了,再取个最值更新答案即可
  9. 而为了保证转移的可利用性,我们不妨使用 B F S BFS BFS进行更新( D F S DFS DFS也行)

题目描述

Viruses are usually bad for your health. How about fighting them with… other viruses? In this problem, you need to find out how to synthesize such good viruses.

We have prepared for you a set of strings of the letters A , G , T A,G,T A,G,T and C C C. They correspond to the DNA nucleotide sequences of viruses that we want to synthesize, using the following operations:

Adding a nucleotide either to the beginning or the end of the existing sequence.
Replicating the sequence, reversing the copied piece, and gluing it either to the beginning or to the end of the original (so that e.g., A G T C AGTC AGTC can become A G T C C T G A AGTCCTGA AGTCCTGA or C T G A A G T C CTGAAGTC CTGAAGTC).
We’re concerned about efficiency, since we have very many such sequences, some of them very long. Find a way to synthesize them in a minimum number of operations.

输入格式

The first line of input contains the number of test cases T T T. The descriptions of the test cases follow:

Each test case consists of a single line containing a non-empty string. The string uses only the capital letters A , C , G A,C,G A,C,G and T T T and is not longer than 100000 characters.

输出格式

For each test case, output a single line containing the minimum total number of operations necessary to construct the given sequence.

输入输出样例

输入
4
AAAA
AGCTTGCA
AAGGGGAAGGGGAA
AAACAGTCCTGACAAAAAAAAAAAAC
输出
3
8
6
18

//#pragma comment(linker, "/STACK:102400000,102400000")
#include "bits/stdc++.h"
#define pb push_back
#define ls l,m,now<<1
#define rs m+1,r,now<<1|1
#define hhh printf("hhh\n")
#define see(x) (cerr<<(#x)<<'='<<(x)<<endl)
using namespace std;
typedef long long ll;
typedef pair<int,int> pr;
inline int read() {int x=0;char c=getchar();while(c<'0'||c>'9')c=getchar();while(c>='0'&&c<='9')x=x*10+c-'0',c=getchar();return x;}

const int maxn = 1e5+10;
const int mod = 1e9+7;
const double eps = 1e-9;

char s[maxn];
map<char,int> Hash;
int ch[maxn][4], fail[maxn], half[maxn], len[maxn], fa[maxn];
int last, sz, pos;
int dp[maxn];

void init() {
	for(int i=0; i<=sz; ++i) for(int j=0; j<4; ++j) ch[i][j]=0;
	fail[0]=fail[1]=sz=last=1;
	len[1]=pos=-1;
}

void add(int c) {
	++pos; int p=last;
	while(s[pos-len[p]-1]!=s[pos]) p=fail[p];
	if(!ch[p][c]) {
		int np=++sz, fp=fail[p];
		len[np]=len[p]+2, fa[np]=p;
		while(s[pos-len[fp]-1]!=s[pos]) fp=fail[fp];
		fail[np]=ch[fp][c];
		if(len[fail[np]]<=len[np]/2) half[np]=fail[np];
		else {
			int hp=half[p];
			while(len[hp]+2>len[np]/2||s[pos-len[hp]-1]!=s[pos]) hp=fail[hp];
			half[np]=ch[hp][c];
		}
		ch[p][c]=np;
	}
	last=ch[p][c];
}

int main() {
	//ios::sync_with_stdio(false);
	Hash['A']=0, Hash['G']=1, Hash['T']=2, Hash['C']=3;
	int T=read();
	while(T--) {
		scanf("%s", s);
		init();
		for(int i=0; s[i]; ++i) add(Hash[s[i]]);
		int ans=pos+1;
		for(int i=2; i<=sz; ++i) dp[i]=len[i];
		queue<int> q;
		q.push(0), dp[0]=1;
		while(q.size()) {
			int now=q.front(); q.pop();
			for(int i=0; i<4; ++i) {
				int v=ch[now][i];
				if(!v) continue;
				dp[v]=dp[now]+1;
				int hp=half[v];
				dp[v]=min(dp[v],dp[hp]+1+len[v]/2-len[hp]);
				ans=min(ans,dp[v]+pos+1-len[v]);
				q.push(v);
			}
		}
		printf("%d\n", ans);
	}
}