2014
01-26

# DNA repair

Biologists have finally invented techniques for repairing DNA that contains segments causing certain inherited diseases. For simplicity, a DNA sequence is represented as a string over the characters 'A', 'G', 'C' and 'T'. The repair technique simply changes some characters so that all disease-causing segments are eliminated. For example, we can repair the DNA "AAGCAG" to "AGGCAC", eliminating the disease-causing segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA may still contain only the characters 'A', 'G', 'C' and 'T'.

You are to help the biologists repair a DNA by changing the least number of characters.

The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), the number of DNA segments causing inherited diseases.
The following N lines give N non-empty strings of length at most 20 containing only characters in "AGCT"; these are the DNA segments causing inherited diseases.
The last line of the test case is a non-empty string of length at most 1000 containing only characters in "AGCT"; this is the DNA to be repaired.

The last test case is followed by a line containing a single zero.

For each test case, print a line containing the test case number (beginning with 1) followed by the minimum number of characters that need to be changed. If the DNA cannot be repaired, print -1.

2
AAA
AAG
AAAG
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

Case 1: 1
Case 2: 4
Case 3: -1
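The three sample answers can be cross-checked with a brute-force reference that tries every candidate string. This is only a sketch for very short inputs (it enumerates 4^len strings), and the names `containsAny` and `bruteForceRepair` are made up for illustration; they are not part of the solution below.

```cpp
#include <climits>
#include <string>
#include <vector>

// Hypothetical helper: true if s contains any of the forbidden segments.
static bool containsAny(const std::string& s, const std::vector<std::string>& bad) {
    for (const std::string& b : bad)
        if (s.find(b) != std::string::npos) return true;
    return false;
}

// Brute force: enumerate all 4^len strings over "AGCT" and keep the smallest
// Hamming distance to dna among those with no forbidden segment.
// Returns -1 if every candidate contains a forbidden segment.
int bruteForceRepair(const std::vector<std::string>& bad, const std::string& dna) {
    const std::string alphabet = "AGCT";
    const std::size_t len = dna.size();
    long long total = 1;
    for (std::size_t i = 0; i < len; ++i) total *= 4;

    int best = INT_MAX;
    std::string cand(len, 'A');
    for (long long code = 0; code < total; ++code) {
        long long x = code;
        int diff = 0;
        for (std::size_t i = 0; i < len; ++i) {
            cand[i] = alphabet[x % 4];   // decode one base-4 digit per position
            x /= 4;
            if (cand[i] != dna[i]) ++diff;
        }
        if (diff < best && !containsAny(cand, bad)) best = diff;
    }
    return best == INT_MAX ? -1 : best;
}
```

Calling `bruteForceRepair({"AAA", "AAG"}, "AAAG")` corresponds to the first sample case; it is only practical for strings of roughly a dozen characters or fewer.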

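AC automaton + DP. We walk along the automaton (in effect generating new strings that satisfy the requirements and then picking the one that needs the fewest changes), but we must never step onto a node that marks a word: reaching one means the modified string contains a disease segment, which violates the problem. As for how to walk: start at the root of the automaton. Whenever the letter on the edge we take differs from the letter of the original string at the current position, the generated string differs from the original there, so the cost increases by 1; otherwise the cost is unchanged. So dp[ i ][ j ] is the minimum number of letters changed to reach automaton node j after processing the first i characters of the string.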
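Written out as a recurrence, the idea looks roughly like this. The notation is an assumption for illustration: $s$ is the DNA string (0-indexed), $\delta(j, k)$ is the automaton transition from node $j$ on letter $k$, and a node is "safe" if it does not mark a disease segment.

```latex
dp[0][0] = 0, \qquad dp[0][j] = \infty \quad (j \neq 0)

dp[i+1][\delta(j,k)] = \min\Bigl( dp[i+1][\delta(j,k)],\; dp[i][j] + [\, s_i \neq k \,] \Bigr)
\quad \text{for all safe } j,\ k \in \{A, G, C, T\} \text{ with } \delta(j,k) \text{ safe}

\text{answer} = \min_{j \text{ safe}} dp[|s|][j], \quad \text{or } -1 \text{ if this minimum is } \infty
```

Here $[\, s_i \neq k \,]$ is 1 when the chosen letter differs from the original and 0 otherwise.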

#include <algorithm>
#include <iostream>
#include <sstream>
#include <cstdlib>
#include <climits>
#include <cstring>
#include <cstdio>
#include <string>
#include <vector>
#include <cctype>
#include <queue>
#include <cmath>
#include <set>
#include <map>
#define CLR(a, b) memset(a, b, sizeof(a))
using namespace std;

const int MAX_NODE = 22 * 55 * 2;
const int INF = 0x3f3f3f3f;
const int CHILD_NUM = 4;
const int N = 1010;

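class ACAutomaton
{
private:
    int chd[MAX_NODE][CHILD_NUM]; // trie edges; full automaton transitions after Construct()
    int dp[N][MAX_NODE];          // dp[i][j]: min changes after i characters, ending in node j
    int fail[MAX_NODE];           // failure links
    bool val[MAX_NODE];           // true if the node ends with a disease segment
    int Q[MAX_NODE];              // BFS queue used by Construct()
    int ID[128];                  // maps 'A', 'G', 'C', 'T' to 0..3
    int sz;                       // number of nodes in use
public:
    void Initialize()
    {
        fail[0] = 0;
        ID['A'] = 0; ID['G'] = 1;
        ID['C'] = 2; ID['T'] = 3;
    }
    // Clear the root so a new trie can be built for the next test case.
    void Reset()
    {
        CLR(chd[0], 0); sz = 1;
    }
    // Insert one disease segment into the trie.
    void Insert(char *a)
    {
        int p = 0;
        for ( ; *a; a++)
        {
            int c = ID[*a];
            if (!chd[p][c])
            {
                CLR(chd[sz], 0);
                val[sz] = false;
                chd[p][c] = sz++;
            }
            p = chd[p][c];
        }
        val[p] = true;
    }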
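    // Build failure links by BFS and fill in the missing transitions.
    void Construct()
    {
        int *s = Q, *e = Q;
        for (int i = 0; i < CHILD_NUM; i++)
        {
            if (chd[0][i])
            {
                fail[chd[0][i]] = 0;
                *e++ = chd[0][i];
            }
        }
        while (s != e)
        {
            int u = *s++;
            for (int i = 0; i < CHILD_NUM; i++)
            {
                int &v = chd[u][i];
                if (v)
                {
                    *e++ = v;
                    fail[v] = chd[fail[u]][i];
                    // A node is dangerous if any suffix of it is a disease segment.
                    val[v] |= val[fail[v]];
                }
                else
                {
                    // No trie edge: reuse the transition of the failure node.
                    v = chd[fail[u]][i];
                }
            }
        }
    }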
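    // Run the DP over the automaton; return the minimum number of changes, or -1.
    int Work(char *ch)
    {
        int len, T, ret;
        len = strlen(ch);
        // memset with byte 0x3f sets every int to 0x3f3f3f3f, i.e. INF.
        CLR(dp, INF); dp[0][0] = 0;
        for (int i = 0; i < len; i++)
            for (int j = 0; j < sz; j++)
            {
                if (val[j]) continue;            // never stand on a dangerous node
                if (dp[i][j] == INF) continue;   // unreachable state
                for (int k = 0; k < 4; k++)
                {
                    T = chd[j][k];
                    if (val[T]) continue;        // never step onto a dangerous node
                    // Cost 1 if the chosen letter k differs from the original letter.
                    dp[i + 1][T] = min(dp[i + 1][T], dp[i][j] + (ID[ch[i]] != k));
                }
            }
        ret = INF;
        for (int i = 0; i < sz; i++)
        {
            ret = min(ret, dp[len][i]);
        }
        return ret == INF ? -1 : ret;
    }
} AC;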

char ch[N];

int main()
{
    //freopen("input.txt", "r", stdin);
    AC.Initialize();
    int n, cas = 1;
    // Stop at the terminating 0, and also on EOF or malformed input.
    while (scanf("%d", &n) == 1 && n)
    {
        AC.Reset();
        for (int i = 0; i < n; i++)
        {
            char temp[55];
            scanf("%s", temp);
            AC.Insert(temp);
        }
        scanf("%s", ch);
        AC.Construct();
        printf("Case %d: %d\n", cas++, AC.Work(ch));
    }
    return 0;
}
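Since dp[i + 1] depends only on dp[i], the full table of about 1010 × 2420 ints is not strictly needed. The following is a minimal sketch of the same automaton-plus-DP idea with two rolling rows and `std::vector` storage. The function name `minRepairs` and the local names `go` and `danger` are introduced here for illustration and do not appear in the solution above.

```cpp
#include <algorithm>
#include <array>
#include <queue>
#include <string>
#include <vector>

// Sketch: Aho-Corasick automaton + DP over (position, node), keeping two DP rows.
int minRepairs(const std::vector<std::string>& bad, const std::string& dna) {
    const int INF = 0x3f3f3f3f;
    auto id = [](char c) { return c == 'A' ? 0 : c == 'G' ? 1 : c == 'C' ? 2 : 3; };

    // Build the trie; node 0 is the root and 0 also means "no edge yet".
    std::vector<std::array<int, 4>> go(1, std::array<int, 4>{});
    std::vector<int> fail(1, 0);
    std::vector<char> danger(1, 0);
    for (const std::string& w : bad) {
        int p = 0;
        for (char c : w) {
            int k = id(c);
            if (!go[p][k]) {
                int fresh = static_cast<int>(go.size());
                go.push_back(std::array<int, 4>{});
                fail.push_back(0);
                danger.push_back(0);
                go[p][k] = fresh;
            }
            p = go[p][k];
        }
        danger[p] = 1;
    }

    // BFS: compute failure links and complete the transition table.
    std::queue<int> q;
    for (int k = 0; k < 4; ++k)
        if (go[0][k]) q.push(go[0][k]);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int k = 0; k < 4; ++k) {
            int v = go[u][k];
            if (v) {
                fail[v] = go[fail[u]][k];
                if (danger[fail[v]]) danger[v] = 1;  // a suffix is a forbidden segment
                q.push(v);
            } else {
                go[u][k] = go[fail[u]][k];
            }
        }
    }

    // DP with two rows: cur[j] = min changes so far, ending in node j.
    const int states = static_cast<int>(go.size());
    std::vector<int> cur(states, INF), nxt(states, INF);
    cur[0] = 0;
    for (char c : dna) {
        std::fill(nxt.begin(), nxt.end(), INF);
        for (int j = 0; j < states; ++j) {
            if (cur[j] == INF) continue;
            for (int k = 0; k < 4; ++k) {
                int t = go[j][k];
                if (danger[t]) continue;             // never enter a dangerous node
                nxt[t] = std::min(nxt[t], cur[j] + (id(c) != k ? 1 : 0));
            }
        }
        cur.swap(nxt);
    }
    int best = *std::min_element(cur.begin(), cur.end());
    return best == INF ? -1 : best;
}
```

The trade-off is that the rolling rows only give the minimum count; recovering one actual repaired string would need the full table or stored back-pointers.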

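3. Finite automata are a must-know algorithm in ACM contests. In interviews, however, you will almost never be asked to implement one on its own; if a question needs a finite automaton to reduce its time complexity, it belongs at the very hard level.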

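4. With two duplicates the result is still correct, but the solution is not rigorous: the later duplicate overwrites the earlier one. Because the problem's data constraints are fairly strict, the submission still passes. The algorithm has been updated.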