2013
11-12

# DNA Sequence

It’s well known that DNA Sequence is a sequence only contains A, C, T and G, and it’s very useful to analyze a segment of DNA Sequence，For example, if a animal’s DNA sequence contains segment ATC then it may mean that the animal may have a genetic disease. Until now scientists have found several those segments, the problem is how many kinds of DNA sequences of a species don’t contain those segments.

Suppose that DNA sequences of a species is a sequence that consist of A, C, T and G，and the length of sequences is a given integer n.

First line contains two integer m (0 <= m <= 10), n (1 <= n <=2000000000). Here, m is the number of genetic disease segment, and n is the length of sequences.

Next m lines each line contain a DNA genetic disease segment, and length of these segments is not larger than 10.

An integer, the number of DNA sequences, mod 100000.

4 3
AT
AC
AG
AA

36

//* @author:
import java.util.Scanner;

public class Main{

public static int N = 4;
public static int SIZE = 120;
public int NUM = 0;
// 定义root数组，存储A,G,C,T组成的，最多需要不超过120
public static trieP trie[] = new trieP[SIZE];

public static void main(String[] args) {
Main ac = new Main();
trie[0] = new trieP();
// trie[0].fail = 0;

Scanner s = new Scanner(System.in);
int m = s.nextInt();
int n = s.nextInt();
for (int i = 0; i < m; i++)
ac.insert(s.next().toCharArray());
ac.acBuild();
ac.NUM++;
ac.solve(n);
}

// 构造矩阵，求矩阵的N次方
void solve(int n) {
long[][] matrix = new long[NUM][NUM];
for (int i = 0; i < NUM; i++)
if (trie[i].end == 0)
for (int j = 0; j < N; j++) {
if (trie[trie[i].son[j]].end == 0)
matrix[i][trie[i].son[j]]++;
}

int sum = 0;
for (int i = 0; i < NUM; i++)
if (trie[i].end == 0) {
sum += tm[0][i];
if (sum >= 10000)
sum %= 100000;
}
System.out.println(sum);
}

long[][] power(long[][] m, int n) {
if (n == 1)
return m;
if (n == 2)
return multi(m, m);
if ((n & 1) == 1)
return multi(m, power(m, n - 1));
else {
long tm[][] = power(m, n >> 1);
return multi(tm, tm);
}
}

long[][] multi(long[][] ma, long[][] mb) {
long[][] ret = new long[NUM][NUM];
for (int i = 0; i < NUM; i++)
for (int j = 0; j < NUM; j++)
for (int k = 0; k < NUM; k++) {
ret[i][j] += ma[i][k] * mb[k][j];
if (ret[i][j] >= 100000)
ret[i][j] = ret[i][j] % 100000;
}

return ret;
}

public int get(char c) {
switch (c) {
case 'A':
return 0;
case 'G':
return 1;
case 'C':
return 2;
case 'T':
return 3;
}
return -1;
}

public void insert(char[] key) {
int index;
int root = 0;
for (int i = 0; i < key.length; i++) {
if (trie[root].end > 0)
break;
index = get(key[i]);
if (trie[root].son[index] == -1) {
trie[root].son[index] = ++NUM;
trie[NUM] = new trieP();
}
root = trie[root].son[index];
}
trie[root].end++;
}

public void acBuild() {
int tail = 0;
int temp = 0, p = 0, i;
int q[] = new int[SIZE];
q[tail++] = 0;
for (i = 0; i < N; i++) {
p = trie[temp].son[i];
if (p != -1) {
if (temp == 0)
trie[p].fail = 0;
else {
trie[p].fail = trie[trie[temp].fail].son[i];
if (trie[trie[p].fail].end > 0)
trie[p].end++;
}
q[tail++] = trie[temp].son[i];
} else {
if (temp == 0)
trie[temp].son[i] = 0;
else
trie[temp].son[i] = trie[trie[temp].fail].son[i];
}
}
}

}

}

class trieP {
int N = 4;
int fail;
int son[];
int end;

public trieP() {
end = 0;
fail = -1;
son = new int[N];
for (int i = 0; i < N; i++)
son[i] = -1;
}
}

1. 有限自动机在ACM中是必须掌握的算法，实际上在面试当中几乎不可能让你单独的去实现这个算法，如果有题目要用到有限自动机来降低时间复杂度，那么这种面试题应该属于很难的级别了。

2. 约瑟夫也用说这么长……很成熟的一个问题了，分治的方法解起来o(n)就可以了，有兴趣可以看看具体数学的第一章，关于约瑟夫问题推导出了一系列的结论，很漂亮