首页 > 专题系列 > Java解POJ > POJ 2778 DNA Sequence [解题报告] Java
2013
11-12

POJ 2778 DNA Sequence [解题报告] Java

DNA Sequence

问题描述 :

It’s well known that DNA Sequence is a sequence only contains A, C, T and G, and it’s very useful to analyze a segment of DNA Sequence,For example, if a animal’s DNA sequence contains segment ATC then it may mean that the animal may have a genetic disease. Until now scientists have found several those segments, the problem is how many kinds of DNA sequences of a species don’t contain those segments.

Suppose that DNA sequences of a species is a sequence that consist of A, C, T and G,and the length of sequences is a given integer n.

输入:

First line contains two integer m (0 <= m <= 10), n (1 <= n <=2000000000). Here, m is the number of genetic disease segment, and n is the length of sequences.

Next m lines each line contain a DNA genetic disease segment, and length of these segments is not larger than 10.

输出:

An integer, the number of DNA sequences, mod 100000.

样例输入:

4 3
AT
AC
AG
AA

样例输出:

36

解题代码:

//* @author:
import java.util.Scanner;

public class Main{

 public static int N = 4;
 public static int SIZE = 120;
 public int NUM = 0;
 // 定义root数组,存储A,G,C,T组成的,最多需要不超过120
 public static trieP trie[] = new trieP[SIZE];

 public static void main(String[] args) {
  Main ac = new Main();
  trie[0] = new trieP();
  // trie[0].fail = 0;

  Scanner s = new Scanner(System.in);
  int m = s.nextInt();
  int n = s.nextInt();
  for (int i = 0; i < m; i++)
	ac.insert(s.next().toCharArray());
  ac.acBuild();
  ac.NUM++;
  ac.solve(n);
  }


   // 构造矩阵,求矩阵的N次方
   void solve(int n) {
	long[][] matrix = new long[NUM][NUM];
	for (int i = 0; i < NUM; i++)
          if (trie[i].end == 0)
	  for (int j = 0; j < N; j++) {
	    if (trie[trie[i].son[j]].end == 0)
		matrix[i][trie[i].son[j]]++;
	   }

	int sum = 0;
	for (int i = 0; i < NUM; i++)
          if (trie[i].end == 0) {
		sum += tm[0][i];
		if (sum >= 10000)
	         sum %= 100000;
	   }
	   System.out.println(sum);
	}

     long[][] power(long[][] m, int n) {
	if (n == 1)
	  return m;
	if (n == 2)
	  return multi(m, m);
	if ((n & 1) == 1)
	  return multi(m, power(m, n - 1));
	else {
	  long tm[][] = power(m, n >> 1);
	  return multi(tm, tm);
	}
     }

    long[][] multi(long[][] ma, long[][] mb) {
	long[][] ret = new long[NUM][NUM];
	for (int i = 0; i < NUM; i++)
	 for (int j = 0; j < NUM; j++)
	  for (int k = 0; k < NUM; k++) {
	    ret[i][j] += ma[i][k] * mb[k][j];
	    if (ret[i][j] >= 100000)
		ret[i][j] = ret[i][j] % 100000;
	   }

	return ret;
     }


    public int get(char c) {
       switch (c) {
	case 'A':
	  return 0;
	case 'G':
	  return 1;
	case 'C':
	  return 2;
	case 'T':
	  return 3;
       }
	return -1;
     }

   public void insert(char[] key) {
	int index;
	int root = 0;
	for (int i = 0; i < key.length; i++) {
          if (trie[root].end > 0)
	    break;
	  index = get(key[i]);
	  if (trie[root].son[index] == -1) {
            trie[root].son[index] = ++NUM;
	    trie[NUM] = new trieP();
	   }
	  root = trie[root].son[index];
        }
	trie[root].end++;
    }


   public void acBuild() {
      int head = 0;
      int tail = 0;
      int temp = 0, p = 0, i;
      int q[] = new int[SIZE];
      q[tail++] = 0;
      while (head < tail) {
	temp = q[head++];
	for (i = 0; i < N; i++) {
	   p = trie[temp].son[i];
	   if (p != -1) {
	     if (temp == 0)
		trie[p].fail = 0;
	     else {
		trie[p].fail = trie[trie[temp].fail].son[i];
		if (trie[trie[p].fail].end > 0)
		  trie[p].end++;
	     }
	     q[tail++] = trie[temp].son[i];
	   } else {
	     if (temp == 0)
		trie[temp].son[i] = 0;
	     else
		trie[temp].son[i] = trie[trie[temp].fail].son[i];
	    }
	 }
       }

   }

  }

class trieP {
	int N = 4;
	int fail;
	int son[];
	int end;

	public trieP() {
          end = 0;
	  fail = -1;
	  son = new int[N];
	  for (int i = 0; i < N; i++)
	    son[i] = -1;
	}
}

  1. 有限自动机在ACM中是必须掌握的算法,实际上在面试当中几乎不可能让你单独的去实现这个算法,如果有题目要用到有限自动机来降低时间复杂度,那么这种面试题应该属于很难的级别了。

  2. 约瑟夫也用说这么长……很成熟的一个问题了,分治的方法解起来o(n)就可以了,有兴趣可以看看具体数学的第一章,关于约瑟夫问题推导出了一系列的结论,很漂亮