首页 > ACM题库 > HDU-杭电 > HDU 3962-Microgene-动态规划-[解题报告]HOJ
2015
04-14

HDU 3962-Microgene-动态规划-[解题报告]HOJ

Microgene

问题描述 :

sevenzero is very interesting in Bioinformation and have done some research on it. One day, sevenzero found a phenomenon called Microgene. Microgene is a special fragment in the DNA, and different Microgenes may have the same hereditary effect. Microgene works if and only if there are more than one Microgenes(Microgenes may overlap) with the same hereditary effect in the DNA. To finish his paper, sevenzero wants to know how many different DNAs with length L which contain the hereditary effect caused by Microgenes.

To simplify the problem, a DNA or a Microgene is considerd as a string consisting of character ‘A’, ‘T’, ‘C’ and ‘G’. And a Microgene is in the DNA if the Microgene string is the substring of the DNA string. All Microgenes given are different and with the same hereditary effect.

输入:

There are several test cases in the input. Each case begins with a line with an integer N (1 ≤ N ≤ 6) and L (1 ≤ N ≤ 1000000), denoting the number of Microgenes and the length of DNA. The following N lines contain N strings representing the Microgenes.The length of the Microgene is no more than 5. The input is terminated by EOF.

输出:

There are several test cases in the input. Each case begins with a line with an integer N (1 ≤ N ≤ 6) and L (1 ≤ N ≤ 1000000), denoting the number of Microgenes and the length of DNA. The following N lines contain N strings representing the Microgenes.The length of the Microgene is no more than 5. The input is terminated by EOF.

样例输入:

2 3
AT
TC
2 3
ATC
T
3 1000000
ATCG
TCGT
CTAG

样例输出:

1
11
5063

题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=3962


题目大意给定m个DNA病毒序列,求碱基构成的长度为n且含有两个以上DNA病毒序列,结果对10007取模。


解题思路:本题代码量大,较为综合,需用到AC自动机改造而成的Trie图、DP思想、矩阵快速幂。

     如果n比较小,那么本题可以用DP解,由于题目明显的有三个状态,未含病毒串、含一个病毒串,含两个及两个以上病毒,根据这三个就可以写出一个状态转移方程。但是本题可以简化一下,先求出总的组合种数,再减去含有一个病毒串和未含病毒串的种数就是解了。那么状态就只有2个。

     状态转移方程为:if (j->next 为病毒串)        dp[i+1][j->next][1] += dp[i][j][0] ;

                                   else if (j->next非病毒串)   dp[i+1][j->next][1]  += dp[i][j][1];

                                                                               dp[i+1][j->next][0] += dp[i][j][0];

    但是本题n特别大,必须用矩阵进行优化。先将Trie图转化为一个(total * 2) * (total * 2)(total为总节点数)可达矩阵,如果i  < total,那说明这个节点和他的后缀不含有病毒串,如果i > total,那说明这个节点和他的后缀含有1个病毒串。

    具体实现是这样的,if (i->next->flag) matrix[i][i->next+total]++;

                                       else matrix[i][i->next]++,matrix[i+total][i->next+total]++;

     这样,矩阵就被分成四块相当于四个象限,第2个象限(i和j都小于total)怎么走都不会出现病毒串,那么经过A^n,他们的值就是最后病毒序列为0个的种数。第1个象限表示i走到j-total会出现一个病毒DNA序列,第四个象限i-total走到j-total,原来含1个病毒串现在还是1个。

     代码很长,但模块化还是使它变的易懂,函数名改成Solve_1A,结果真真1A.


测试数据:


2 3
ATC
T


2 3
AT
TC

1 1
A

3 1000000
ATCG
TCGT
CTAG

代码:

#include <stdio.h>
#include <string.h>
#define MAX 40
#define MOD 10007


char dir[MAX];
int hash[256],n,m,total;


struct node {

	int flag,in;
	node *fail,*next[4];
}*qu[MAX*MAX],arr[MAX*MAX],*root;

node *CreateNode() {

	node *p = &arr[total];
	p->in = total++;
	p->flag = 0;
	p->fail = NULL;
	for (int i = 0; i < 4; ++i)
		p->next[i] = NULL;
	return p;
}
void Insert(char *str) {

	int i = 0,k;
	node *p = root;


	while (str[i]) {

		k = hash[str[i++]];
		if (p->next[k] == NULL)
			p->next[k] = CreateNode();
		p = p->next[k];
	}
	p->flag = 1;
}
void Build_AC() {

	int head,tail,i;
	head = tail = 0;


	root->fail = root;
	qu[++head] = root;
	while (tail < head) {

		node *p = qu[++tail];
		for (i = 0; i < 4; ++i) {

			if (p->next[i] != NULL) {

				if (p == root) p->next[i]->fail = root;
				else p->next[i]->fail = p->fail->next[i];
				qu[++head] = p->next[i];
				p->next[i]->flag |= p->fail->next[i]->flag;
			}
			else {

				if (p == root) p->next[i] = root;
				else p->next[i] = p->fail->next[i];
			}//else
		}//for
	}//while
}//void


/*-----------Trie图构造完成-------------------*/
/*-----------    矩阵模板  -------------------*/


struct Mat{

	int mat[MAX*2][MAX*2],size;
	Mat(int n) {size = n,memset(mat,0,sizeof(mat));}
	Mat(int arr[][MAX*2],int n) { size = n,memcpy(mat,arr,sizeof(arr));}


	friend Mat operator *(Mat a,Mat b);
	friend Mat operator +(Mat a,Mat b);
	friend Mat operator ^(Mat a,int k);
	friend Mat Sum(Mat a,int k);
}A(MAX*2),E(MAX*2);
Mat operator *(Mat a,Mat b) {

	Mat c(total*2);
	for (int i = 0; i < c.size; ++i)
		for (int j = 0; j < c.size; ++j)
			for (int k = 0; k < c.size; ++k)
				if (a.mat[i][k] && b.mat[k][j]) {

					c.mat[i][j] += a.mat[i][k] * b.mat[k][j];
					c.mat[i][j] %= MOD;
				}
	return c;
}
Mat operator +(Mat a,Mat b) {

	Mat c(total*2);
	for (int i = 0; i < c.size; ++i)
		for (int j = 0; j < c.size; ++j)
			c.mat[i][j] = (a.mat[i][j] + b.mat[i][j]) % MOD;
	return c;
}
Mat operator ^(Mat a,int k) {

	Mat c = E;
	while (k) {

		if (k & 1) c = c * a;
		a = a * a,k >>= 1;
	}
	return c;
}
Mat Sum(Mat A,int k) {

	if (k == 1) return A;
	else if (k & 1) return (A^k) * Sum(A,k-1);
	else return Sum(A,k/2) * ((A^(k/2)) + E);
}


/*-----------矩阵模板完成-------------------*/
/*-----------    开始DP  -------------------*/
void GetMat() {

	int i,j,k;
	Mat c(total*2);


	for (i = 0; i < total; ++i) {

		for (k = 0; k < 4; ++k) {

			j = arr[i].next[k]->in;	
			if (arr[i].next[k]->flag)//危险节点
				c.mat[i][j+total]++;
			else c.mat[i][j]++,c.mat[i+total][j+total]++;
		}
	}


	E = A = c;
	memset(E.mat,0,sizeof(E.mat));
	for (i = 0; i < E.size; ++i)
		E.mat[i][i] = 1;
}
int Solve_1A() {

	int i,j,k,ans,sum;


	ans = 1,k = n,sum = 4;
	while (k) {

		if (k & 1) ans = (ans * sum) % MOD;
		sum = (sum * sum) % MOD,k >>= 1;
	}


	A = A^n;
	for (i = 0; i < 2 * total; ++i)
		ans = (ans - A.mat[0][i] + MOD) % MOD;
	return ans;
}


int main()
{
	int i,j,k,ans;
	hash['A'] = 0,hash['T'] = 1;
	hash['C'] = 2,hash['G'] = 3;


	while (scanf("%d%d",&m,&n) != EOF) {

		total = 0;
		root = CreateNode();
		for (i = 0; i < m; ++i)
			scanf("%s",dir),Insert(dir);
		
		
		Build_AC();
		GetMat();
		ans = Solve_1A();
		printf("%d\n",ans % MOD);
	}
}

本文ZeroClock原创,但可以转载,因为我们是兄弟。

版权声明:本文为博主原创文章,未经博主允许不得转载。

参考:http://blog.csdn.net/woshi250hua/article/details/7599472