首页 > 搜索 > DFS搜索 > HDU 1560 DNA sequence-字符串-[解题报告] C++
2013
12-12

HDU 1560 DNA sequence-字符串-[解题报告] C++

DNA sequence

问题描述 :

The twenty-first century is a biology-technology developing century. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A(adenine), C(cytosine), G(guanine), and T(thymine). Finding the longest common subsequence between DNA/Protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to make a shortest sequence from them so that each of the given sequence is the subsequence of it.

For example, given "ACGT","ATGC","CGTT" and "CAGT", you can make a sequence in the following way. It is the shortest but may be not the only one.

输入:

The first line is the test case number t. Then t test cases follow. In each case, the first line is an integer n ( 1<=n<=8 ) represents number of the DNA sequences. The following k lines contain the k sequences, one per line. Assuming that the length of any sequence is between 1 and 5.

输出:

For each test case, print a line containing the length of the shortest sequence that can be made from these sequences.

样例输入:

1
4
ACGT
ATGC
CGTT
CAGT

样例输出:

8

HDU_1560

    一种可行的思路是迭代加深搜索,只要在不断增加递归深度限制时第一次出现生成了规定的字符串时,这个深度就是最优的深度了。

    接下来就是考虑如何剪枝了,一种明显的思路就是记录当前每个字符串还差多少位补全,然后取最大的差距,如果剩下的可以构造的字符数小于最大的差距的话,那么就可以剪枝了。但这个剪枝还不够强,比如这个数据就会很慢

8
AAAAA
GGGGG
CCCCC
TTTTT
TTTAA
AAATT
CCGGG
GGGCC

,根据这个数据就会想到一个更好的剪枝,我们考虑至少还需要多少个A,至少还需要多少个T以及C、G,然后把4项之和作为前面所说的“最大的差距”,用这个作为剪枝的条件就比较快了,上面那组数据就几乎瞬出了。

    至于至少需要多少个A,我们可以先算一下各个字符串还需要多少个A,然后取最大值,至于T、C、G也是一样的。

#include<stdio.h>
#include<string.h>
#include<algorithm>
#define MAXN 8
#define MAXL 5
int N, n[MAXN], a[MAXN][MAXL];
char ch[128];
void init()
{
    int i, j;
    char b[10];
    scanf("%d", &N);
    for(i = 0; i < N; i ++)
    {
        scanf("%s", b);
        n[i] = strlen(b);
        for(j = 0; j < n[i]; j ++) a[i][j] = ch[b[j]];
    }
}
int Max(int *x)
{
    int i, j, h[4], max[4] = {0};
    for(i = 0; i < N; i ++)
    {
        memset(h, 0, sizeof(h));
        for(j = x[i]; j < n[i]; j ++) ++ h[a[i][j]];
        for(j = 0; j < 4; j ++) max[j] = std::max(max[j], h[j]);
    }
    return max[0] + max[1] + max[2] + max[3];
}
int dfs(int d, int *ix)
{
    if(Max(ix) > d) return 0;
    if(d == 0) return 1;
    int i, j, x[MAXN];
    for(i = 0; i < 4; i ++)
    {
        for(j = 0; j < N; j ++)
        {
            if(ix[j] < n[j] && a[j][ix[j]] == i)
                x[j] = ix[j] + 1;
            else x[j] = ix[j];
        }    
        if(dfs(d - 1, x)) return 1;
    }
    return 0;
}
void solve()
{
    int dep, ini[8] = {0};
    for(dep = 1; !dfs(dep, ini); dep ++);
    printf("%d\n", dep);
}
int main()
{
    int t;
    ch['A'] = 0, ch['T'] = 1, ch['C'] = 2, ch['G'] = 3;
    scanf("%d", &t);
    while(t --)
    {
        init();
        solve();    
    }
    return 0;    
}

 

 

 

解题报告转自:http://www.cnblogs.com/staginner/archive/2012/08/29/2662442.html


  1. #include <stdio.h>
    int main(void)
    {
    int arr[] = {10,20,30,40,50,60};
    int *p=arr;
    printf("%d,%d,",*p++,*++p);
    printf("%d,%d,%d",*p,*p++,*++p);
    return 0;
    }

    为什么是 20,20,50,40,50. 我觉得的应该是 20,20,40,40,50 . 谁能解释下?

  2. “再把所有不和该节点相邻的节点着相同的颜色”,程序中没有进行不和该节点相邻的其他节点是否相邻进行判断。再说求出来的也不一样是颜色数最少的

  3. 我还有个问题想请教一下,就是感觉对于新手来说,递归理解起来有些困难,不知有没有什么好的方法或者什么好的建议?