2013
12-12

# DNA Assembly

Farmer John has performed DNA sequencing on his prize milk-producing cow, Bessie. DNA sequences are ordered lists (strings) containing the letters ‘A’, ‘C’, ‘G’, and ‘T’.

As is usual for DNA sequencing, the results are a set of strings that are sequenced fragments of DNA, not entire DNA strings. A pair of strings like ‘GATTA’ and ‘TACA’ most probably represents the string ‘GATTACA’: the overlapping characters are merged, since they were probably duplicated in the sequencing process.

Merging a pair of strings requires finding the greatest overlap between the two and then eliminating it as the two strings are concatenated together. Overlaps are between the end of one string and the beginning of another string, NOT IN THE MIDDLE OF A STRING.

By way of example, the strings ‘GATTACA’ and ‘TTACA’ overlap completely. On the other hand, the strings ‘GATTACA’ and ‘TTA’ have no overlap at all, since the matching characters of one appear in the middle of the other, not at one end or the other. Here are some examples of merging strings, including those with no overlap:

GATTA + TACA -> GATTACA
TACA + GATTA -> TACAGATTA
TACA + ACA -> TACA
TAC + TACA -> TACA
ATAC + TACA -> ATACA
TACA + ACAT -> TACAT
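The merge rule illustrated by these examples can be sketched as follows. This is a minimal sketch, not the solution's own routine: the names `overlap`, `merge_fragments`, and `check_examples` are hypothetical and chosen only for illustration. The overlap is taken as the longest suffix of the first string that equals a prefix of the second.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: length of the longest suffix of `a` that is also a
// prefix of `b`. Matches in the middle of either string are never considered.
std::size_t overlap(const std::string& a, const std::string& b) {
    std::size_t limit = std::min(a.size(), b.size());
    for (std::size_t k = limit; k > 0; --k) {
        // Compare the last k characters of a with the first k characters of b.
        if (a.compare(a.size() - k, k, b, 0, k) == 0) {
            return k;
        }
    }
    return 0;
}

// Hypothetical helper: concatenate a and b, writing the overlap only once.
std::string merge_fragments(const std::string& a, const std::string& b) {
    return a + b.substr(overlap(a, b));
}

// Re-checks the six merges listed in the problem statement.
void check_examples() {
    assert(merge_fragments("GATTA", "TACA") == "GATTACA");
    assert(merge_fragments("TACA", "GATTA") == "TACAGATTA");
    assert(merge_fragments("TACA", "ACA") == "TACA");
    assert(merge_fragments("TAC", "TACA") == "TACA");
    assert(merge_fragments("ATAC", "TACA") == "ATACA");
    assert(merge_fragments("TACA", "ACAT") == "TACAT");
}
```

Note that the order of the arguments matters: `merge_fragments("GATTA", "TACA")` shares two characters, while `merge_fragments("TACA", "GATTA")` shares none.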

Given a set of N (2 <= N <= 7) DNA sequences, all of whose lengths are in the range 1..7, find and print the length of the shortest possible sequence obtainable by repeatedly merging all N strings using the procedure described above. All strings must be merged into the resulting sequence.

The input consists of multiple test cases and is terminated by end of file.
Each test case:
Line 1: A single integer N
Lines 2..N+1: Each line contains a single DNA subsequence

For each test case, output the length of the shortest possible string obtained by merging the subsequences. It is always possible (and required) to merge all the input strings to obtain this string.

4
GATTA
TAGG
ATCGA
CGCAT

13

Hint
Explanation of the sample:

One such string is "CGCATCGATTAGG".
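The sample answer can be checked by merging the four fragments left to right in the order CGCAT, ATCGA, GATTA, TAGG. The sketch below assumes the suffix/prefix merge rule from the problem statement; the names `merge_fragments` and `merge_in_order` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: append b to a, writing the longest overlap between
// a's suffix and b's prefix only once.
std::string merge_fragments(const std::string& a, const std::string& b) {
    std::size_t k = std::min(a.size(), b.size());
    while (k > 0 && a.compare(a.size() - k, k, b, 0, k) != 0) {
        --k;
    }
    return a + b.substr(k);
}

// Hypothetical helper: merge the fragments left to right in the given order.
std::string merge_in_order(const std::vector<std::string>& order) {
    std::string merged;
    for (const std::string& fragment : order) {
        merged = merge_fragments(merged, fragment);
    }
    return merged;
}
```

Calling `merge_in_order({"CGCAT", "ATCGA", "GATTA", "TAGG"})` passes through CGCATCGA and CGCATCGATTA before reaching CGCATCGATTAGG. Each step shares two characters, so the 19 input characters shrink to 13.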

/*
Author: ACb0y
Date: 2010-9-05
Type: force
ProblemId: hdu 1583 DNA Assembly
Result: 2919263 2010-09-05 10:02:43 Accepted 1583 281MS 272K 1079 B C++ ACb0y
*/
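#include <cstdio>    // freopen
#include <cstring>   // memset
#include <iostream>
#include <string>
using namespace std;

int n;           // number of fragments in the current test case
int ans;         // shortest merged length found so far
int d[10];       // d[k] = index of the fragment placed k-th in the current ordering
int vis[10];     // vis[i] != 0 if fragment i is already used in the current ordering
string str[10];  // the input fragments

// Merge b onto the end of a, dropping the longest prefix of b that matches a suffix of a.
string str_merge(string a, string b)
{
    if (a == "")
    {
        return b;
    }
    int i;
    int pos = 0;  // length of the longest overlap found so far (0 = no overlap)
    int alen = a.length();
    int blen = b.length();

    for (i = 1; i <= alen && i <= blen; i++)
    {
        if (b.substr(0, i) == a.substr(alen - i, i))
        {
            pos = i;
        }
    }
    return a + b.substr(pos, blen - pos);
}

// Backtracking: enumerate all N! orderings of the fragments.
void dfs(int pos)
{
    int i;
    if (pos == n)
    {
        // A full ordering is fixed: merge the fragments left to right.
        string temp = "";
        for (i = 0; i < n; i++)
        {
            temp = str_merge(temp, str[d[i]]);
        }
        if ((int)temp.length() < ans)
        {
            ans = temp.length();
        }
    }
    else
    {
        for (i = 0; i < n; i++)
        {
            if (!vis[i])
            {
                d[pos] = i;
                vis[i] = 1;
                dfs(pos + 1);
                vis[i] = 0;
            }
        }
    }
}

int main()
{
    int i;
#ifndef ONLINE_JUDGE
    freopen("1583.txt", "r", stdin);
#endif
    while (cin >> n)
    {
        for (i = 0; i < n; i++)
        {
            cin >> str[i];
        }
        memset(vis, 0, sizeof(vis));
        ans = 10000;
        dfs(0);
        cout << ans << endl;
    }
    return 0;
}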
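The same brute-force search can also be written without hand-rolled backtracking. The sketch below is an alternative formulation, not the submitted solution: it swaps the recursive `dfs` for `std::next_permutation`, and the names `merge_fragments` and `shortest_merge_length` are hypothetical. With N <= 7 there are at most 7! = 5040 orderings, so trying them all is cheap.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: append b to a, writing the longest overlap between
// a's suffix and b's prefix only once.
std::string merge_fragments(const std::string& a, const std::string& b) {
    std::size_t k = std::min(a.size(), b.size());
    while (k > 0 && a.compare(a.size() - k, k, b, 0, k) != 0) {
        --k;
    }
    return a + b.substr(k);
}

// Hypothetical helper: try every ordering of the fragments, merge each one
// left to right, and return the length of the shortest result.
std::size_t shortest_merge_length(std::vector<std::string> fragments) {
    // next_permutation visits every ordering only when started from sorted order.
    std::sort(fragments.begin(), fragments.end());
    std::size_t best = std::string::npos;  // largest size_t, acts as "infinity"
    do {
        std::string merged;
        for (const std::string& fragment : fragments) {
            merged = merge_fragments(merged, fragment);
        }
        best = std::min(best, merged.size());
    } while (std::next_permutation(fragments.begin(), fragments.end()));
    return best;
}
```

For the sample input, `shortest_merge_length({"GATTA", "TAGG", "ATCGA", "CGCAT"})` returns 13, matching the expected output.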

3. In method 1:

// Traverse all the edges and compute the in-degrees
for(int i=0; i<V; i++)
{
degree = 0;