2014
11-30

# Computer Virus on Planet Pandora

Aliens on planet Pandora also write computer programs like us. Their programs only consist of capital letters (‘A’ to ‘Z’) which they learned from the Earth. On
planet Pandora, hackers make computer virus, so they also have anti-virus software. Of course they learned virus scanning algorithm from the Earth. Every virus has a pattern string which consists of only capital letters. If a virus’s pattern string is a substring of a program, or the pattern string is a substring of the reverse of that program, they can say the program is infected by that virus. Give you a program and a list of virus pattern strings, please write a program to figure out how many viruses the program is infected by.

There are multiple test cases. The first line in the input is an integer T ( T<= 10) indicating the number of test cases.

For each test case:

The first line is a integer n( 0 < n <= 250) indicating the number of virus pattern strings.

Then n lines follows, each represents a virus pattern string. Every pattern string stands for a virus. It’s guaranteed that those n pattern strings are all different so there
are n different viruses. The length of pattern string is no more than 1,000 and a pattern string at least consists of one letter.

The last line of a test case is the program. The program may be described in a compressed format. A compressed program consists of capital letters and
“compressors”. A “compressor” is in the following format:

[qx]

q is a number( 0 < q <= 5,000,000)and x is a capital letter. It means q consecutive letter xs in the original uncompressed program. For example, [6K] means
‘KKKKKK’ in the original program. So, if a compressed program is like:

AB[2D]E[7K]G

It actually is ABDDEKKKKKKKG after decompressed to original format.

The length of the program is at least 1 and at most 5,100,000, no matter in the compressed format or after it is decompressed to original format.

There are multiple test cases. The first line in the input is an integer T ( T<= 10) indicating the number of test cases.

For each test case:

The first line is a integer n( 0 < n <= 250) indicating the number of virus pattern strings.

Then n lines follows, each represents a virus pattern string. Every pattern string stands for a virus. It’s guaranteed that those n pattern strings are all different so there
are n different viruses. The length of pattern string is no more than 1,000 and a pattern string at least consists of one letter.

The last line of a test case is the program. The program may be described in a compressed format. A compressed program consists of capital letters and
“compressors”. A “compressor” is in the following format:

[qx]

q is a number( 0 < q <= 5,000,000)and x is a capital letter. It means q consecutive letter xs in the original uncompressed program. For example, [6K] means
‘KKKKKK’ in the original program. So, if a compressed program is like:

AB[2D]E[7K]G

It actually is ABDDEKKKKKKKG after decompressed to original format.

The length of the program is at least 1 and at most 5,100,000, no matter in the compressed format or after it is decompressed to original format.

3
2
AB
DCB
DACB
3
ABC
CDE
GHI
ABCCDEFIHG
4
ABB
ACDEE
BBB
FEEE
A[2B]CD[4E]F

0
3
2
HintIn the second case in the sample input, the reverse of the program is ‘GHIFEDCCBA’, and ‘GHI’ is a substring of the reverse, so the program is infected
by virus ‘GHI’.


T个测试数据

n个模版

#include <stdio.h>
#include <string.h>
#include <queue>
using namespace std;
inline int Max(int a,int b){return a>b?a:b;}
inline int Min(int a,int b){return a>b?b:a;}
int ANS;

#define N 5110000
#define maxnode 250100
#define sigma_size 26

struct Trie{
int ch[maxnode][sigma_size];
int val[maxnode];
int last[maxnode];
int f[maxnode];
int sz;
void init(){
sz=1;
memset(ch,0,sizeof(ch));
memset(val, 0, sizeof(val));
memset(f,0,sizeof(f));
memset(last,0,sizeof(last));
}
int idx(char c){
return c-'A';
}
void print(int j){
if(j){
printf("%d: %d\n", j, val[j]);
print(last[j]);
}
}

void Creat(char *s){
int u = 0, len = strlen(s);
for(int i = 0; i < len; i++){
int c = idx(s[i]);
if(!ch[u][c])  ch[u][c] = sz++;
u = ch[u][c];
}
val[u]  ++ ; //u若是单词结尾则为 +1
}

void find(char *T){
int len = strlen(T);
int j = 0;
for(int i = 0; i < len; i++){
int c = idx(T[i]);

j = ch[j][c];
int temp = j;
while(temp && val[temp]) { ANS += val[temp];  val[temp] = 0;  }
}

}

void getFail(){
queue<int> q;
f[0] = 0;
//初始化队列
for(int c = 0; c < sigma_size; c++)
if(ch[0][c])q.push(ch[0][c]);

while(!q.empty()){
int r = q.front(); q.pop();
for(int c = 0; c < sigma_size; c++){
int u = ch[r][c];
if(!u){ ch[r][c] = ch[f[r]][c]; continue; }
q.push(u);
int v = f[r];
while(v && !ch[v][c]) v = f[v]; //沿失配边追溯到可以匹配的(非原点)位置
f[u] = ch[v][c];
last[u] = val[f[u]] ? f[u] :  last[f[u]];
}
}
}
};
Trie ac;
char S1[N],S2[N];
int Slen;

void InputString(){
char c = getchar();
Slen = 0;

while(c=getchar() , c!='\n')
if(c!='[')S1[Slen++] = c;
else {
int x;
scanf("%d",&x);
c=getchar();
while( x-- )S1[Slen++]=c;
getchar();
}
S1[Slen]='\0';
}
int main(){

int T,i,j,n;scanf("%d",&T);

while(T--){
ac.init();
scanf("%d",&n);
for(i=0;i<n;i++){
scanf("%s",S1);
ac.Creat(S1);
}
ac.getFail();
InputString();
for(i=0;i<Slen;i++)S2[Slen - i -1] = S1[i];
S2[Slen]='\0';

ANS = 0;
ac.find(S1);
ac.find(S2);

printf("%d\n",ANS);

}
return 0;
}
/*
99

3
GGJ
G
GG
GGGJ

3
GGJ
G
GG
GGJ

ans:
3
3

*/


1. 这道题这里的解法最坏情况似乎应该是指数的。回溯的时候
O(n) = O(n-1) + O(n-2) + ….
O(n-1) = O(n-2) + O(n-3)+ …
O(n) – O(n-1) = O(n-1)
O(n) = 2O(n-1)

2. 如果两个序列的最后字符不匹配（即X [M-1]！= Y [N-1]）
L（X [0 .. M-1]，Y [0 .. N-1]）= MAX（L（X [0 .. M-2]，Y [0 .. N-1]），L（X [0 .. M-1]，Y [0 .. N-1]）
这里写错了吧。

3. 第23行：
hash = -1是否应该改成hash[s ] = -1

因为是要把从字符串s的start位到当前位在hash中重置

修改提交后能accept，但是不修改居然也能accept

4. 约瑟夫也用说这么长……很成熟的一个问题了，分治的方法解起来o(n)就可以了，有兴趣可以看看具体数学的第一章，关于约瑟夫问题推导出了一系列的结论，很漂亮

5. 约瑟夫也用说这么长……很成熟的一个问题了，分治的方法解起来o(n)就可以了，有兴趣可以看看具体数学的第一章，关于约瑟夫问题推导出了一系列的结论，很漂亮