2015
09-17

# BWT

When the problem to match S string in T string is mentioned, people always put KMP, Aho-Corasick and Suffixarray forward. But Mr Liu tells Canoe that there is an algorithm called Burrows�CWheeler Transform(BWT) which is quite amazing and high-efficiency to solve the problem.
But how does BWT work to solve the matching S-in-T problem? Mr Liu tells Canoe the firstly three steps of it.
Firstly, we append the ‘$’ to the end of T and for convenience, we still call the new string T. And then for every suffix of T string which starts from i, we append the prefix of T string which ends at (i �C 1) to its end. Secondly, we sort these new strings by the dictionary order. And we call the matrix formed by these sorted strings Burrows Wheeler Matrix. Thirdly, we pick characters of the last column to get a new string. And we call the string of the last column BWT(T). You can get more information from the example below. Then Mr Liu tells Canoe that we only need to save the BWT(T) to solve the matching problem. But how and can it? Mr Liu smiles and says yes. We can find whether S strings like “aac” are substring of T string like “acaacg” or not only knowing the BWT(T)! What an amazing algorithm BWT is! But Canoe is puzzled by the tricky method of matching S strings in T string. Would you please help Canoe to find the method of it? Given BWT(T) and S string, can you help Canoe to figure out whether S string is a substring of string T or not? 输入: There are multiple test cases. First Line: the BWT(T) string (1 <= length(BWT(T)) <= 100086). Second Line: an integer n ( 1 <=n <= 10086) which is the number of S strings. Then n lines comes. There is a S string (n * length(S) will less than 2000000, and all characters of S are lowercase ) in every line. 输出: There are multiple test cases. First Line: the BWT(T) string (1 <= length(BWT(T)) <= 100086). Second Line: an integer n ( 1 <=n <= 10086) which is the number of S strings. Then n lines comes. There is a S string (n * length(S) will less than 2000000, and all characters of S are lowercase ) in every line. 样例输入: gc$aaac
2
aac
gc

YES
NO
Hint
A naive method will not be accepted.


gc$aaac 1234567 排序之后变成了$aaaccg

3456271

\$->a->c->a->a->c->g

3     5   2   4    6    7

#include <cstdio>
#include <iostream>
#include <cstring>
#include <algorithm>
#define maxn 100186
using namespace std;

struct node
{
char ch;
int index;
bool operator < (const node & cmp)const
{
return ch<cmp.ch;
}
}save[maxn];
char t[maxn];
char str[maxn],txt[maxn];
int next[maxn];

void getnext(int len)
{
next[0]=0;next[1]=0;
for(int i=1;i<len;i++)
{
int j=next[i];
while(j && txt[j]!=txt[i])j=next[j];
next[i+1]=txt[j]==txt[i]?j+1:0;
}
}

void find(int n,int m)
{
int j=0;
for(int i=0;i<n;i++)
{
while(j&&txt[j]!=str[i])j=next[j];
if(txt[j]==str[i])j++;
if(j==m){printf("YES\n");return ;}
}
printf("NO\n");
}

int main()
{
while(scanf("%s",t)!=EOF)
{
int len=strlen(t);
for(int i=0;i<len;i++)
{
save[i].ch=t[i];
save[i].index=i;
}

stable_sort(save,save+len);

int now=save[0].index;
for(int i=0;i<len-1;i++)
{
str[i]=save[now].ch;
now=save[now].index;
}

int q;
scanf("%d",&q);
while(q--)
{
scanf("%s",txt);
int m=strlen(txt);
getnext(m);

find(len-1,m);
}
}
return 0;
}