首页 > ACM题库 > HDU-杭电 > HDU 4641-K-string-字符串-[解题报告]HOJ
2015
09-17

HDU 4641-K-string-字符串-[解题报告]HOJ

K-string

问题描述 :

Given a string S. K-string is the sub-string of S and it appear in the S at least K times.It means there are at least K different pairs (i,j) so that Si,Si+1… Sj equal to this K-string. Given m operator or query:1.add a letter to the end of S; 2.query how many different K-string currently.For each query ,count the number of different K-string currently.

输入:

The input consists of multiple test cases.
Each test case begins with a line containing three integers n, m and K(1<=n,K<=50000,1<=m<=200000), denoting the length of string S, the number of operator or question and the least number of occurrences of K-string in the S.
The second line consists string S,which only contains lowercase letters.
The next m lines describe the operator or query.The description of the operator looks as two space-separated integers t c (t = 1; c is lowercase letter).The description of the query looks as one integer t (t = 2).

输出:

The input consists of multiple test cases.
Each test case begins with a line containing three integers n, m and K(1<=n,K<=50000,1<=m<=200000), denoting the length of string S, the number of operator or question and the least number of occurrences of K-string in the S.
The second line consists string S,which only contains lowercase letters.
The next m lines describe the operator or query.The description of the operator looks as two space-separated integers t c (t = 1; c is lowercase letter).The description of the query looks as one integer t (t = 2).

样例输入:

3 5 2
abc
2
1 a
2
1 a
2

样例输出:

0
1
1

每次给字符串加一个字符,并询问当时出现至少k次的子串有多少

貌似数据特别水,看网上题解都是暴力过的,所以自然要给出一个不会超时的算法

当初考的时候,觉得很难维护,因为每次相当于在后缀树上拆一条边,并加点进去,并维护当根路径上的值,好像只能动态树之类的,即便离线,貌似也要用树链剖分,因此当时就没管了

回过头来看,其实还是挺好维护的,甚至只要离线在dfs序上处理就可以了

考虑先把后缀树建好,然后每次操作在后缀自动机上走去修改后缀树上的值。考虑这样离线做的合法性,每次走到一个节点,去把他的祖先们全部加1,首先这些祖先所代表的字符串都是当前串的后缀,因此一定是之前出现过的,然后,自己这个节点所能接受的所有子串是(len[rt[i]],len[i]],而我们走到这个节点所经过的长度也正好是len[i],所以这个节点所能接受的所有子串都是需要加1的,因此这个做法正确性是有保障的

然后考虑怎么好的实现,我们可以考虑用一些指针指出最上的不被接受的节点是哪些,显然随着插入操作的增加,这些指针会单调往下走,均摊就是线性的,而且修改的时候也只会影响到一个指针,考虑怎样快速找到一个指针,这个操作我们可以在dfs序上实现,每次分出一个指针,相当于在一段区间上覆盖上一个值,因此就是一个区间修改,单点查询的线段树,然后每次我们要知道又要询问其权值是否大于等于k才能考虑是否下放指针,这个又可以在dfs序上用一个单点修改,区间查询的线段树实现

复杂度o(nlogn)

线段树部分用zkw线段树写出来十分简短

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <algorithm>
using namespace std;
int next[500000][26];
int net[2000000],sora[2000000];
int rt[500000],l[500000],op[500000],ed[500000],tail[500000];
char ch[500000];
long long ans;
int n,m,ss,s1,cm,k,m1,w_time,top;
int b[1048576],t[1048576],s[1048576];
int q[500000];
void suffix_sam(int &last,int chr)
{
	int x,y;
	s1++,l[s1]=l[last]+1;
	for (x=last,last=s1;x && (!next[x][chr]);x=rt[x]) next[x][chr]=s1;
	y=next[x][chr];
	if (!y) next[x][chr]=s1,rt[s1]=0;
	else if (l[x]+1==l[y]) rt[s1]=y;
	else {
		s1++,l[s1]=l[x]+1;
		for (int j=0;j<='z'-'a';j++) next[s1][j]=next[y][j];
		rt[s1]=rt[y],rt[y]=s1,rt[last]=s1;
		for (;x && (next[x][chr]==y);x=rt[x]) next[x][chr]=s1;
		if (next[x][chr]==y) next[x][chr]=s1;
	}
}
void origin()
{
	for (int i=0;i<=s1;i++) {
		for (int j=0;j<='z'-'a';j++)
			next[i][j]=0;
	}
	for (int i=1;i<=m1+m1;i++) b[i]=t[i]=s[i]=0;
	s1=0,ss=0,m1=0;
}
void link(int x,int y)
{
	++ss,net[tail[x]]=ss,tail[x]=ss,sora[ss]=y,net[ss]=0;
}
void dfs(int s)
{
	++top,op[s]=top;
	for (int i=s,ne;net[i];) {
		i=net[i],ne=sora[i];
		dfs(ne);
	}
	ed[s]=top;
}
void change(int l,int r,int x)
{
	++w_time;
	l+=m1-1,r+=m1+1;
	for (;!((l^r)==1);l>>=1,r>>=1) {
		if ((l&1)==0) b[l+1]=x,t[l+1]=w_time;
		if ((r&1)==1) b[r-1]=x,t[r-1]=w_time;
	}
}
int ask(int x)
{
	int b1=0,t1=0;
	for (x+=m1;x;x>>=1)
		if (t[x]>t1) b1=b[x],t1=t[x];
	return b1;
}
void modify(int x)
{
	for (x+=m1;x;x>>=1) s[x]++;
}
int query(int l,int r)
{
	int sum=0;
	l+=m1-1,r+=m1+1;
	for (;!((l^r)==1);l>>=1,r>>=1) {
		if ((l&1)==0) sum+=s[l+1];
		if ((r&1)==1) sum+=s[r-1];
	}
	return sum;
}
void pushdown(int x)
{
	ans+=l[x]-l[rt[x]];
	for (int i=x,ne;net[i];) {
		i=net[i],ne=sora[i];
		int cnt=query(op[ne],ed[ne]);
		if (cnt>=k) pushdown(ne);
		else change(op[ne],ed[ne],ne);
	}
}
void doit(int x)
{
	modify(op[x]);
	int y=ask(op[x]);
	int cnt=query(op[y],ed[y]);
	if (cnt>=k) pushdown(y);
}
char c[20];
int main()
{
	for (;scanf("%d%d%d",&n,&m,&k)==3;) {
		origin();
		scanf("%s",ch+1);
		cm=0;
		for (int i=1;i<=m;i++) {
			scanf("%d",&q[i]);
			if (q[i]==1) {
				scanf("%s",c+1);
				++cm;
				ch[cm+n]=c[1];
			}
		}
		int last=0;
		for (int i=1;i<=n+cm;i++) suffix_sam(last,ch[i]-'a');
		ss=s1;
		for (int i=0;i<=s1;i++) tail[i]=i,net[i]=0;
		for (m1=1;m1<=s1+2;m1<<=1) ;
		for (int i=1;i<=s1;i++) {
			link(rt[i],i);
		}
		for (int i=0;i<=s1;i++) op[i]=ed[i]=0;
		top=0;
		dfs(0);
		int s=0;
		ans=0;
		w_time=0;
		for (int i=1;i<=n;i++) {
			s=next[s][ch[i]-'a'];
			doit(s);
		}
		cm=0;
		for (int i=1;i<=m;i++) {
			if (q[i]==2) {
				printf("%lld\n",ans);
			}
			else {
				cm++;
				s=next[s][ch[n+cm]-'a'];
				doit(s);
			}
		}
	}
	return 0;
} 

版权声明:本文为博主原创文章,未经博主允许不得转载。

参考:http://blog.csdn.net/huyuncong/article/details/14647611