首页 > ACM题库 > HDU-杭电 > HDU 3518-Boring counting-后缀数组-[解题报告]HOJ
2014
11-05

HDU 3518-Boring counting-后缀数组-[解题报告]HOJ

Boring counting

问题描述 :

035 now faced a tough problem,his english teacher gives him a string,which consists with n lower case letter,he must figure out how many substrings appear at least twice,moreover,such apearances can not overlap each other.
Take aaaa as an example.”a” apears four times,”aa” apears two times without overlaping.however,aaa can’t apear more than one time without overlaping.since we can get “aaa” from [0-2](The position of string begins with 0) and [1-3]. But the interval [0-2] and [1-3] overlaps each other.So “aaa” can not take into account.Therefore,the answer is 2(“a”,and “aa”).

输入:

The input data consist with several test cases.The input ends with a line “#”.each test case contain a string consists with lower letter,the length n won’t exceed 1000(n <= 1000).

输出:

The input data consist with several test cases.The input ends with a line “#”.each test case contain a string consists with lower letter,the length n won’t exceed 1000(n <= 1000).

样例输入:

aaaa
ababcabb
aaaaaa
#

样例输出:

2
3
3

转载请注明出处,谢谢http://blog.csdn.net/acm_cxlove/article/details/7854526      
by—cxlove 

题目:给出一个字符串,求出至少不重叠出现2次以上的子串有多少个。

http://acm.hdu.edu.cn/showproblem.php?pid=3518 

求一次后缀数组,枚举子串长度

通过height数组将后缀分组,同一组内都是拥有一定长度的相同前缀。

由于 题目要求不重叠,然后找到位置的最大和最小,判断之差是否满足不重叠即可

比较简单

#include<iostream>
#include<cstdio>
#include<cstring>
#include<queue>
#include<cmath>
#include<string>
#include<vector>
#include<algorithm>
#include<map>
#include<set>
#define maxn 1000005
#define eps 1e-8
#define inf 1<<30
#define zero(a) fabs(a)<eps
using namespace std;
//以下为倍增算法求后缀数组
int wa[maxn],wb[maxn],wv[maxn],Ws[maxn];
int cmp(int *r,int a,int b,int l)
{return r[a]==r[b]&&r[a+l]==r[b+l];}
void da(const char *r,int *sa,int n,int m){
	int i,j,p,*x=wa,*y=wb,*t; 
	for(i=0;i<m;i++) Ws[i]=0; 
	for(i=0;i<n;i++) Ws[x[i]=r[i]]++; 
	for(i=1;i<m;i++) Ws[i]+=Ws[i-1]; 
	for(i=n-1;i>=0;i--) sa[--Ws[x[i]]]=i; 
	for(j=1,p=1;p<n;j*=2,m=p){ 
		for(p=0,i=n-j;i<n;i++) y[p++]=i; 
		for(i=0;i<n;i++) if(sa[i]>=j) y[p++]=sa[i]-j; 
		for(i=0;i<n;i++) wv[i]=x[y[i]]; 
		for(i=0;i<m;i++) Ws[i]=0; 
		for(i=0;i<n;i++) Ws[wv[i]]++; 
		for(i=1;i<m;i++) Ws[i]+=Ws[i-1]; 
		for(i=n-1;i>=0;i--) sa[--Ws[wv[i]]]=y[i]; 
		for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1;i<n;i++) 
			x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++; 
	} 
	return; 
}
int sa[maxn],Rank[maxn],height[maxn];
//求height数组
void calheight(const char *r,int *sa,int n){
	int i,j,k=0;
	for(i=1;i<=n;i++) Rank[sa[i]]=i;
	for(i=0;i<n;height[Rank[i++]]=k)
		for(k?k--:0,j=sa[Rank[i]-1];r[i+k]==r[j+k];k++);
	return;
}
int slove(int k,int n){
	int mmax=0,mmin=inf,ans=0;
	for(int i=1;i<=n;i++){
		if(height[i]<k){
			if(mmax-mmin>=k)
				ans++;
			mmax=0;mmin=inf;
		}
		else{
			mmax=max(max(sa[i-1],sa[i]),mmax);
			mmin=min(min(sa[i-1],sa[i]),mmin);
		}
	}
	if(mmax-mmin>=k) ans++;
	return ans;
}
char str[maxn];
int main(){
	while(scanf("%s",str)!=EOF&&strcmp(str,"#")){
		int n=strlen(str);
		da(str,sa,n+1,130);
		calheight(str,sa,n);
		int ans=0;
		for(int i=1;i<=n/2;i++)
			ans+=slove(i,n);
		printf("%d\n",ans);
	}
	return 0;
}

参考:http://blog.csdn.net/acm_cxlove/article/details/7948285


  1. a是根先忽略掉,递归子树。剩下前缀bejkcfghid和后缀jkebfghicd,分拆的原则的是每个子树前缀和后缀的节点个数是一样的,根节点出现在前缀的第一个,后缀的最后一个。根节点b出现后缀的第四个位置,则第一部分为四个节点,前缀bejk,后缀jkeb,剩下的c出现在后缀的倒数第2个,就划分为cfghi和 fghic,第3部分就为c、c

  2. 第2题,TCP不支持多播,多播和广播仅应用于UDP。所以B选项是不对的。第2题,TCP不支持多播,多播和广播仅应用于UDP。所以B选项是不对的。

  3. 算法是程序的灵魂,算法分简单和复杂,如果不搞大数据类,程序员了解一下简单点的算法也是可以的,但是会算法的一定要会编程才行,程序员不一定要会算法,利于自己项目需要的可以简单了解。