首页 > ACM题库 > HDU-杭电 > HDU 4668-Finding string-字符串-[解题报告]HOJ
2015
09-17

HDU 4668-Finding string-字符串-[解题报告]HOJ

Finding string

问题描述 :

Richard is a smart data analyst in a famous company. One of his daily boring work is to find how many times does a pattern string occur in a very long string.
Luckily, Richard notices that there are many consecutive repeated substrings in this very long string, so he uses the following compression algorithm to make the string shorter:

a). Find a non-compressed consecutive repeated substring of the original string, e.g. “ab” in “cabababd”.
b). Replace the repeating part with the bracketed repetend, followed by the times the repetend appears in the original string. e.g. Write “cabababd” as “c[ab]3d”. Note she can also write it as “c[ab]1ababd” or “ca[ba]2bd” and so on, although these string are not compressed as well as the first one is.
c). Repeat a) and b) several times until the string is short enough.

However, Richard finds it still a bit hard for him to do his work after the compression. So he orders you to write a program to make his work easier. Can you help him?

输入:

The input contains several test cases, terminated by EOF. The number of test cases does not exceed 10000.
Each test case contains two lines, denote the compressed text and pattern string. The decompressed text and pattern string contain lowercase letter only. The first line contains only lowercase letters (a-z), square brackets ([]) and numbers (0-9), and the second line contains only lowercase letters (a-z). The brackets must be followed with an integer t(1≤t≤231-1)indicating the times the string in the brackets repeat. The brackets won’t be nested.
You can assume the length of the compressed string and pattern string won’t exceed 500.
Please note that for most test cases, the length of the pattern string is relatively very small.

输出:

The input contains several test cases, terminated by EOF. The number of test cases does not exceed 10000.
Each test case contains two lines, denote the compressed text and pattern string. The decompressed text and pattern string contain lowercase letter only. The first line contains only lowercase letters (a-z), square brackets ([]) and numbers (0-9), and the second line contains only lowercase letters (a-z). The brackets must be followed with an integer t(1≤t≤231-1)indicating the times the string in the brackets repeat. The brackets won’t be nested.
You can assume the length of the compressed string and pattern string won’t exceed 500.
Please note that for most test cases, the length of the pattern string is relatively very small.

样例输入:

[ab]10aaba
ba

样例输出:

11

转载请注明出处,谢谢http://blog.csdn.net/ACM_cxlove?viewmode=contents    by—cxlove

题意:给出一个压缩后的串,以及一个模式串,问模式串出现了多少次。

http://acm.hdu.edu.cn/showproblem.php?pid=4668

这种压缩形式的话,在去年金华邀请赛中出现过,但是那题的范围不大。

直接展开作多串匹配,暴力AC自动机就行。

但是这题的原串不大,但是展开后会非常大。

可以发现压缩串把原串分为一个个区间,那么我们可以分两步统计。

预处理的话需要将压缩串解析成一个个的区间,在这里为了方便后面的匹配,我们假设模式串的长度为P

我在解析的时候,如果相邻两个区间的字符串长度都小于p的话,会将其合并,作用在后面说。

对于每一个区间,查询匹配次数。

1、如果这个区间非压缩的,直接KMP

2、如果这个区间是压缩后的,肯定不能展开暴力KMP。我们只需要统计匹配的起始位置在第一个循环节内的,因为我们在第一个循环节后面添加上p – 1个字符。这样保证了匹配位置在第一个循环节内,然后 便是统计。

这里有点麻烦的是,[ab]10中查询ab的话,我们应该 是aba中查询ab,发现出现了1次,那么最终结果应该统计为多少呢,按理说,aba这个串占用了两个循环节,那么总共有9个aba,应该 是9次,但是显然这里应该 是10次。那么对于[ab]10里面查询ba的话,同样还是把aba拿出来匹配,出现了1次,那么这里应该是9次。看似类似的情况,统计结果却不一样,因为第一次匹配只占用本身这个循环节,而第二次占用了所有循环节。

那么我们在后面添加 p – 1个字符之后,同样是处理kmp,同样是考虑匹配位置占用了所有的循环节统计,然后 再添加上不占用最后一个循环节次数。如[ab]10查询ab,aba中出现一次ab,而[ab]10中有9个aba,所以出现9次,然后再统计ab中有多少个ab,答案为1,所以最终为10。
 注意有些地方的表述,最好用一个较长的串再模拟一下,如[ab]10和[ba]10中查询abab这个串的情况。

第一个问题就算解决 了

第二个问题是跨区间的串,即模式串出现在两个或者两个以上的区间中。

既然如此,要保证这个匹配串要横跨到下一个区间,那么当前区间取一个长度为p – 1的后缀,那么匹配的话肯定会到下一个区间,下一个区间的话就取一个p – 1的前缀,那么保证匹配的串会出现在第一个区间中。

虽然我在前面解析的过程中,保证相邻两个区间长度小于p的话,会合并。

但是有一种情况是[ab]1000000cd[ab]10000000。那么对于中间的区间长度还是小于p。

所以就有可能匹配串横跨三个区间,这里需要特判一下,即第一个区间不超过p – 1,第二个区间全取,第三个区间加上第二个区间的长度要不超过p – 1。

总之就是这么麻烦。。。。应该是我实现得太麻烦。。。

不过唯一的好处便是范围不大,我是用String各种乱搞的。。。

#include <iostream>
#include <cstdio>
#include <cstring>
#include <map>
#include <vector>
#include <string>
#include <queue>
#include <cmath>
#include <algorithm>
#define lson step << 1
#define rson step << 1 | 1
#pragma comment(linker,"/STACK:102400000,102400000")
using namespace std;
typedef long long LL;
const int N = 5005;
struct Node {
    string s;
    int cnt;
    Node () {}
    Node (string _s , int c) :s(_s) , cnt(c) {}
    string cat () {
        string t = s;
        for (int i = 1 ; i < cnt ; i ++)
            s = s + t;
        cnt = 1;
        return s;
    }
    LL len () {
        return (LL)s.size() * cnt;
    }
    //
    string prefix (int l) {
        string str = s;
        for (int i = 1 ; i < cnt && str.size() < l ; i ++) {
            str += s;
        }
        return str.substr (0 , l);
    }
    string suffix (int l) {
        string str = s;
        for (int i = 1 ; i < cnt && str.size() < l ; i ++) {
            str += s;
        }
        return str.substr (str.size() - l , l);
    }
}a[N];
char str[N] , pat[N];
int next[N] , idx , l , p;
void get_next (char *s , int l) {
    next[0] = -1;
    int i = 0 , j = -1;
    while (i < l) {
        if (j == -1 || s[i] == s[j]) {
            i ++; j ++;
            next[i] = j;
        }
        else j = next[j];
    }
}
void gao (string s , int tot) {
    if (s == "") return ;
    if (idx == 0 || s.size() * tot >= p || a[idx - 1].len() >= p) {
        a[idx ++] = Node (s , tot);
    }
    else {
        a[idx - 1].cat ();
        a[idx - 1].s += Node (s , tot).cat();
    }
} 
int match (string s , char *t , int p) {
    int l = s.size() ;
    int i = 0 , j = 0 , ans = 0;
    while (i < s.size()) {
        if (j == - 1 || s[i] == t[j]) {
            i ++; j ++;
            if (j == p) {
                ans ++;
                j = next[j];
            }
        }
        else j = next[j];
    }
    return ans;
} 
int main () {
    #ifndef ONLINE_JUDGE
        freopen ("input.txt" , "r" , stdin);
        freopen ("output.txt" , "w" , stdout);
    #endif
    while (scanf ("%s %s" , str , pat) != EOF) {
        idx = 0;
        l = strlen (str);p = strlen (pat);
        get_next (pat , p);
        string s = "";
        int tot = 1;
        for (int i = 0 ; i < l ; i ++) {
            if (str[i] == '[') {
                if (s == "") continue;
                gao (s , tot);
                s = ""; tot = 1;
            }
            else if (str[i] == ']') {
                tot = 0;
                i ++;
                while (isdigit(str[i]))
                    tot = tot * 10 + str[i ++] - '0';
                i --;
                gao (s , tot);
                s = ""; tot = 1;
            }
            else s += str[i];
        }
        gao (s , tot);
        s = ""; tot = 1;
        LL ans = 0;
        // for (int i = 0 ; i < idx ; i ++) {
        //     cout << a[i].s << " " << a[i].cnt << endl; 
        // }
        for (int i = 0 ; i < idx ; i ++) {
            if (a[i].len() < p) continue;
            if (a[i].cnt == 1) ans += match (a[i].s , pat , p);
            else {
                int use = min(a[i].cnt , 1 + (p - 1 + (int)a[i].s.size() - 1) / (int)a[i].s.size());
                string s = "";
                for (int j = 1 ; j < use ; j ++) {
                    s += a[i].s;
                }
                s = a[i].s + s.substr (0 , min ((int)s.size() , p - 1));
                int tmp = match (s , pat , p);
                ans += (LL)tmp * (a[i].cnt - use + 1);
                if (p) {
                    s = "";
                    for (int j = 1 ; j < use ; j ++)
                        s += a[i].s;
                    ans += match (s , pat , p);
                }
            }
        }
        for (int i = 0 ; i < idx - 1 ; i ++) {
            s = a[i].suffix (min (a[i].len () , p - 1LL));
            if (a[i + 1].len () < p - 1) {
                s += a[i + 1].cat ();
                if (i + 2 < idx) {
                    s += a[i + 2].prefix (min (a[i + 2].len () , p - 1 - a[i + 2].len ()));
                }
            }
            else {
                s += a[i + 1].prefix (min (a[i + 1].len () , p - 1LL));
            }
            ans += match (s , pat , p);
        }
        printf ("%I64d\n" , ans);
    }
    return 0;
}




版权声明:本文为博主原创文章,未经博主允许不得转载。

参考:http://blog.csdn.net/acm_cxlove/article/details/10552857