首页 > ACM题库 > UVA > uva-1351-String Compression[区间dp]
2014
01-20

uva-1351-String Compression[区间dp]

 String Compression

Run Length Encoding(RLE) is a simple form of compression. RLE consists of the process for searching for a repeated runs of a single character in a string to be compressed, and replacing them by a single instance of the character and a run count. For example, a string abcccddddddefgggggggggghijk is encoded into a stringab3c6def10ghijk by RLE.

A new compression method similar to RLE is devised and the rule of the method is as follows: if a substringS <tex2html_verbatim_mark>is repeated k <tex2html_verbatim_mark>times, replace k <tex2html_verbatim_mark>copies of S <tex2html_verbatim_mark>by k(S) <tex2html_verbatim_mark>. For example, letsgogogo is compressed into lets3(go). The length of letsgogogo is 10, and the length of lets3(go) is 9. In general, the length of k(S) <tex2html_verbatim_mark>is (number of digits in k <tex2html_verbatim_mark>) + (length of S <tex2html_verbatim_mark>) + 2 (for `(‘ and `)’). For example, the length of 123(abc) is 8. It is also possible to nest compression, so the substring S <tex2html_verbatim_mark>may itself be a compressed string. For example,nowletsgogogoletsgogogo could be compressed as a now2(lets3(go)), and nowletsgogogoletsgogogoandrunrunrun could be compressed as now2(lets3(go))and3(run).

Write a program that, for a given string, gives a shortest compressed string using the compression rules as described above.

Input 

Your program is to read from standard input. The input consists of T <tex2html_verbatim_mark>test cases. The number of test cases T<tex2html_verbatim_mark>is given in the first line of the input. Each test case consists of a single line containing one string of no more than 200 characters drawn from a lower case alphabet. The length of shortest input string is 1.

Output 

Your program is to write to standard output. Print exactly one line for each test case. For each test case, print the length of the shortest compressed string.

The following shows sample input and output for four test cases.

Sample Input 

4 
ababcd 
letsgogogo 
nowletsgogogoletsgogogo 
nowletsgogogoletsgogogoandrunrunrun

Sample Output 

6 
9 
15 
24

题目大意

给一个字符串,可以把连续相同的部分进行缩写成k(S)的形式,S是一个字符串,k表示有连续相同的S

例如,abgogogogo,可以缩写成ab4(go). 还可以嵌套缩写,比如

“nowletsgogogoletsgogogo”, 缩写成“now2(lets3(go))”

思路

一道区间dp,但是这题并不好想
f(i, j)表示字符串的i~j位的最小位数
那么
f(i, j) = min{
min{ f(i,k)+f(k+1, j), i<=k<j },
min{ digitNum(k)+f[l][l+k-1]+2, 如果字符串可以由前k个字符串重复组成的 }
}
digitNum(k)表示数字k的位数
判断区间(i, j)是否有由连续k个组成的字符串连续组成的,直接用O(n)的时间判断

/**==========================================
 *   This is a solution for ACM/ICPC problem
 *
 *   @source:uva-1351 String Compression
 *   @type:  区间dp
 *   @author: shuangde
 *   @blog: blog.csdn.net/shuangde800
 *   @email: [email protected]
 *===========================================*/
#include<iostream>
#include<cstdio>
#include<algorithm>
#include<vector>
#include<queue>
#include<cmath>
#include<cstring>
using namespace std;

typedef long long int64;
const int INF = 0x3f3f3f3f;
const double PI  = acos(-1.0);

const int MAXN = 210;
char str[MAXN];
int f[MAXN][MAXN], len;

// 判断字符串区间[l,r]之否由[l,l+k-1]字串连续组成的
bool check(int l, int r, int k){
    int len = r-l+1;
    int i = 0;
    while(i < k){
        for(int p=1; l+p*k+i<=r; ++p){
            if(str[l+i] != str[l+p*k+i])
                return false;
        }
        ++i;
    }
    return true;
}

// 计算数字x有几位
inline int digitNum(int x){
    int cnt = 0;
    while(x > 0){ ++cnt; x /= 10; }
    return cnt;
}

int main(){

    int nCase;
    scanf("%d", &nCase);

    while(nCase--){

        scanf("%s", str+1);
        len = strlen(str+1);

        for(int i = 1; i <= len; ++i)
            f[i][i] = 1;

        for(int d = 2; d <= len; ++d) {
            for(int l = 1; l+d-1 <= len; ++l) {

                int r = l + d - 1;

                f[l][r] = INF;
                int& ret = f[l][r];

                for(int k=l; k<r; ++k)
                    ret = min(ret, f[l][k]+f[k+1][r]); 

                for(int k = 1; k <= d/2; ++k) {
                    if(d%k!=0) continue;
                    if(check(l, r, k)){
                        ret = min(ret, digitNum(d/k) + f[l][l+k-1] + 2); 
                    }
                }
            }
        }
        printf("%d\n", f[1][len]);
    }
    return 0;
}

 


  1. 在方法1里面:

    //遍历所有的边,计算入度
    for(int i=0; i<V; i++)
    {
    degree = 0;
    for (j = adj .begin(); j != adj .end(); ++j)
    {
    degree[*j]++;
    }
    }

    为什么每遍历一条链表,要首先将每个链表头的顶点的入度置为0呢?
    比如顶点5,若在顶点1、2、3、4的链表中出现过顶点5,那么要增加顶点5的入度,但是在遍历顶点5的链表时,又将顶点5的入度置为0了,那之前的从顶点1234到顶点5的边不是都没了吗?