2013-12-04

# String Matching

It’s easy to tell if two words are identical – just check the letters. But how do you tell if two words are almost identical? And how close is “almost”? There are lots of techniques for approximate word matching. One is to determine the best substring match, which is the number of common letters when the words are compared letter-by-letter.

The key to this approach is that the words can overlap in any way. For example, consider the words CAPILLARY and MARSUPIAL. One way to compare them is to overlay them:

    CAPILLARY
    MARSUPIAL

There is only one common letter (A). Better is the following overlay:

    CAPILLARY
         MARSUPIAL

with two common letters (A and R), but the best is:

       CAPILLARY
    MARSUPIAL

which has three common letters (P, I and L).
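The overlay search can be sketched as sliding one word past the other and counting matching positions at each offset. This is a minimal illustration, not the solution shown later in the post; the names `matches_at_offset` and `best_overlay` are made up for the example.

```cpp
#include <algorithm>
#include <string>

// Count the positions where the words agree when `b` is shifted `offset`
// places to the right relative to `a` (a negative offset shifts it left).
int matches_at_offset(const std::string& a, const std::string& b, int offset) {
    int count = 0;
    for (int i = 0; i < static_cast<int>(a.size()); ++i) {
        int j = i - offset;  // index in b that lines up with a[i]
        if (j >= 0 && j < static_cast<int>(b.size()) && a[i] == b[j]) {
            ++count;
        }
    }
    return count;
}

// Try every offset at which the words still overlap and keep the best count.
int best_overlay(const std::string& a, const std::string& b) {
    int best = 0;
    int lenA = static_cast<int>(a.size());
    int lenB = static_cast<int>(b.size());
    for (int offset = -(lenB - 1); offset <= lenA - 1; ++offset) {
        best = std::max(best, matches_at_offset(a, b, offset));
    }
    return best;
}
```

With CAPILLARY as `a` and MARSUPIAL as `b`, offsets 0, 5 and -3 correspond to the three overlays above and give 1, 2 and 3 common letters respectively.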

The approximation measure appx(word1, word2) for two words is given by:

$$\mathrm{appx}(\mathit{word1}, \mathit{word2}) = \frac{\text{common letters} \times 2}{\mathrm{length}(\mathit{word1}) + \mathrm{length}(\mathit{word2})}$$

Thus, for this example, appx(CAPILLARY, MARSUPIAL) = 6 / (9 + 9) = 1/3. Obviously, for any word W, appx(W, W) = 1, which is a nice property, while words with no common letters have an appx value of 0.
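As a rough sketch of the arithmetic, the measure can be kept as an integer numerator and denominator and reduced by their greatest common divisor. The helper names `gcd_int` and `appx_fraction` are hypothetical, and the code assumes at least one of the two words is non-empty.

```cpp
#include <utility>

// Iterative Euclid; gcd_int(0, n) returns n, so a zero numerator reduces to 0/1.
int gcd_int(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Return appx as a (numerator, denominator) pair in lowest terms.
// Assumes len1 + len2 > 0.
std::pair<int, int> appx_fraction(int common, int len1, int len2) {
    int num = 2 * common;
    int den = len1 + len2;
    int g = gcd_int(num, den);
    return {num / g, den / g};
}
```

For the example above, `appx_fraction(3, 9, 9)` starts from 6/18 and reduces it to 1/3.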

The input for your program will be a series of words, two per line, terminated by a line containing only the end-of-input flag -1.

Using the above technique, you are to calculate appx() for the pair of words on the line and print the result. For example:

CAR CART
TURKEY CHICKEN
MONEY POVERTY
ROUGH PESKY
A A
-1

The words will all be uppercase.

Print the value for appx() for each pair as a reduced fraction, like this:

appx(CAR,CART) = 6/7
appx(TURKEY,CHICKEN) = 4/13
appx(MONEY,POVERTY) = 1/3
appx(ROUGH,PESKY) = 0
appx(A,A) = 1
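The sample output shows two special cases: a value of zero prints as `0` and a value of one prints as `1`, not as a fraction. A small formatting helper could look like the following sketch. The name `format_appx` is hypothetical, and the numerator and denominator are assumed to be already reduced.

```cpp
#include <string>

// Build one output line from an already-reduced fraction num/den.
std::string format_appx(const std::string& w1, const std::string& w2,
                        int num, int den) {
    std::string line = "appx(" + w1 + "," + w2 + ") = ";
    if (num == 0) {
        line += "0";                 // no common letters
    } else if (num == den) {
        line += "1";                 // identical best overlay, e.g. appx(A,A)
    } else {
        line += std::to_string(num) + "/" + std::to_string(den);
    }
    return line;
}
```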

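#define _CRT_SECURE_NO_WARNINGS
#include <cstdio>
#include <cstring>

// Greatest common divisor (Euclid). Callers always pass b > 0.
int gcd(int a, int b) {
    return a % b == 0 ? b : gcd(b, a % b);
}

int main() {
    char str1[1100], str2[1100];
    // Read pairs until the "-1" flag; also stop on EOF or a missing second word.
    while (scanf("%1099s", str1) == 1 && strcmp(str1, "-1") != 0) {
        if (scanf("%1099s", str2) != 1) break;
        int len1 = (int)strlen(str1);
        int len2 = (int)strlen(str2);
        int best = 0;
        // Each overlay is fixed by the first aligned pair (i, j);
        // count the matching letters along that alignment.
        for (int i = 0; i < len1; i++) {
            for (int j = 0; j < len2; j++) {
                int count = 0;
                for (int k1 = i, k2 = j; k1 < len1 && k2 < len2; k1++, k2++) {
                    if (str1[k1] == str2[k2]) count++;
                }
                if (count > best) best = count;
            }
        }
        if (best == 0) {
            printf("appx(%s,%s) = 0\n", str1, str2);
        } else {
            int num = 2 * best;
            int den = len1 + len2;
            if (num == den) {
                printf("appx(%s,%s) = 1\n", str1, str2);
            } else {
                // Print the fraction in lowest terms.
                int g = gcd(num, den);
                printf("appx(%s,%s) = %d/%d\n", str1, str2, num / g, den / g);
            }
        }
    }
    return 0;
}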
