2013-11-11

# Etaoin Shrdlu

The relative frequency of characters in natural language texts is very important for cryptography. However, the statistics vary for different languages. Here are the top 9 characters sorted by their relative frequencies for several common languages:

- English: ETAOINSHR
- German: ENIRSATUD
- French: EAISTNRUL
- Spanish: EAOSNRILD
- Italian: EAIONLRTS
- Finnish: AITNESLOK
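
Rankings such as ETAOINSHR come from counting letters over a large body of text and sorting by count. A minimal Java sketch of that ranking step, assuming case is ignored and only the letters A to Z are counted (the class `LetterFrequency` and method `topLetters` are hypothetical names, not part of the original solution):

```java
import java.util.ArrayList;
import java.util.List;

public class LetterFrequency {
    // Returns the k most frequent letters (A-Z, case-insensitive), most frequent
    // first; letters with equal counts are listed alphabetically. Other characters are ignored.
    static String topLetters(String text, int k) {
        int[] count = new int[26];
        for (char c : text.toCharArray()) {
            char u = Character.toUpperCase(c);
            if (u >= 'A' && u <= 'Z') count[u - 'A']++;
        }
        List<Character> letters = new ArrayList<>();
        for (char c = 'A'; c <= 'Z'; c++) {
            if (count[c - 'A'] > 0) letters.add(c);
        }
        // List.sort is stable, so alphabetical order survives among equal counts.
        letters.sort((x, y) -> Integer.compare(count[y - 'A'], count[x - 'A']));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(k, letters.size()); i++) {
            sb.append(letters.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(topLetters("hello world", 3)); // prints LOD
    }
}
```

A real ranking needs a large corpus; on a short phrase like this one the result says nothing about English as a whole.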

Just as important as the relative frequencies of single characters are those of pairs of characters, so-called digrams. Given several text samples, calculate the digrams with the highest relative frequencies.
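
Digrams overlap: every character after the first forms a pair with its predecessor, so a text of length L contains L - 1 digrams. A small sketch of counting them with a map (`DigramCount` and `countDigrams` are illustrative names):

```java
import java.util.Map;
import java.util.TreeMap;

public class DigramCount {
    // Counts every pair of adjacent characters (overlapping), e.g. "abc" -> "ab", "bc".
    static Map<String, Integer> countDigrams(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 1; i < text.length(); i++) {
            counts.merge(text.substring(i - 1, i + 1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countDigrams("banana")); // prints {an=2, ba=1, na=2}
    }
}
```

`Map.merge` with `Integer::sum` does the same job as the `containsKey`/`get`/`put` sequence in the solution below.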

The input contains several test cases. Each starts with a line containing a number n, the number of lines in the test case. The input is terminated by n=0. Otherwise, 1<=n<=64, and n lines follow, each at most 80 characters long. The concatenation of these n lines, with the end-of-line characters omitted, gives the text sample you have to examine. The text sample contains printable ASCII characters only.
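
Because the end-of-line characters are dropped, the last character of one line and the first character of the next form a digram. A sketch of building the text sample by plain concatenation, assuming the input is read through a `BufferedReader` (`JoinLines.readSample` is a hypothetical helper):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class JoinLines {
    // Reads n lines and concatenates them without end-of-line characters, so a
    // digram may consist of the last character of one line and the first of the next.
    static String readSample(BufferedReader in, int n) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            String line = in.readLine();
            if (line == null) break; // tolerate truncated input
            sb.append(line);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(
                "Take a look at this!!\n!!siht ta kool a ekaT\n"));
        String text = readSample(in, 2);
        System.out.println(text);              // Take a look at this!!!!siht ta kool a ekaT
        System.out.println(text.length() - 1); // 41 digrams
    }
}
```

For the first sample test case this yields 42 characters and therefore 41 digrams. The solution below reaches the same result without building the whole string, by carrying the previous line's last character over into the next line.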

For each test case, print 5 lines containing the top 5 digrams together with their absolute and relative frequencies, the latter rounded to 6 decimal places. If two digrams have the same frequency, sort them in (ASCII) lexicographic order. Output a blank line after each test case.
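
The ordering rule amounts to a two-key comparator: count descending, then digram ascending. `String.compareTo` compares UTF-16 code units, which for printable ASCII is exactly ASCII order. A sketch of ordering and formatting, assuming the counts and the total number of digrams are already known (`DigramOrder.formatTop` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DigramOrder {
    // Returns up to k output lines "<digram> <count> <relative>", ordered by
    // descending count and, for equal counts, by ASCII order of the digram.
    static List<String> formatTop(Map<String, Integer> counts, int total, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            Map.Entry<String, Integer> e = entries.get(i);
            // Locale.US guarantees '.' as the decimal separator.
            lines.add(String.format(Locale.US, "%s %d %.6f",
                    e.getKey(), e.getValue(), (double) e.getValue() / total));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("b ", 3, " a", 3, "oo", 2);
        formatTop(counts, 41, 5).forEach(System.out::println);
    }
}
```

Here ` a` precedes `b ` because a space (32) sorts before `b` (98); a JVM whose default locale uses a decimal comma would print `0,073171` without the explicit `Locale.US`.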

Sample input:

```
2
Take a look at this!!
!!siht ta kool a ekaT
5
P=NP
Authors: A. Cookie, N. D. Fortune, L. Shalom
Abstract: We give a PTAS algorithm for MaxSAT and apply the PCP-Theorem [3]
Let F be a set of clauses. The following PTAS algorithm gives an optimal
assignment for F:
0
```

Sample output:

```
 a 3 0.073171
!! 3 0.073171
a  3 0.073171
 t 2 0.048780
oo 2 0.048780

 a 8 0.037209
or 7 0.032558
.  5 0.023256
e  5 0.023256
al 4 0.018605
```
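
The first test case can be checked by hand. Joining the two 21-character lines gives 42 characters and therefore 41 digrams. ` a`, `!!` and `a ` each occur 3 times; ` t`, `oo` and `t ` each occur twice, and `t ` is the one cut off by the ASCII tie-break (space sorts before `o`, which sorts before `t`). The relative frequencies are then:

$$
N = 21 + 21 = 42 \;\Rightarrow\; N - 1 = 41 \text{ digrams},\qquad
\frac{3}{41} = 0.0731707\ldots \approx 0.073171,\qquad
\frac{2}{41} = 0.0487804\ldots \approx 0.048780
$$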


```java
// @author: [email protected]
import java.io.*;
import java.util.*;
import java.util.Map.Entry;

public class Main
{
    public static void main(String[] args) throws IOException
    {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        // Absolute frequency of every digram in the current test case
        HashMap<String, Integer> ts = new HashMap<String, Integer>();
        String header;
        while ((header = in.readLine()) != null)
        {
            int a = Integer.parseInt(header.trim());
            if (a == 0) break;
            String k = ""; // last character read so far (empty before the first one)
            int t = 0;     // total number of digrams in this text sample
            for (int i = 0; i < a; i++)
            {
                // Prepend the previous line's last character so that digrams
                // spanning a line break are counted, since EOLs are omitted.
                String s = k + in.readLine();
                for (int j = 1; j < s.length(); j++)
                {
                    t++;
                    String w = s.substring(j - 1, j + 1);
                    ts.put(w, ts.containsKey(w) ? ts.get(w) + 1 : 1);
                }
                // If s is empty (an empty first line), substring(-1) would throw,
                // so k is only updated when there is a character to carry over.
                if (s.length() > 0) k = s.substring(s.length() - 1);
            }
            List<Map.Entry<String, Integer>> ww = new ArrayList<Map.Entry<String, Integer>>(ts.entrySet());
            // Higher frequency first; equal frequencies in ASCII lexicographic order
            Collections.sort(ww, new Comparator<Map.Entry<String, Integer>>() {
                @Override
                public int compare(Entry<String, Integer> arg0, Entry<String, Integer> arg1) {
                    int r1 = arg1.getValue();
                    int r0 = arg0.getValue();
                    if (r1 != r0) return r1 - r0;
                    return arg0.getKey().compareTo(arg1.getKey());
                }
            });
            // Print at most 5 lines, in case the sample has fewer distinct digrams
            for (int i = 0; i < 5 && i < ww.size(); i++)
            {
                int u = ww.get(i).getValue();
                System.out.print(ww.get(i).getKey() + " " + u + " ");
                // Locale.US forces '.' as the decimal separator
                System.out.printf(Locale.US, "%.6f\n", (double) u / (double) t);
            }
            System.out.println();
            ts.clear();
        }
    }
}
```
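
Since the text contains printable ASCII only, an alternative to the `HashMap<String, Integer>` is a fixed 128×128 count table indexed by the two character codes. This avoids creating a `String` per digram, and scanning the table row by row visits digrams in ASCII lexicographic order, so a stable sort by count alone produces the required tie order. A sketch under the assumption that every character code is below 128 (`ArrayDigrams` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ArrayDigrams {
    // Assumes every character is ASCII (code < 128), as the problem guarantees.
    static List<String> top(String text, int k) {
        int[][] count = new int[128][128];
        for (int i = 1; i < text.length(); i++) {
            count[text.charAt(i - 1)][text.charAt(i)]++;
        }
        int total = text.length() - 1;
        // Row-by-row enumeration yields digrams in ASCII lexicographic order.
        List<int[]> digrams = new ArrayList<>(); // each entry: {first, second, count}
        for (int a = 0; a < 128; a++) {
            for (int b = 0; b < 128; b++) {
                if (count[a][b] > 0) digrams.add(new int[] {a, b, count[a][b]});
            }
        }
        // Stable sort by descending count keeps the ASCII order among ties.
        digrams.sort((x, y) -> Integer.compare(y[2], x[2]));
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < Math.min(k, digrams.size()); i++) {
            int[] d = digrams.get(i);
            lines.add(String.format(Locale.US, "%c%c %d %.6f",
                    (char) d[0], (char) d[1], d[2], (double) d[2] / total));
        }
        return lines;
    }

    public static void main(String[] args) {
        top("Take a look at this!!!!siht ta kool a ekaT", 5).forEach(System.out::println);
    }
}
```

`List.sort` is guaranteed to be stable, which is what lets the comparator ignore the digram itself.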
