Frequency Attacks
Using letter frequencies to attack monoalphabetic ciphers
İMike May, S. J., 2002, maymk@slu.edu
| > | restart: |
We have seen that one of the major weaknesses of a monoalphabetic substitution cipher stems from the fact that the distribution of letters is not random in English, or in any natural language. We would like to use that fact to help us attack ciphers.
| > |
Constructing a frequency counting tool
To mechanize this kind of attack, we want a tool that does frequency counts for us. The first step is to use a procedure that strips out the special characters and converts lower case to upper case. (We will leave the message as a string of numbers.
| > | specialCharacterStrip := proc(message) local messList, letter, messlength, messList2, i; messList := convert(message, bytes): messlength := linalg[vectdim](messList); messList2 := []; for i from 1 to messlength do letter := messList[i]: if (letter > 64) and (letter < 91) then messList2 := [op(messList2), letter] fi: if (letter > 96) and (letter < 123) then messList2 := [op(messList2), letter-32] fi: od: messList2: end: |
The next step is to count the number of occurrences of each character as well as the total number of characters producing a frequency vector.
| > | shortCounter := proc(messageList) local message2, letter, messlength, freqVector, i; messlength := linalg[vectdim](messageList); freqVector := linalg[vector](27, i->0): for i from 1 to messlength do letter:= messageList[i] - 64: freqVector[letter] := freqVector[letter]+1: freqVector[27] := freqVector[27]+1: od: freqVector: end: |
Finally we put these two procedures together into a single procedure for convenience. For convenience we would like a procedure to print the output in a nice form.
| > | freqCounter := message -> shortCounter(specialCharacterStrip(message)): freqPrinter := proc(freqVector) local message2, letter, messlength, freqcount, i, countera, temp0, temp1, temp2; temp1 := convert(freqVector[27],string): print(`The string had `||temp1|| ` characters from the Latin alphabet.`); for countera from 1 to 26 do temp0 := convert(convert([64 + countera], bytes),string): temp1 := convert(freqVector[countera],string): temp2 := convert(evalf( freqVector[countera]/freqVector[27]*100.0,4),string): print(`The letter `|| temp0 || ` appears `||temp1||` times,`|| temp2||`% of the time.`); od; end: |
Next we want to check out this routine with a reasonable string. Since we want to look at frequency counts as a way to attack codes, we want to look at a longer passage. The following is a paragraph from "Pi in the Sky." (Note that double double quotation marks are used to put a double quotation marks symbol inside a string.)
| > | message1 := "Numbers dominate much of our everyday lives in diverse and barely perceptible ways. Parents are anious to see their children attain good grades in mathematics, assured that this is a passport to success and security. The government stipulates that mathematical studies are compulsory for most of a child's time at school. Intelligence tests contain particular ingredients to test the candidates' ability to reason mathematically. Advertisements promising challenging and well-paid employment invite us to seek further details if we can successfully spot the next number in puzzling sequences. Students change subjects when they discover the mathematical content of their course is too difficult to master. Young children turn out to be extraordinarily talented in mathematics with abilities outstripping children twice their age, yet their performance in other subjects is unremarkable. Only in art and music is such precociousness so impressive. Looking out into the wider world, we see the cogs of the Western world turned by the engines of mathematical understanding. Computers programmed with inexorable logic control our economies and money markets, our aeroplanes and trains. Our own minds are seen as a fallible ""natural"" form of intelligence that these computers will ultimately perfect in an embodiment of ""artificial"" intelligence that is nothing more than a complicated mathematical algorithm that can be implemented electronically far faster and less fallibly than by our own feeble brains. In our universities there is an unspoken suspicion that the further a discipline lies from mathematics and the smaller the body of mathematical statements at its core the less rigorous and intellectually respectible it is.": freqVector1 := freqCounter(message1): print(freqVector1); freqPrinter(freqVector1); |
| > |
In this passage, the most common letters in order of frequency are e, t, i, a, s, n, r, o, and l, while we expect them to be e, t, a, o, i, n, s, h, and r in general.
Since we want to work with monoalphabetic ciphers, we recall the procedures needed to encode with these substitution ciphers.
| > | monoalph := proc(letter, rule, key) #This procedure changes one letter with a monoalphabetic cipher named rule local temp, temp2; temp := letter: if letter > 64 then if letter < 91 then temp := 65 + rule(temp - 65, key): fi:fi: if letter > 96 then if letter < 123 then temp := 97 + rule(temp - 97, key): fi:fi: temp: end: encodemonoalph := proc(message, rule, key) #This procedure uses monoalph to uncode a string local temp: #first convert the message to numerical equivalents temp[1] := convert(message, bytes): #then uses monalph to scrable the letters temp[2] := map(monoalph, temp[1], rule, key): #then convert back to the ASCII code convert(convert(temp[2], bytes), name): end: |
Attacking additive ciphers
| > | caesarrule := (letter, key) -> ((letter + key) mod 26): #This procedure is the rule used for the Caesar |
The method of an additive cipher is to translate x -> x + key for some key. A ciphertext created with an additive cipher is decrypted by using another additive cipher. To obtain the key it is enough to find a key that works for a single letter.
Consider the following ciphertext.
| > | secret1 := `Dolu fvb dlyl h ahkwvsl huk P dhz h mpzo Pu aol Whslvgvpj aptl, Huk zpkl if zpkl vu aol liipun apkl Dl zwyhdslk aoyvbno aol vvgl huk zsptl, Vy zrpaalylk dpao thuf h jhbkhs mspw aoyvbno aol klwaoz vm Jhtiyphu mlu Tf olhya dhz ypml dpao aol qvf vm spml, Mvy P svclk fvb lclu aolu.`; |
We start by doing a frequency count on the ciphertext
| > | freqVector1 := freqCounter(secret1): print(freqVector1); freqPrinter(freqVector1); |
From the frequency count, we see that L is the most common letter in the ciphertext. We will guess that the decryption should take L to E. Since L is the 12th letter and E is the 5th letter, we want -7 as the first guess for our key.
| > | encodemonoalph(secret1, caesarrule, -7); |
This gives an English plaintext from the poem Evolution by Langdon Smith.
A minor improvement we can make is to produce a procedure that calculates the key knowing an initial letter and its target.
| > | caesarkeyfinder := proc(twoletters) local nums: nums := convert(twoletters, bytes): (nums[2] - nums[1]) mod 26: end: |
In our case we wanted to take L to E.
| > | caesarkeyfinder(`le`); |
| > | encodemonoalph(secret1, caesarrule, caesarkeyfinder(`le`)); |
Exercises:
1) Enter a passage of at least 1000 characters from any page of any book you own. Do a frequency count on the letters in the passage. Does the count give any surprises?
| > |
2) Consider the ciphertext,
Vhfinmxkl ikhzktffxw pbma
bgxqhktuex ehzbv vhgmkhe hnk xvhghfbxl tgw fhgxr ftkdxml, hnk txkhietgxl
tgw mktbgl. Hnk hpg fbgwl tkx lxxg tl t yteebuex "gtmnkte" yhkf
hy bgmxeebzxgvx matm maxlx vhfinmxkl pbee nembftmxer ixkyxvm bg tg
xfuhwbfxgm hy "tkmbybvbte" bgmxeebzxgvx matm bl ghmabgz fhkx matg t
vhfiebvtmxw ftmaxftmbvte tezhkbmaf matm vtg ux bfiexfxgmxw
xexvmkhgbvteer ytk ytlmxk tgw exll yteebuer matg ur hnk hpg yxxuex uktbgl.
Use frequency counts to find the key to the additive cipher needed to decode. Give the plaintext.
| > |
We would like to automate the attack and also reduce errors in guessing the key by noting that rather than just looking for the most frequent letter we can look for the pattern of frequent letters in the alphabet. We do this by taking a dot product of the frequency vector with the vector giving the frequencies of letter usage in typical English text. For technical reasons we will repeat that vector twice.
| > | engFreq := [.082, .015, .028, .043, .127, .022, .020, .061, .070, .002, .008, .040, .024, .067, .075, .019, .001, .060, .063, .091, .028, .010, .023, .001, .020, .001]: engFreq2 := [op(engFreq), op(engFreq)]; |
Now we take a dot product with all possible shifts.
| > | for i from 0 to 25 do print(sum(engFreq2[i+j]*freqVector1[j]/freqVector1[27], j=1..26), "with key of", i) od; |
Note that the dot product shifted by the decryption key is almost .065, while the other shifts are all less than .048. We can automate this a bit more by only printing out the result when the dot product is at least .055.
| > | for i from 0 to 25 do dotProd := sum(engFreq2[i+j]*freqVector1[j]/freqVector1[27], j=1..26): if (dotProd > .06) then print(dotProd, ` with decrypt key of`, i) fi: od: |
For convenience we reduce this to a single procedure.
| > | keyFinder := proc(freqVect) local engFreqVect, i, dotProd: engFreqVect := [.082, .015, .028, .043, .127, .022, .020, .061, .070, .002, .008, .040, .024, .067, .075, .019, .001, .060, .063, .091, .028, .010, .023, .001, .020, .001, .082, .015, .028, .043, .127, .022, .020, .061, .070, .002, .008, .040, .024, .067, .075, .019, .001, .060, .063, .091, .028, .010, .023, .001, .020, .001]: for i from 0 to 25 do dotProd := sum(engFreqVect[i+j]*freqVect[j]/freqVect[27], j=1..26): if (dotProd > .06) then print(dotProd, ` with decrypt key of`, i) fi: od: end: |
| > | keyFinder(freqCounter(secret1)); |
| > |
Exercise:
2A) repeat exercise 2 with an automated attack.
| > |
| > |
Attacking multiplicative ciphers
Multiplicative ciphers are much like additive ciphers with a differing encryption rule
| > | multrule := (letter, key) -> ((letter * key) mod 26): #This procedure is the rule used for multiplicative ciphers |
The attack in the multiplicative case is much like the attack in the additive case. We start the attack by using a frequency count to identify a common letter, X, in ciphertext and what we think it should be, Y, in plaintext. We then reason that we want to find the key such that Y = key*X. The obvious thing to try is to let the key be Y/X. The problem is that in
, X need not be invertible, so we may not be able to do the division. The solution is to simply check all the keys to see which one works.
| > | multkeyfinder := proc(twoletters) # The input is assumed to be a sequence of two lower case letters. local counta, nums, key: key := 1: for counta from 1 to 25 do if igcd(counta, 26) = 1 then nums := convert(twoletters, bytes): if (counta*(nums[1] - 97) - (nums[2] - 97)) mod 26 = 0 then key := counta: break: fi; fi; od; if key = 1 and (nums[1] <> nums[2])then print(`No key will work`): fi: key: end: |
| > | multkeyfinder("mm"); |
| > | multkeyfinder(`mz`); |
| > | multkeyfinder(`bf`); |
There are a few technical details of this procedure worth commenting on.
1) The procedure considers a multiplicative cipher with the understanding that A is treated as 0, B treated as 1 and so forth. Recall that convert takes `a` to 97 and `A` to 65.
2) The procedure assume then that the two letter sequence it uses as input is lower case.
3) Since we are planning to use the procedure to produce a key when we think we know a letter and its target. Thus we not only need to understand that we may have guessed wrong. We also need to consider that there may be no key that takes our favorite letter to its intended target.
4) The procedure will only give answers that are invertible in Z[26].
Exercise:
3) Consider the ciphertext
`Kn ymf mnkxsfqkbksq bzsfs kq an mnqjygsn qmqjkwkyn bzab bzs dmfbzsf
a hkqwkjrkns rksq dfyc cabzscabkwq anh bzs qcarrsf bzs lyhe yd
cabzscabkwar qbabscsnbq ab kbq wyfs bzs rsqq fkoyfymq anh knbsrrswbmarre\nfsqjswbklrs kb kq.`
It was produced with a multiplicative cipher. Find the decryption key and decipher the message.
| > |
Attacking affine ciphers
| > | affinerule := (letter, key) -> ((letter*key[1] + key[2]) mod 26): |
Affine ciphers are a combination of additive and multiplicative ciphers. Our key becomes an ordered pair [A,B] with the letter X being taken to AX + B. Once again we decrypt by using the affine cipher with the appropriate key. A difference from the earlier rules is that an affine cipher is a linear function, and it is not determined by a single point. Instead we need two pairs of letters to determine the key for an affine cipher.
Thus the process of finding the key reduces to the following college algebra question:
If you are told that Y1 = A*X1 +B and Y2 = A*X2 + B, can you recover A and B from X1, X2, Y1, and Y2?
To solve the problem we note that (Y1 - Y2) = A*(X1 - X2) and that once we have A, that B = Y1 - A*X1. We should note that if X1 - X2 divides 26, we may have more than one possible key. Since A must be invertible, we only have to be concerned when (X1 - X2) is 13.
| > | affinekeyfinder := proc(fourletters) # Input `abcd` designates we want a -> b, c -> d. # inputs all assumed lower case. local counta, nums, key, multanswers: key := [1,0]: nums := convert(fourletters, bytes): if igcd(nums[1] - nums[3], 26) = 13 then multanswers := 1: print(`warning, multiple answers are possible`): else multanswers := 0 fi: for counta from 1 to 25 do if igcd(counta, 26) = 1 then if (counta*(nums[1]-nums[3]) - (nums[2]-nums[4])) mod 26 = 0 then key := [counta, nums[2]-96 - counta*(nums[1]-96) mod 26]: if multanswers = 1 then print(key) fi; # break; fi; fi; od; if key = [1,0] then print(`No key works`): fi: key; end: |
Check the code on some simple cases.
| > | affinekeyfinder(`abno`); |
| > | affinekeyfinder(`abce`); |
| > | affinekeyfinder(`accg`); |
It is also useful to have a procedure that finds inverse keys for the affine rule.
| > | invertaffinekey := proc(key) print([1/key[1] mod 26, -key[2]/key[1] mod 26]): end: |
Exercises:
4) It was noted above that (Y1 - Y2) = A*(X1 - X2) can have multiple keys when (X1 - X2) divides 26. Explain how many keys we can get when this difference is even and when the difference is 13. Explain why we can still only get one key if the difference is even.
| > |
5) Consider the following ciphertext encrypted with an affine cipher:
"Zc nbu bczmhufzqzhf qohuh zf pc bcfynvhc fbfyzlznc qopq qoh sbuqohu
p wzflzygzch gzhf sunr rpqohrpqzlf pcw qoh frpgghu qoh anwt ns
rpqohrpqzlpg fqpqhrhcqf pq zqf lnuh qoh ghff uzdnunbf pcw zcqhgghlqbpggt
uhfyhlqzagh zq zf."
Find the key and give the plaintext
| > |
6) Enter a message of approximately 10 lines. Use one of the ciphers above to encrypt it. Post the ciphertext and the method, but not the key, to the bulletin board.
| > |
7) Crack someone else's post. Come up with an appropriate reply. Post your reply as a response to the message. Your message should be encrypted by the same method as the original. Give the key to your response in your posting. (Be careful to keep the encrypting and decrypting keys distinct.
| > |