CSC 161 Program #4"Etaoin Shrdlu"That's pronounced "eh-tay-oh-in shird-loo". ETAOINSHRDLU... these are the 12 most commonly used letters in the English language. 'E' is used most, 'T' second, and so on. In Program #4, we'll explore a simple brand of encryption (letter substitution) and decryption (letter frequency analysis). If that weren't bad enough, this is your group project.
|
LogisticsThis program is worth 12 points, or 12% of your final grade. The pertinent time coordinates:
Some of the fun, new C++ stuff we'll cover includes:
|
||||||||
DescriptionWhat does "fokpz... zpv, cjmm" mean? First some terms... Encryption or encoding involves taking some text and converting it into some code, so that the original text is "difficult" to understand or interpret. Decryption or decoding is the opposite process of decryption. It takes an encrypted text and, by some means, deduces the original message. Character substitution is a form of encryption that encodes text by uniformly substituting one letter for another. For example, I could encrypt "enjoy... yow, bill" into "fokpz... zpv, cjmm" by shifting each character one place... B=>C, i=>j, l=>m, and so on. This subset of character substitution is called shifting or a Caesar code. We will use more general character substitution, however, not shifting. Letter frequency decryption is a method of decoding text encrypted by character substitution. The frequency of letters in the English language has been studied extensively. An easy example: the letter 'e' occurs in English text far more often than 'z'. A table of letter frequencies can be used to decode text encrypted using character substitution. In short, the frequencies of letters in an encrypted message (even with the letters jumbled) should approximate those in regular English text. We will research this topic more in class. Here's a nice wikipedia link on letter frequency decryption: http://en.wikipedia.org/wiki/Letter_frequencies |
||||||||
Three jobsWe will work in teams of three. Oddly, there are three jobs to be done.
FYI: I dislike these class names... old age, writer's block, whatever. Please feel free to do better. All team members will participate in researching decryption techniques, gathering text sources to encrypt/decrypt, designing these CharMap and FreqList classes, and designing the program in general. Classes and code may be designed as a group, but please keep the actually coding boundaries intact... CharMap guy does the CharMap class, etc. ITS is setting up shared k: drive folders for each team. I will provide class/lab time for team meetings, but project leaders should immediately harvest team emails and such to further communication inside each group. The CharMap Guy/ClassThe CharMap class is used to do character substitution. This includes:
My CharMap files look like this:
I use the %CharMap% as a magic number at the top of the file to identify it as a CharMap file. It's an old Unix thing. Also, here's an example of a nice CharMap file: start_map.txt The FreqList Guy/ClassThe FreqList class is used to track the frequency of letters in text, including:
FreqList files look like this:
Here's an example of a FreqList file: start_freqs.txt The Project Leader: Encryption/DecryptionThe job of the project leader will be to use the CharMap and FreqList classes to implement actual encryption and decryption algorithms. Please see Figure 1 attached which describes the encryption and decryption processes. The encryption process:
The decryption process:
Decryption will probably not be a 100% automatic process. Your character frequencies, more often than not, will not exactly match whatever CharMap standard you are using.
|
||||||||
Testing and the final stageAt the end of the programming phase, you must provide:
You'll notice that there are 5 days scheduled for "decryption" after the actual program is due. I will provide a couple of "difficult" decryption examples, and you can work on them as a team. Recall, this letter frequency stuff is not completely automatic, and you may have to manually massage or, even better, add some capabilities to your program to decrypt these last examples. We will discuss this more in class. |
||||||||
Design RequirementsYou should:
|
||||||||
GradingShort program this time:
thanks... yow, bill |
||||||||
LinksSome links that I enjoyed:
Just FYI, some of the google searches that I did included:
|
||||||||
NotesMay 5, 2005 You may feel the need to rewind an input file, or re-read it from the beginning. Check out page 850 of our text and the seekg() member function of fstream. Note: the example on page 850 should read "dataIn.seekg...", not "dataIn.seek". Get it? Also, did you guys cover sorting already in 160? If not, let me know. These are very small sorts (26 letter), so old, slow, simple bubble sort is fine. It's on page 583. May 10, 2005 A blurb that I wrote on the research phase of this project: prog04_freq_research.html yow, bill May 12, 2005 I have added a function to the k: drive... common_area/prog04_freqs/comment.cpp If you install this function in your CharMap and/or FreqList classes, then you will be able to read files that have comment lines beginning with a '#'. I have comments in some of my example FreqList files. thanks... yow, bill May 17, 2005 We'll talk about all this at lecture tonight:
good luck... yow, bill |