CSC 161 Program #4

"Etaoin Shrdlu"

That's pronounced "eh-tay-oh-in shird-loo". ETAOINSHRDLU... these are the 12 most commonly used letters in the English language. 'E' is used most, 'T' second, and so on.

In Program #4, we'll explore a simple brand of encryption (letter substitution) and decryption (letter frequency analysis). If that weren't bad enough, this is your group project.

Logistics

This program is worth 12 points, or 12% of your final grade. The pertinent time coordinates:

Assigned: Thurs May 5, 2005
Program due: Thurs May 19, 2005
Decryption due: Tues May 24, 2005

Some of the fun, new C++ stuff we'll cover includes:

Programming in a team... the agony, the ecstasy
Exceptions
Using your TextMenu class from Program #3

Description

What does "fokpz... zpv, cjmm" mean? First some terms...

Encryption or encoding involves taking some text and converting it into some code, so that the original text is "difficult" to understand or interpret.

Decryption or decoding is the opposite process of decryption. It takes an encrypted text and, by some means, deduces the original message.

Character substitution is a form of encryption that encodes text by uniformly substituting one letter for another. For example, I could encrypt "enjoy... yow, bill" into "fokpz... zpv, cjmm" by shifting each character one place... B=>C, i=>j, l=>m, and so on. This subset of character substitution is called shifting or a Caesar code. We will use more general character substitution, however, not shifting.

Letter frequency decryption is a method of decoding text encrypted by character substitution. The frequency of letters in the English language has been studied extensively. An easy example: the letter 'e' occurs in English text far more often than 'z'. A table of letter frequencies can be used to decode text encrypted using character substitution. In short, the frequencies of letters in an encrypted message (even with the letters jumbled) should approximate those in regular English text. We will research this topic more in class.

Here's a nice wikipedia link on letter frequency decryption:

http://en.wikipedia.org/wiki/Letter_frequencies

and another on using frequencies for decryption:

http://en.wikipedia.org/wiki/Frequency_analysis

Three jobs

We will work in teams of three. Oddly, there are three jobs to be done.

The project leader will have many hats: coordinate the team communication, research and design efforts, build main() using the software of other team members, and report progress back to the (completely out of touch technically) VP in charge... me. I'll need those TPS reports every Monday morning.
The CharMap guy will design, code and test the CharMap class.
The FreqList guy will design, code and test the FreqList class.

FYI: I dislike these class names... old age, writer's block, whatever. Please feel free to do better.

All team members will participate in researching decryption techniques, gathering text sources to encrypt/decrypt, designing these CharMap and FreqList classes, and designing the program in general. Classes and code may be designed as a group, but please keep the actually coding boundaries intact... CharMap guy does the CharMap class, etc.

ITS is setting up shared k: drive folders for each team. I will provide class/lab time for team meetings, but project leaders should immediately harvest team emails and such to further communication inside each group.

The CharMap Guy/Class

The CharMap class is used to do character substitution. This includes:

map a char... 'y' => 'g', for example
create a random mapping of each char
encode a char, using a mapping
decode a char, reversing the mapping
encrypt a file by encoding each char in a file
decrypt a file by decoding each char in a file
read/write a CharMap object from/to a file

My CharMap files look like this:

%CharMap%

...

I use the %CharMap% as a magic number at the top of the file to identify it as a CharMap file. It's an old Unix thing. Also, here's an example of a nice CharMap file: start_map.txt

The FreqList Guy/Class

The FreqList class is used to track the frequency of letters in text, including:

get/set the count and/or frequency of a char
increment the count of a char
count all the occurrences of a char in a file
sort the FreqList data by char count
get a char from this sorted order... the first char being the most common, the second being second most common and so on
read/write FreqList files from/to a file

FreqList files look like this:

%FreqList%

...

Here's an example of a FreqList file: start_freqs.txt

The Project Leader: Encryption/Decryption

The job of the project leader will be to use the CharMap and FreqList classes to implement actual encryption and decryption algorithms. Please see Figure 1 attached which describes the encryption and decryption processes.

The encryption process:

Read in a text file
Either read a CharMap file or generate a new, random one
Use the CharMap to encode each char in the input file
Write these new encoded chars to an output encrypted file
You probably want to save the CharMap as well, if you generated it.

The decryption process:

Read in an encrypted file
Create a new FreqList and count the occurrences of chars in the encrypted file
Read a FreqList file (or two) of representative character frequencies for your decryption... more on this in a second.
Create a CharMap using the two FreqLists: the one for the encrypted file, and the one with the representative sample.
Apply the CharMap to decode each letter in the encrypted file
Write these decoded chars to an output file. Again, you'll probably want to save that CharMap and maybe the FreqList too.

Decryption will probably not be a 100% automatic process. Your character frequencies, more often than not, will not exactly match whatever CharMap standard you are using.

Testing and the final stage

At the end of the programming phase, you must provide:

Test cases exercising your CharMap class
Test cases exercising your FreqList class
At least 3 simple decryption examples where you use the exact FreqList created for the file you encrypt. This should help show that your decryption/encryption cycle is working.
At least 3 "real" decryption examples... where you use a FreqList file for the English language or some such thing.
I'll provide a couple of files to decrypt.

You'll notice that there are 5 days scheduled for "decryption" after the actual program is due. I will provide a couple of "difficult" decryption examples, and you can work on them as a team. Recall, this letter frequency stuff is not completely automatic, and you may have to manually massage or, even better, add some capabilities to your program to decrypt these last examples. We will discuss this more in class.

Design Requirements

You should:

Work in teams of 3: a project leader, a CharMap guy and a FreqList guy as described above.
Use exceptions at least once in both the CharMap and FreqList classes. This is a requirement.
Use your TextMenu classes for your menu(s)

Grading

Short program this time:

Quality, 50%
Function, 50%

Each of these areas is discussed in the handout: CSC 161 Program Design, Style and Testing. Please leave a README file in your folder describing your testing and the state of your program... problems, questions, etc.

thanks... yow, bill

Notes

May 5, 2005

You may feel the need to rewind an input file, or re-read it from the beginning. Check out page 850 of our text and the seekg() member function of fstream. Note: the example on page 850 should read "dataIn.seekg...", not "dataIn.seek". Get it?

Also, did you guys cover sorting already in 160? If not, let me know. These are very small sorts (26 letter), so old, slow, simple bubble sort is fine. It's on page 583.

May 10, 2005

A blurb that I wrote on the research phase of this project: prog04_freq_research.html

yow, bill

May 12, 2005

I have added a function to the k: drive...

common_area/prog04_freqs/comment.cpp

If you install this function in your CharMap and/or FreqList classes, then you will be able to read files that have comment lines beginning with a '#'. I have comments in some of my example FreqList files.