CSC 220 Program #3

"MC Hamming code"

Program #3 is:

  • due Friday  November 18, 2005

  • worth 10 points, or 10% of your grade

Description

Your task is to write two programs that use the Hamming code:

  • encode - reads a byte stream and encodes the data with error-correcting information
  • decode - reads an encoded byte stream and decodes it, correcting any errors found in the stream

The Hamming code is discussed in our text on pages 73-77. Also, enjoy:

In short, the Hamming code is one flavor or an error-correcting code. Data is encoded with error-correcting information and then decoded. During the decoding process, any single-bit errors in the original data can be detected and corrected using the algorithm.

The steps for encoding are:

  1. Read a data word
  2. Create a number of parity check bits whose value is determined using the data word
  3. Write the new code word which consists of the data word and the check bits
  4. Go to step 1 until the byte stream is exhausted

The steps for decoding are:

  1. Read a code word
  2. Check the code word for errors using the check bits
  3. If any errors were found, then correct them
  4. Write the data word
  5. Go to step 1 until the byte stream is exhausted

Some details

For Program #3, let's use:

  • a data word size of 2 bytes (16 bits)
  • 5 check bits (this is the number required to encode 16 bits of data), making the total code word size 21 bits
  • As a shortcut, let's pad the 5 check bits so that it's an even 8 bits. This means that the code word will be 3 bytes long (2 bytes data and 1 byte check bits)

If you're ambitious you can increase your data word size to 16 bytes (128 bits). This is the maximum data word size that can be handled with 8 check bits using the Hamming algorithm. It's some extra work though. Note that the overhead in this case (128 bits of data) is only 6% (8/128).

     

Implementation

In our first programs, we almost exclusively dealt with 4 byte integers. Now, you have to be very aware of byte size. Including:

  • Instructions that deal with 1 byte must end in the letter 'b'. For example: 

          movb $0,%al

  • Instructions that deal with 2 bytes at a time end with the letter 'w', for "word": 

          movw $1,%ax

  • Register names change when dealing with different byte sizes:
    • %eax - a 4 byte register
    • %ax - the lower 2 bytes of eax
    • %al - the low byte of ax
    • %ah - the high byte of ax
    • This name convention applies for all registers: ebx-edx, edi, esi
  • When dealing with 1 or 2 byte quantities on the stack, push and pop/add appropriately.

Reading & writing bytes of data

You can use two C functions to read and write data 1 byte at a time. They are:

  • int getchar()

    Reads one byte and returns it. It returns -1 at the end of the stream.

     

  • int putchar( int)

    Writes one byte. The return value is the character written or an EOF marker. You can probably ignore this for Program #2.

Notice that getchar() and putchar() deal with 4 byte quantities. The 1 byte read or written will be found in the lower byte of the 4 bytes. For more information and examples, try "help" on Microsoft Visual-whatever or google "getchar".

Organizing the code word

In my implementation, I found it easier to separate the check bits and the data word when creating my code word. Instead of the organization shown in the example above, I used the following setup:

Data byte 1 Data byte 2 Parity byte

The "Parity byte" is actually the 5 check bits padded with three zeros (000) on the front. Please note this choice is not required. I just found it easier to design the program this way.

Lecture topics

I will lecture on the following topics:

  • Bit flags
    • Defining hex constants
    • Binary masks
    • With andb
    • With orb
    • With XOR
  • More EFLAGS - parity and carryout
  • Parity checks
  • Shifting bits
    • shlb
    • shrb

Is this section a reminder for you or me?

 

Logistics

Grading and logistics are the same as in Programs #1 and #2.

You can implement your program in one large file or separate files. I recommend separate files. You simply need to supply gcc with each the files you have created to do the job:

gcc file1.s file2.s file3.s

Speaking of large, this is a larger and more complex program than the previous two. You can, if you like, start writing your program in C. You can use gcc to create assembly code for the C you have written:

gcc -S file1.c

The assembly code output by the compiler (file1.s in the example above) will need to be tweaked and commented, but it is a nice starting point. Please note that uncommented and/or unmodified compiler-generated assembly code will be graded harshly... but of course.

  

Hints and more

I will provide a couple test cases of my own later this week.

A fairly detailed example of Hamming code at work: Program #3 example