CSC 210 Program #4

www.hash_table.com

Logistics

Go...

  •  Worth 4 points, or 4% of your class grade
  •  Assigned Saturday May 6, 2006
  •  Due Saturday May 13, 2006 by the end of the day... a 1-weeker!
  •  Covers hash tables

Description

In Program #4, you'll implement your own hash table. Web site URL's and IP addresses will provide a nice testbed for some hash table experimentation.

So, you're surfing the net with your favorite browser. You enter a web site name, called a Universal Resource Locator (URL), and load in the web page. Easy enough.

Well, under the hood, each URL must be converted to an Internet Protocol (IP) address so that the web site can be located on the net. For example: www.wikipedia.com has an IP address of 207.142.131.248.

In Program #4, we will simulate this process and write our own hash table to do it. Here are the steps your program should take:

  1. Load a file of "fake" URLs and IP addresses that I have created for you
  2. Query the user for a URL
  3. Find the corresponding IP address for the URL in your hash table
  4. If the IP address was found, then report it
  5. Else ask the user if he/she would like to add the URL to your table... if so, do it
  6. Go to step 2

Implementation

You'll find a file of web sites in the common area of the k: drive. Here's a copy if you're working at home: websites.txt.

Please implement your own hash table, complete with get(), put() and remove() methods. The book has a nice partial implementation of a hash table. 

Here are some side-comments:

  •  I used chaining to resolve collisions. I used the generic LinkedList<> class from the Collections API as my linked list.
  •  I simplified things (old school) by assuming that the elements of my hash table have their keys included in them. I did this because it's a different than our book, and it's easier and less expensive, memory-wise. So, MyHashTable<E> has only one generic parameter, rather than two.
  •  Please implement your own hash function. I would get your program running using Java's standard hashCode() method and then replace it with your own. Let's use the hash function given at the bottom of page 473 in our text... and use Horner's Rule (see below) to make the calculation more palatable and efficient.
  •  You can see some experimentation in my implementation. The MyHasher interface is an alternate way of communicating a hash function to your table. In fact, my hash table does both. I first look whether a MyHasher has been defined, and if not, then I just use the hashCode() method. This is similar to the Comparator interface shown on page 444 that can be used for sorting.
  •  I rehash my table whenever it hits the load threshold (a fraction between 0 and 1) specified.
  •  It's fun to keep track of statistics like # collisions, # rehashes, etc.
  •  Concentrate on your table and getting it working and such, rather than a fancy UI. I just used that handy JOptionPane.showInputDialog() from Program #2. I wrote statistics and such to the console.
  •  I want to see some preliminary test code in your hash table or other classes. Don't expect to write your hash table and then just load it up with 20,000 items, expecting it to work. Also, exercise removal in your small test main().
  •  I have this goofy warning I keep getting, that I can't seem to silence (for now!). If you get something like this, investigate it but don't let it stop your work.

What is Horner's Rule? It's a little niftier way to calculate your hash function with all those exponents and such. If you want to calculate value of the polynomial:

A3X3 + A2X2 + A1X1 + A0X0

it can be done like this:

(((A3)X + A2)X + A1)X + A0

With a nice prime multiplier like 37, your Java to hash String str using Horner's Rules might look something like this:

int hashValue = 0;
for( int i = 0; i < str.length(); i++) {
   hashValue = (hashValue * 37)
                   + (int) str.charAt( i);
}
return (hashValue % tableSize);

Please peruse my Javadoc. If you have questions about my choices, then please email me. Remember, the book and I are just resources... implement things as you want.

Here's a code fragment that I used to read the websites.txt file:

// The reader parameter is our input file
void loadFile( BufferedReader reader) {
  String url, ip;
  Scanner sc = new Scanner( reader);
  while( sc.hasNext()) {
    url = sc.next();
    ip = sc.next();
    if( url != null && ip != null) {
      // valid url, ip to process
    }
  }
}

Grading

Same deal, guys...

By the due date, please place your work for Program #4 in your folder on the k: drive. I'll be looking for:

  •  A README file describing the state of your program... what works, what doesn't, etc.
  •  All your *.java Java source code files
  •  All your *.class files... please compile your Java
  •  A javadoc folder with the nifty auto-magic documentation of your classes

Of course, your code must be beautiful and follow the class coding guidelines. Code that does not meet this metric will be served a harsh brand of grading justice.

Etc

Hey, www.register.com does something similar to our Program #4... but for real.

A little debug check for you... my program #4 says that websites.txt has 22,145 URL's in there.

BTW, I've been looking for a "real" file of web addresses but had difficulty finding it. the file is actually called the hosts file. It's format is similar to my websites.txt, except it allows comments and the URL and IP fields are switched. Just FYI.

good luck... yow, bill

...

william.krieger.faculty.noctrl.edu

wtkrieger@noctrl.edu