The hashtable in Java

The hashtable is a nice little data structure. It’s useful for storing arbitrary objects in quick time and accessing them just as quickly. The hashtable relies on a hash function. The basic concept of a hash function is that, given some item to store, you can calculate an integer value. So, for example, we would map the value “HashMe” to 53. It’s not necessary to be able to convert back again (from 53 to “HashMe”, for example). The only real requirement is that the hash function must be able to make an integer out of any item you wish to store. This hash function is ultimately used to arrive at an array index. The hash table would have an array backing it, which would contain the data. This array would have a fixed size (actually, there would be some value in being able to “rebalance” an array once it got too full), and the hash value that is calculated for an item would be fit into this array’s bounds courtesy of the modulo function. So, we have the ability to place an item in an array position.

There is one thing to think about though. It is possible (actually probable, if your backing array is small) that two items being added to the hash table would have the same array position calculated for them. In this case, you have a choice. Either you can find the next open array position, or you can use another data structure to store all elements at the same array position. What’s that I hear you say? Could we use a linked list? Yes! I answer you with all my ferve and vigour (ferve may not be a word, but fervour didn’t sound right). Try the Linked List implementation at The Java Linked List. The former approach is called linear probing and the latter is called chaining. Something interesting to note is that if you use a linear probing approach, you would eventually run out of open spots in your backing array and would need to resort to chaining anyway.

The hash function can be anything (you could just set it to 1 for all items, but then it would just be a linked list, and has no value).

Note that items are stored in the table as key-value pairs. So when you add an item, it uses the key to get the hash and hence position.

The Java implementation I’ve done here has a few core operations (we use an array as backing, and we use a linked list for chaining to overcome collisions):

  • add: adding an element to a hashtable is of O(1) complexity, because it’s a case of calculating the hash and then putting the element in the array (or adding it to the linked list, which is also O(1)
  • remove: removing an element is also of O(1) complexity because the location of the element is done by hashing, and then just traversing the list at that position in the table.
  • hash: this calculates the array position for an element. It basically uses the element’s toString() method (all Java Objects have one) and adds up the ASCII values of the characters in the string. For characters between a and f, we use the hex value instead.
  • contains: this operation gets the hash of the element and then looks in the appropriate array element. If there is one, the linked list in that array position is traversed.
  • getNumElements: this operation shows how many elements are in the array
  • toString: returns a string representation of the hashtable

So, the hashtable implementation I have produced here is in Java. I’ll explain the parts of importance. The full source is available in my GitHub repository.

Part 1: The hash table node class
As I mentioned, the hash table is a set of nodes referenced by key. The hash table node, then, contains only 2 things: an Object containing the key and an Object containing the data to store. I chose to make the HashtableNode class a private inner class because I don’t need classes outside of the Hashtable class accessing the HashtableNode class. The HashtableNode class is basically a Java Bean which contains a constructor, an Object representing the key referencing the data and an Object containing the data stored in the node, and getters and setters for these member variables. In addition, because we’ll be storing the nodes in a linked list because of the chaining to manage collisions, there is an equals() method. A utility toString() method is also provided.

Something to note is that the equals() method only checks the equality of the key to determine whether two instances are the same. The reason for this is that in a Hashtable, there is no duplication of keys — if a key/value pair already exists in the Hashtable, it is not added again.

public class Hashtable {
  private class HashtableNode {
    private Object key;
    private Object data;

    public HashtableNode() {
      this.key = null;
      this.data = null;
    }

    public HashtableNode(Object inKey, Object inData) {
      this.key = inKey;
      this.data = inData;
    }

    public Object getData() {
      return data;
    }

    public void setData(Object data) {
      this.data = data;
    }

    public Object getKey() {
      return key;
    }
    public void setKey(Object key) {
      this.key = key;
    }

    /* Equality can be based on key alone because there can't be
     * 2 nodes with the same key in the table */
    public boolean equals(Object obj) {
      if (obj instanceof HashtableNode) {
        HashtableNode node = (HashtableNode)obj;
        return this.key.equals(node.getKey());
      }
      else {
        return false;
      }
    }

    public String toString() {
      return "Key: ["+this.key+"] data: ["+this.data+"]";
    }
  }
}

Part 2: The backing table
The backing table I’ve used is an array of fixed size. The tableSize member variable in this implementation is fixed, although you could have a dynamically calculated backing array size, which grows when needed to avoid excessive chaining. There is a difference between the number of elements in the hash table and the size of the hash table – there could be 20 positions in the backing array, but only 1 of them has any data in it. The backing array in this implementation is an array of Java Objects (actually, it will end up being an array of LinkedLists).

private final int tableSize = 20;
  private int numElements;
  private Object [] table;

Part 3: The hash method

In my implementation, I’ve chosen to have use Java’s toString() method that exists for any Object. If a class does not define a toString() method, it will use the one defined in java.lang.Object. This method returns things like “com.danielacton.datastructures.LinkedList@12345″, which is fine, because it’s a string. So we’re guaranteed to have a string from our element to store. The hash method adds up the ASCII values of the characters in the string. For characters between a and f, we use the hex value instead. Once we’ve got a number, we find the modulus of the number with the table size so that we can fit into the backing array.

private int hash(Object key) {

    /* Start with a base, just so that it's not 0 for empty strings */
    int result = 42;

    String inputString = key.toString().toLowerCase();

    char [] characters = inputString.toCharArray();
    for (int i = 0; i < characters.length; i++) {
      char currentChar = characters[i];

      if (currentChar == 'a' || currentChar == 'b' || currentChar == 'c' ||
        currentChar == 'e' || currentChar == 'e' || currentChar == 'f') {
          result += Integer.parseInt(""+currentChar, 16);
      }

      int j = (int)currentChar;
      result += j;
    }

    return (result % this.tableSize);
  }

Part 4: The add method(s)
To add an element to the hash table, we find the position in the backing array using the hash() method on the key. Then, if there is a linked list already in that position, we append to it by creating a new HashtableNode object and setting the key and data provided and adding that to the list. If there is no linked list in that position, we create one and then add the HashtableNode to it. There are 2 add methods: one that adds a single element and one that adds multiple elements by calling the single add method many times.

public void add(Object key, Object data) {
    if (data == null || key == null) {
      System.err.println("ERROR: Either the key or the data are null");
      return;
    }

    /* Don't add duplicate keys */
    if (this.contains(key)) {
      return;
    }

    /* Find out where in our array should the item go */
    int position = this.hash(key);

    /* If nothing exists in the position, create a new linked list there */
    if (this.table[position] == null) {
      this.table[position] = new LinkedList();
    }

    /* Add to the linked list in the appropriate position*/
    ((LinkedList)this.table[position]).add(new HashtableNode(key, data));
    this.numElements++;
  }

  public void add(Object [] keys, Object [] inputData) {
    for (int i = 0; i < inputData.length; i++) {
      this.add(keys[i], inputData[i]);
    }
  }

Part 5: The remove method(s)
To remove an element from the hash table, we find the position in the backing array using the hash() method on the key provided. Then, if there is a linked list already in that position, we remove the element from it. If there is no linked list in that position, the element does not exist in the hash table so we do nothing. There are 2 remove methods: one that removes a single element and one that removes multiple elements by calling the single remove method many times.

public void remove(Object key) {
    int hashVal = this.hash(key);

    if (this.table[hashVal] != null) {
      HashtableNode node = new HashtableNode();
      node.setKey(key);

      if (((LinkedList)this.table[hashVal]).indexOf(node) > -1) {
        ((LinkedList)this.table[hashVal]).remove(node);
        this.numElements--;
      }
    }
  }

  public void remove(Object [] keys) {
    for (int i = 0; i < keys.length; i++) {
      this.remove(keys[i]);
    }
  }

Part 6: The helper methods
There are a few helper methods I’ve included. contains tells if an element exists in the hash table. toString gives a string representation of the hash table. numElements shows how many elements are in the hash table.

public String toString() {
    StringBuffer buffer = new StringBuffer();

    buffer.append(System.getProperty("line.separator"));
    buffer.append("{");
    buffer.append(System.getProperty("line.separator"));

    for (int i = 0; i < this.table.length; i++) {
      if (this.table[i] != null) {
        buffer.append("\t"+(LinkedList)this.table[i]);
        buffer.append(System.getProperty("line.separator"));
      }
    }

    buffer.append("}");

    return buffer.toString();
  }

  public int getNumElements() {
    return this.numElements;
  }

  public boolean contains(Object key) {
    boolean result = false;
    int hash = this.hash(key);

    if (this.table[hash] != null) {
      HashtableNode node = new HashtableNode();
      node.setKey(key);
      if (((LinkedList)this.table[hash]).indexOf(node) > -1) {
        result = true;
      }
    }

    return result;
  }