A Hash Table is a data structure designed to be fast to work with. The reason Hash Tables are sometimes preferred instead of arrays or linked lists is because searching for, adding, and deleting data can be done really quickly, even for large amounts of data. For example, with a Hash Table, finding “Bob” inside is done really fast because there is a way to go directly to where “Bob” is stored, using something called a hash function.
To get the idea of what a Hash Table is, let’s try to build one from scratch, to store unique first names inside it. We will build the Hash Set in 5 steps:
To make interacting with the list of names really fast, let’s use a Hash Table, or a Hash Set, which is a simplified version of a Hash Table. To keep it simple, let’s assume there is at most 10 names in the list, so the array must be a fixed size of 10 elements. When talking about Hash Tables, each of these elements is called a bucket.
1 | my_hash_set = [None,None,None,None,None,None,None,None,None,None] |
We want to store a name directly into its right place in the array, and this is where the hash function comes in. A hash function can be made in many ways, it is up to the creator of the Hash Table. A common way is to find a way to convert the value into a number that equals one of the Hash Set’s index numbers, in this case a number from 0 to 9. In our example we will use the Unicode number of each character, summarize them and do a modulo 10 operation to get index numbers 0-9.
The character “B” has Unicode code point 66, “o” has 111, and “b” has 98. Adding those together we get 275. Modulo 10 of 275 is 5, so “Bob” should be stored as an array element at index 5. The number returned by the hash function is called the hash code.
1 | def hash_function(value): |
After storing “Bob” where the hash code tells us (index 5), our array now looks like this:
1 | my_hash_set = [None,None,None,None,None,'Bob',None,None,None,None] |
We have now established a super basic Hash Set. To find out if “Pete” is stored in the array, we give the name “Pete” to our hash function, we get back hash code 9, we go directly to the element at index 9, and there he is. We found “Pete” without checking any other elements.
When deleting a name from our Hash Set, we can also use the hash function to go straight to where the name is, and set that element value to None.
1 | my_hash_set = [None,'Jones',None,'Lisa',None,'Bob',None,'Siri','Pete',None] |
Let’s also add “Stuart” to our Hash Set. We give “Stuart” to our hash function, and we get the hash code 3, meaning “Stuart” should be stored at index 3. Trying to store “Stuart” creates what is called a collision, because “Lisa” is already stored at index 3.
To fix the collision, we can make room for more elements in the same bucket, and solving the collision problem in this way is called chaining. We can give room for more elements in the same bucket by implementing each bucket as a linked list, or as an array.
After implementing each bucket as an array, to give room for potentially more than one name in each bucket, “Stuart” can also be stored at index 3, and our Hash Set now looks like this:
1 | my_hash_set = [ |
To complete our very basic Hash Set code, let’s have functions for adding and searching for names in the Hash Set, which is now a two dimensional array.
1 | my_hash_set = [ |
Hash Tables are great for:
The most important reason why Hash Tables are great for these things is that Hash Tables are very fast compared Arrays and Linked Lists, especially for large sets. Arrays and Linked Lists have time complexity O(n) for search and delete, while Hash Tables have just O(1) on average!
A Hash Table can be a Hash Set or a Hash Map. Hash Table elements are stored in storage containers called buckets. Every Hash Table element has a part that is unique that is called the key. A hash function takes the key of an element to generate a hash code.
Algorithm, Hash Table, Python — Sep 12, 2022
Made with ❤️ and ☀️ on Earth.