🌑

Stephen's Blog

Building A Hash Set in Python

 

Stephen Cheng

Intro

A Hash Set is a form of Hash Table data structure that usually holds a large number of elements. Using a Hash Set we can search, add, and remove elements really fast. Hash Sets are used for lookup, to check if an element is part of a set. A Hash Set stores unique elements in buckets according to the element’s hash code.

  • Hash code: A number generated from an element’s unique value (key), to determine what bucket that Hash Set element belongs to.
  • Unique elements: A Hash Set cannot have more than one element with the same value.
  • Bucket: A Hash Set consists of many such buckets, or containers, to store elements. If two elements have the same hash code, they belong to the same bucket.

Direct Access in Hash Sets

Searching for Peter in the Hash Set above, means that the hash code 2 is generated (512 % 10), and that directs us right to the bucket Peter is in. If that is the only name in that bucket, we will find Peter right away. In cases like this we say that the Hash Set has constant time O(1) for searching, adding, and removing elements, which is really fast.

But, if we search for Jens, we need to search through the other names in that bucket before we find Jens. In a worst case scenario, all names end up in the same bucket, and the name we are searching for is the last one. In such a worst case scenario the Hash Set has time complexity O(n), which is the same time complexity as arrays and linked lists.

To keep Hash Sets fast, it is therefore important to have a hash function that will distribute the elements evenly between the buckets, and to have around as many buckets as Hash Set elements. Having a lot more buckets than Hash Set elements is a waste of memory, and having a lot less buckets than Hash Set elements is a waste of time.

Hash Set Implementation

Hash Sets in Python are typically done by using Python’s own set data type, but to get a better understanding of how Hash Sets work we will not use that here. To implement a Hash Set in Python we create a class SimpleHashSet.

Inside the SimpleHashSet class we have a method __init__ to initialize the Hash Set, a method hash_function for the hash function, and methods for the basic Hash Set operations: add, contains, and remove. We also create a method print_set to better see how the Hash Set looks like.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
class SimpleHashSet:
def __init__(self, size=100):
self.size = size
self.buckets = [[] for _ in range(size)] # A list of buckets, each is a list (to handle collisions)

def hash_function(self, value):
# Simple hash function: sum of character codes modulo the number of buckets
return sum(ord(char) for char in value) % self.size

def add(self, value):
# Add a value if it's not already present
index = self.hash_function(value)
bucket = self.buckets[index]
if value not in bucket:
bucket.append(value)

def contains(self, value):
# Check if a value exists in the set
index = self.hash_function(value)
bucket = self.buckets[index]
return value in bucket

def remove(self, value):
# Remove a value
index = self.hash_function(value)
bucket = self.buckets[index]
if value in bucket:
bucket.remove(value)

def print_set(self):
# Print all elements in the hash set
print("Hash Set Contents:")
for index, bucket in enumerate(self.buckets):
print(f"Bucket {index}: {bucket}")

# Creating the Hash Set from the simulation
hash_set = SimpleHashSet(size=10)

hash_set.add("Charlotte")
hash_set.add("Thomas")
hash_set.add("Jens")
hash_set.add("Peter")
hash_set.add("Lisa")
hash_set.add("Adele")
hash_set.add("Michaela")
hash_set.add("Bob")

hash_set.print_set()

print("\n'Peter' is in the set:",hash_set.contains('Peter'))
print("Removing 'Peter'")
hash_set.remove('Peter')
print("'Peter' is in the set:",hash_set.contains('Peter'))
print("'Adele' has hash code:",hash_set.hash_function('Adele'))

, , — Nov 20, 2022

Search

    Made with ❤️ and ☀️ on Earth.