What is a hash table?
A hash table also called a hash map, is a data structure that maps key-value pairs. The hash table can be used in complex systems to provide fast access and compact storage. A hash table is a structure that stores data in an abstract data type kind of associative array. It keeps data in an array format, with a distinct index value for every data value. If the index of the required data is known, then the data can be accessed very easily.
What are the key components of a hash table?
There are two key components of the hash table namely hash function and a bucket array. The bucket array is used to store key-value pairs based on the computed index values.
Any procedure that can translate large or small amounts of data to fixed-size values is a hash function. Hash values, hash codes, digests, or simply hashes result from a hash function.
Hash Function
A hash function is used to convert a big number, say a huge bank account number, into a small integer value. This mapped integer value is used as an index in the hash table.
Some key features of hash-function are listed below.
- The key-value pair is determined using a hash function.
- While generating a hash table, it is advisable to choose a good hash function.
- This hash function should also be one-way, meaning it should allow the extraction of the hash value from the key but not the other way around.
- It must also avoid producing the same hash value for many keys.
Example of hash function is given below:-
h(key)=key mod size_of_hashtable, that is, key % size_of_hashtable
Understanding the hash function
A hash function is a procedure that can convert a data set of any scale into a hash table data set of a specific size.
A hash function should meet the following conditions to act as a satisfactory hashing method.
- Simple Calculation: It must be simple to compute the hash code.
- Stable Dispersion: It should guarantee that the hash table has a uniform distribution. It should also prevent grouping.
- Fewer Collisions: A collision occurs when two elements are mapped to the same hash value. This can be handled by chaining, that is providing space to the collided element, by linking from already occupied element.
Requirements of a good hash function
The need to use a good hash function is very important when constructing a hash table. As hash function uniformly divides the data across the whole set of hash values. A hash function must provide different hash values for every input data. A good hash function provides faster computation and minimizes the duplication of data.
Implementing a hash table
The core concept of hashing is to divide key/value pairs in a hash table among an array of stand-ins or "buckets."
The hash table is a data structure used to hold or store key/value pairs. It calculates the index value where the element will be inserted.
Let us understand the concept with the help of an example.
A hash function produces an index from which the object can be retrieved or to which the object will be saved.
- index = f(key, array_size)
This can be typically accomplished in two steps:
- hash = Hash_function(value)
- index = hash % array_size
Consider the string Str = “aabbc”.
The frequency of each character that appears in Str is calculated first. The most straightforward approach to achieve this is to go through all the potential letters one by one and count their frequency.
This method works, but it is sluggish. The time complexity of such a method is O(26*N), where N is the length of the string str and each character can be from the set of alphabets A to Z.
The method that determines the frequency of the letters a to z in a string is mentioned below.
void count_frequency(string Str) { for(char k = ‘a’;k <= ‘z’;++k) { int f = 0; for(int j = 0;j < Str.length();++j) if(S[j] == k) f++; cout << k << ‘ ‘ << f << endl; } }
Output
a 2b 2c 1d 0e 0f 0…z 0
Let's have a look at a hashing-based method.
Take an array and use the hash function to hash the 26 possible characters using the array's indices.
This hashing method has an O(N) complexity, where N is the length of the string.
int Fr[26]; int Hash_Func(char k) { return (k - ‘a’); } void get_index(string Str) { for(int j = 0;j < Str.length();++j) { int index = Hash_Func(Str[i]); Fr[index]++; } for(int j = 0;j < 26;++j) cout << (char)(j+’a’) << ‘ ‘ << j << ‘ ‘ << Fr[j] << endl; }
Output
a 0 2b 1 2c 2 1d 3 0e 4 0f 5 0…z 25 0
Hash Collisions
Hash collision arises when the hash algorithm or hash function gives the same hash value for two different input values.
To resolve the problem of collision, any of the following methods can be used:-
- Chaining or closed addressing
- Open-addressing
- Linear-probing
Chaining
Chaining is used to solve the problem of collisions. Each slot in the array contains a link to a singly-linked list. Each singly linked list includes the key/value pair with the same hash value. If one wants to add the new key/pair value, it should be added to the end of the singly linked list. To find the matching key, one can use the lookup algorithm. Initially, all the table slots have null values. The list is created only when a certain hash value is added for the very first time. In this way, the chaining process works and avoids collisions.
Consider the case where the value of element 152 is "Sia Jones." If the value "Cass Atkin" is entered in the same key, it is inserted to element 152 as a new element, just after "Sia Jones."
152: [["Sia Jones", "p01"]]
...
152: [["Sia Jones", "p01"] ["Cass Atkins", "p02"]]
The biggest disadvantage of chaining is that it adds to the time complexity. Each search will take longer than 0(1) the linked list associated with the index must be scanned to get the appropriate value.
Open Addressing
Open addressing means moving across the hash table's keys until you locate a vacant one. If "Sia Jones" was mapped to 152, "Cass Atkins" would be mapped to the following open index:
152: ["Sia Jones," "p01"]
...
152: ["Sia Jones", "p01"],
153: ["Cass Atkins", "p02"]
The most significant disadvantage of open addressing is that it may not appear in the keymap you anticipate when looking up values. It would be best to search across multiple areas of the hash table to find the desired value.
Linear Probing
The hashing technique can construct an array index that someone already utilized and randomized. In this situation, we can look into the cells until we find an empty one to identify the next vacant spot in the array. This technique is called linear probing.
Common Mistakes
- There can't be more than one item of the same key in a hash table. Re-inserting the same key overwrites the initial entry.
- After creating a hash table for every phrase, instead of converting them into targeted tuples, try straightening these hash tables.
Contexts and Applications
This topic is significant in the professional exams for graduate and postgraduate courses,
especially for
- Bachelor of Science in Computing Systems
- Bachelor of Science in Information Systems
- Masters of Science in Computer Science
Related Topics
- Types of hashing in data structure
- Hash table implimentation in C++
- Linear Probing Hash table
- Message digests
- Cryptography
Practice Questions
Q.1 For the data (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199) and the hash function x mod 10, which statement is true?
(A) 9679, 1989, 4199 hash to their corresponding values.
(B) 1471, 6171 hash to their corresponding values.
(C) All elements hash to their corresponding values.
(D) Each element hashes to a distinct value.
Correct Option: (C)
Explanation:-When one uses hash function, It maps to their corresponding values.
Q.2 Which is the most effective way for avoiding collisions?
(A) Open addressing
(B) Chaining
(C) Linear probing
(D) None of the above
Correct Option: (B)
Explanation:- Chaining is used to solve the problem of collisions. Each slot in the array contains a link to a singly-linked list. Each singly linked list includes the key/value pair with the same hash value
Q.3 What is a hash table?
(A) A table that maps values to keys
(B) A table that maps keys to values
(C) A structure used for storage
(D) A structure used to implement queues
Correct Option: (B)
Explanation:- A hash table also called a hash map, is a data structure that converts keys into values.
Q.4 What is it called when various elements compete for one bucket?
(A) Collision
(B) Diffusion
(C) Replication
(D) Duplication
Correct Option: (A)
Explanation:-Collision arises when one or more elements compete for one bucket or data area which causes collision.
Q.5 What is the other name of chaining?
(A) Open addressing
(B) Linear-probing
(C) Closed addressing
(D) Hashing
Correct Option: (C)
Explanation:-chaining is also known as closed-addressing.
Want more help with your computer science homework?
*Response times may vary by subject and question complexity. Median response time is 34 minutes for paid subscribers and may be longer for promotional offers.
Search. Solve. Succeed!
Study smarter access to millions of step-by step textbook solutions, our Q&A library, and AI powered Math Solver. Plus, you get 30 questions to ask an expert each month.
Hash Table Homework Questions from Fellow Students
Browse our recently answered Hash Table homework questions.
Search. Solve. Succeed!
Study smarter access to millions of step-by step textbook solutions, our Q&A library, and AI powered Math Solver. Plus, you get 30 questions to ask an expert each month.