how hashmap works java

HashMap in Java

What is Hashing?

Hashing is a mechanism to convert one value to another. The process which performs this transformation is called the ‘Hash Function’. The input to the hash function is normally called as the ‘Key’ and this could be any value which can be converted to a String. The output or the hashed value from the hash function is a numerical value which is often called as a ‘Hash Code’ and output is in Integer form. Other terms for hash code are the hash sum, message digest or hash. This hash code is mostly used in security, sorting and internal object representation tasks.

 

Hashing Mechanism

Hash Function

In HashMap, hash function yields the reference to the memory location of an object in integer format. The hash function should produce same result or the hash code every time when applied to the same object or equal objects. Most commonly in programming, the hash function is denoted as ‘hashcode()’ which can be implemented as arbitrary to generate unique identifiers. Hence, hashcode() method can be implemented natively.

What is a HashMap?

HashMap is one of a data structure to store and manage data. HashMap uses hashing mechanism internally to generate a unique identifier for an object. HashMap contains a key-value pair to represent the internal data organization. The key is the identifier for the data to be stored while the value is the actual data which is stored. This key-value pair is called a map entry because it implements the Map.Entry interface which is used to achieve the mapping of keys to values. HashMap is a generic class from java.util package and the common notations of HashMap are,

  • Map mapA = new HashMap();
  • Map<String, MyObject> map = new HashMap<String, MyObject>();

Basics of HashMap Building

According to figure 1, there are three main foundation elements of the hashing mechanism – key for the hash function, hash function and the hash code which is generated by the hash function.

Above basic elements provide the required building blocks to the HashMap structure. HashMap can be identified as an array of connected elements. HashMap consists of mainly two sections. Buckets and ‘Entry’ objects with key-value pairs. HashMap consists of a number of buckets to store entry objects. Together, they built the HashMap object and there are several operations which we can perform with HashMap to achieve our data structure needs.

What is a Bucket in HashMap?

A bucket is a special location in memory to store the entries of a HashMap. There is a unique identifier for the bucket as well, which is derived using the hashcode() method and a native hash() method inside the HashMap class. One bucket is used to store one or more entries. These entries contain mainly the key-value pair for the insertions and two other attributes for internal usage. Each entry is named as an ‘Entry’ which is an instance of the ‘Entry’ inner-class.

What is the Entry Class

Entry class implements the Map.Entry interface which supports the mapping mechanism. Let’s learn a bit more about this Entry class to understand the internals of HashMap structure. Below are the main parts of the Entry class,

 

This class contains a key to be inserted and the respective value. Each entry object contains this particular ‘key’ which is made final since HashMap does not permit duplicate keys. So, it is immutable. Then, there is another final field as ‘hash’ which is the hash code of the ‘K’ (key) that is used to identify the location of the bucket. This should also be final since there should be only one identifier for a bucket location. Bucket stores the entries on a linked list there should be a link to the next item, which is denoted by another attribute of type ‘Entry’ as ‘next’.

 

Hashmap Bucket

HashMap keeps an array of buckets and each bucket is an item of a linked list where the key-value pair is encapsulated to the Entry object instance. Since it is an array representation it is essential to be aware of the array expanding limits and thresholds to take certain actions internally. Also, these parameters will affect the performance of the HashMap.

Initial Capacity

Capacity is measured by the number of buckets in a HashMap. One bucket can have more than one entry. The default initial capacity of a bucket is 16. When HashMap reaches the indicated threshold, capacity will be doubled. Anyway, resizing ability will be decided by the load factor based on the following equation.

Threshold = Capacity (Number of Buckets) * Load Factor

Load Factor

This is the threshold evaluator of the HashMap. The default load factor is 0.75 for any HashMap and upon the increasing number of entries, capacity is calculated with bucket count and load factor. If it’s reaching the limits internal array is doubled in the size and new elements are moved to the new bucket location.

Why we need hashcode() and equals() Methods?

These two methods ensure an accurate internal execution of the HashMap objects. A combination of these two methods are used to store and look-up the objects in a HashMap.

hashcode() – This method is used to calculate the hash code of the key for the object being stored. The calculated hash code is then used to identify the bucket location in the memory. Also, this hash code is used to determine the location of an object in the search criteria. For customized hashMaps, hashcode() method should be overridden to get better results.

equals() – This method is used to compare the equality of two objects. Most of the time hash code of equal objects yield ‘true’ from the equal() method. This method also can be overridden according to system needs.

More information – https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode–

What are the Internal Building Blocks of the HashMap?

HashMap internally uses two primary data structures to store items.

  1. Array – To maintain buckets in a table form
  2. Linked List – To maintain entry objects

 

hashmap internal datastructure

As picture illustrated buckets are maintained as arrays and identified by indexes for each location. The initial size of this array is 16 and if kept adding objects, at some point hash function may return the same hash code for two keys, at this time a collision occurs in the HashMap. To overcome this problem linked list is created at the collided location and the new entry is stored as the next node to the existing entry node. Hence, there can be two different key-value pairs due to a similar hash code by the hashcode() method.When the HashMap exceeds the threshold, the map will re-size itself by creating a new bucket. The size of this new array will be twice of the previous HashMap.

Internal Storage Structure of HashMap

The diagram is a conceptual representation of a HashMap. As noted earlier HashMap initial capacity is 16 buckets. Each bucket contains a linked list of Entry object with the key-value pair, node link to the next entry and hash value of the bucket.

 

hashmap internal data

Learn by Example

Let’s try to learn the HashMap working by an example. Assume there are students in the class and teacher have to store subject stream for each student. Each student has a student id for easy reference and student-subject is stored as a key-value pair in a HashMap.

Then a Test class is written to insert the student-subject (key-value) pairs in the HashMap. Here, 3 student objects are created with the parameterized constructor where student id is used as the ‘key’ for the map entry. The corresponding value for the key (student id) is the marks for the subject.

So, the results will be as below,

Student – Peter has selected the subject: Science

Student – Maya has selected the subject: Art

Student – John has selected the subject: Commerce

Now, the ‘studentMarksMap’ can be visualized as below,

 

hashmap example

Let’s check how HashMap stores the values inside. Bucket-id is generated via the hashcode() function in the ‘Student’ class while the key is the student id and value is the subject stream. An empty HashMap ‘studentSubjectMap’ was created with an initial capacity of 16 and load factor 0.75. HashMap consists of 2 data types as ‘Student’ and ‘String’ to store required data and default HashMap constructor is used in this example is as below,

“ Map< Student,String> studentSubjectMap = new HashMap< Student, String>();”

Method put() in Work

This method is used to store the key-value pairs in the HashMap. Method put takes key and value as the two input parameters. After creating the initial empty HashMap, items are stored in the ‘studentSubjectMap’ by executing the put() method. Method will return any previous value associated with the given key or will return null if there was no mapping value. Let’s look at how the put() method was written to execute the command.

If we try to insert our first key-value pair from the example – that is key = stud1 (object) and value = science (String), The execution flow will be as below,

  1. Perform the null check for the key to since HashMap allows one null key and if key=null it will be assigned the hash = 0 as the hash code of null is 0. Hence the bucket id also becomes 0.
  2. The second step is to perform the hashcode() function on ‘key’ if it is non-null – additional hash function is applied to the generated hash code since most of the custom hashcode() methods are poorly built
  3. The indexFor() function calculates the exact index for or the bucket location to store the entry object. This hash value is made final to avoid repeatable calculations for the same key.
  4. Next step is to check whether there is any entry object already stored in the calculated bucket location, because different objects may yield same hash code at times. And if exists the second object also will be stored in the same bucket as a linked list element. At this moment the next attribute is checked to look any linked elements if next is not null, a check will be performed to find the location where next = null and the new object is inserted at that position as the last object where its next attribute will be null.

If there isn’t already existing element new object is stored as the new entry.

If there is a value with the same key as previously inserted, equal() will come into play and perform the ‘key.equal(k)’ function to check the equality of object keys. If return true, the old value is replaced with the new value. This prevents the duplicate key insertion as well.

  1. Finally, addEntry() method is executed to store the key-value pair in the bucket location. Two additional parameters are also passed to make the entry object for the bucket.

Method get(key) in Work

This HashMap method retrieves the value corresponds to the given key. The key is the input parameter for the get() method. This method also performs the null check for the key. Since there is only one place that is index 0 for the null key, it returns the value in index 0 as the result. Otherwise, the method calculates the hash for the given key using the native and HashMap specific hash functions and find the bucket location to get the corresponding value.

Sometimes, one bucket may have multiple entry objects, it has to perform equal() method for each key to find the exact match we are looking for traversing through the linked list. Finally, the equal() will return true for the exact match and then get() returns the value for the given key.

If get() method is not able to find a value correspond to the inserted key in the bucket, it returns null.

Both the put() and get() methods use the hashcode() and equals() method to generate the hash code and check the equality. Thus, above two methods play very significant role in HashMap execution.

One thought to “HashMap in Java”

Leave a Reply