Static hashing

Introduction

Static hashing, a technique used in database systems, assigns a search-key value to a fixed memory location using a hash algorithm that consistently produces the same address for a given search-key value. For instance, employing a mod-4 hash function results in only five possible values. Irrespective of the input search-key value, the output address remains constant.

In static hashing, the number of memory buckets remains unchanging. A search-key value will always be linked to the same data bucket address. For instance, if a hash function like mod (5) is used for an employee ID (e.g. 101), the assigned bucket address for this employee remains 1, unchanged even if the input search-key value varies.

As result, the total number of data buckets in memory stays constant during static hashing. For instance, with mod (5), there would be five data buckets allocated in memory. This method efficiently manages large datasets by offering a consistent and predictable data storage location in memory.


Operations of Static Hashing

  • Searching a record
In static hashing, searching for a record involves utilizing the hash function to find the bucket's address where the record is stored. Upon obtaining this address, data retrieval from memory becomes feasible.

  • Insert a Record

When inserting a new record, the hash function generates an address for it, and the record is stored at that specific location within the hash structure.

  • Delete a Record

Deleting a record necessitates fetching the record first, then erasing the data stored at that address from memory. 
  • Update a Record
Updating records follows a similar process, where the hash function locates the record, allowing for data modification.

However, a critical situation arises when the hash function generates an address that's already occupied by existing data, leading to bucket overflow, a collision scenario.

Static hashing efficiently manages data that remains relatively static. Nonetheless, its efficacy diminishes as data volume grows or with frequent insertions. In such cases, dynamic hashing proves to be a more suitable technique.

 1. Open Hashing

Open hashing, termed separate chaining, resolves collisions by employing an array of linked lists within the hash table to store its elements. Upon insertion, an element is hashed to find its designated slot in the array. If the slot is vacant, the element is placed there. However, if occupied, the element joins the linked list at that slot's head.

This technique is simple to implement and highly effective for both searching and inserting elements into the hash table. Nevertheless, when removing elements, it might become less efficient as it may necessitate traversing the entire linked list at the slot's head.

Advantages of open hashing include its simplicity, efficiency in search and insertion, and compatibility with various hash functions. On the flip side, it might exhibit inefficiency in deleting elements and can waste space when the hash table isn't fully occupied.

2. Close Hashing or Open Addressing

Close hashing, also recognized as open addressing, serves as a collision resolution method within hash tables. It involves housing all hash table elements within a single array. When inserting a new element, it's initially hashed to locate its position in the array. If the slot is vacant, the element is placed there. However, if the slot is occupied, the element seeks an alternative slot within the array using a probing sequence.

Various probing sequences exist for this purpose, including:
  • Linear probing: Selects the subsequent array slot by adding 1 to the original slot.
  • Quadratic probing: Chooses the next slot based on a quadratic function of the original slot.
  • Double hashing: Utilizes a second hash function to determine the subsequent slot in the array.

Linear probing:

The probing sequence employed depends on the hash function's characteristics and the desired performance of the hash table.
Linear probing is a collision resolution technique used in open addressing hash tables. In linear probing, when a collision occurs, the next available slot in the hash table is probed until an empty slot is found. The next available slot is found by incrementing the hash index by 1, 2, 3, and so on. 

Here’s an example:
Formula: h(x) = x mod size

Suppose we have a hash table of size 10 and we want to insert the following elements:
  • 25

  • 35

  • 45

  • 55

We use the following hash function:

h(x) = x % 10

The hash table looks like this initially:


We insert 25 first:  h(x) = x % 10

        h(25) = 25 % 10

        h(25) = 5


Next, we insert 35:

        h(35) = 5

Collision! We probe linearly for the next slot.

        h(x) = (x % 10) + index

        h(35) = (35 % 10) + 1

        h(35) = 5 + 1 = 6


Next, we insert 45:

h(45) = 5

Collision! We probe linearly for the next slot.

h(45) + 1 = 6

Collision! We probe linearly for the next slot.

h(45) + 2 = 7

we find an empty slot and insert our element.

Next, we insert 55:

h(55) = 5

Collision! We probe linearly for the next slot.

h(55) + 1 = 6

Collision! We probe linearly for the next slot.

h(55) + 2 = 7

Collision! We probe linearly for the next slot.

h(55) + 3 = 8

Finally, we find an empty slot and insert our element.

Here are some of the advantages of linear probing:

    • It is simple to implement.
    • It has good performance for most workloads.
    • It is efficient in terms of space.

Here are some of the disadvantages of linear probing:

    • It can have poor performance if there are a lot of collisions.
    • It can lead to clustering of keys.
    • It can be sensitive to the hash function used.

Quadratic probing

Quadratic probing is a technique used in open addressing schemes for resolving collisions that occur when hashing a key to a particular index in a hash table. In this technique, when a collision occurs, the algorithm will probe through the table using a quadratic sequence of indices until an empty slot is found. The quadratic sequence used for probing is defined by the formula:


h(x, i) = (h(x) + i2) mod m

h(x, i) =(h(x) + f(i)) mod m


where as: h(x) is the initial hash value of the key x 

                i is the number of probes made so far, and

                m is the size of the hash table.


Initially, with an empty hash table of size 7, the keys 19, 27, 36, 10, and 64 are to be inserted using the hash function h(k) = k mod 7 to determine the slots.

  • Inserting the key 19: h(19) = 19 mod 7 = 5. Slot 5 is empty, so key 19 is inserted into slot 5.

Inserting the key 27: h(27) = 27 mod 7 = 6. Slot 6 is empty, so key 27 is inserted into slot 6.
  • Inserting the key 36: h(36) = 36 mod 7 = 1. Slot 1 is empty, so key 36 is inserted into slot 1.

  • Inserting the key 10: h(10) = 10 mod 7 = 3. Slot 3 is empty, so key 10 is inserted into slot 3

  • The fifth key to be inserted is 64. By applying the hash function h(64) = 64 mod 7 = 1, it should ideally go into slot 1. However, slot 1 is already occupied by the key 36. Therefore, quadratic probing is applied to find the next available slot.

    During the first probe attempt (i=1), the function f(i) = i2 = 12 = 1. Thus, the system attempts to insert the key into slot (1 + f(i)) mod 7 = (1 + 1) mod 7 = 2. Since slot 2 is unoccupied, the key 64 is inserted into slot 2.

Double hashing

Double hashing is another collision resolution technique used in open addressing for hash tables. It uses two hash functions to calculate the position of a key in the table. The first hash function is used to determine the initial slot for the key, and the second hash function is used to determine the step size for probing.

The collision rate in double hashing is typically lower than in other methods such as linear probing or quadratic probing due to the use of two hash functions. This technique can minimize the possibility of collisions occurring.

For example, let’s say we have a hash table of size m and we want to insert a key x. We’ll use two hash functions: h1(x) = x mod m and h2(x) = 1 + (x mod (m-1))
The first hash function h1(x) is used to determine the initial slot for the key. If that slot is already occupied, we’ll use double hashing to find the next available slot.

On the first attempt (i=1), we’ll try to insert the key into slot (h1(x) + i * h2(x)) mod m. If that slot is also occupied, we’ll try again with i=2. On the second attempt (i=2), we’ll try to insert the key into slot (h1(x) + i * h2(x)) mod m. We’ll keep trying with increasing values of i until we find an available slot.

Here is an example to illustrate how double hashing works:

Let's say we have a hash table of size 7 and we want to insert the keys 19, 27, 36, 10, and 64 using double hashing in open addressing. We’ll use two hash functions: h1(k) = k mod 7 and h2(k) = 1 + (k mod 6).

  • The first key is 19. h1(19) = 19 mod 7 = 5, so we’ll try to insert it into slot 5. Since it’s empty, we’ll insert it there.

  • The second key is 27. h1(27) = 27 mod 7 = 6, so we’ll try to insert it into slot 6. Since it’s empty, we’ll insert it there.

  • The third key is 36. h1(36) = 36 mod 7 = 1, so we’ll try to insert it into slot 1. Since it’s empty, we’ll insert it there.

  • The fourth key is 10. h1(10) = 10 mod 7 = 3, so we’ll try to insert it into slot 3. Since it’s empty, we’ll insert it there.

  • The fifth key is 64. h1(64) = 64 mod 7 = 1, so we’ll try to insert it into slot 1. However, slot 1 is already occupied by the key 36. So, we'll use double hashing to find the next available slot.

    • On the first attempt (i=1), h2(64) = 1 + (64 mod 6) =5. So, we'll try to insert the key into slot (h1(64) + i * h2(64)) mod m = (1 + i *5) mod m = (1+1*5) mod 7=6. Since slot 6 is already occupied by key 27, we will try again with i=2

    • On the second attempt (i=2), h2(64) =4. So, we'll try to insert the key into slot (h1(64) + i * h2(64)) mod m = (1+2*5) mod7= 11%7=4. Since slot 4 is empty, we will insert key there.


Post a Comment

0 Comments