Hash Tables and Hash Functions in C++

In computer science, a hash table is a data structure used to implement an associative array or set, which maps keys to buckets or indices using a specific function called a hash function. Hash tables provide fast data access speeds, making them widely used in many applications.

1. What is a Hash Table?

1.1 Basic Concepts of Hash Tables

A hash table consists of the following elements:

Key: A value used to uniquely identify a data item.
Value: The data corresponding to the key.
Hash Function: An algorithm that generates a fixed-length integer (commonly referred to as a “bucket” or “index”) from the input key.

1.2 Hash Collisions

A collision occurs when two different keys are mapped to the same bucket. There are several strategies to handle this situation, such as chaining and open addressing.

1.3 Performance

The average time complexity for search, insert, and delete operations is O(1), but in the worst case, it can degrade to O(n), depending on the collision handling strategy and load factor.

2. Standard Library Support in C++

The C++ Standard Library provides unordered_map and unordered_set to implement basic hash table functionality. Below are some simple examples demonstrating how to use these data structures.

2.1 Importing Header Files

To use unordered_map or unordered_set, you need to include the corresponding header files:

#include <iostream>
#include <unordered_map>
#include <string>

2.2 Example of Using unordered_map

The following code demonstrates how to create a simple student grade management system:

#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    // Create an unordered map to store student names and scores
    std::unordered_map<std::string, int> scores;

    // Insert data
    scores["Alice"] = 90;
    scores["Bob"] = 85;
    scores["Charlie"] = 92;

    // Query Bob's score
    std::cout << "Bob's score: " << scores["Bob"] << std::endl;

    // Update Alice's score
    scores["Alice"] = 95;

    // Delete Charlie's data
    scores.erase("Charlie");

    // Iterate through all students and their scores
    std::cout << "\nAll students' scores:" << std::endl;
    for (auto const& pair : scores) {
        std::cout << pair.first << ": " << pair.second << std::endl;
    }
    return 0;
}

Check Output Results:

This code snippet first adds three students and their scores to the unordered map, then modifies Alice’s score and deletes Charlie’s record. Finally, it iterates through and prints all remaining students and their scores.

2.3 Example of Using unordered_set

If you need to store a set of unique and non-repeating data, you can use unordered_set. Here is a simple example to check if a set of numbers contains a target number:

#include <iostream>
#include <unordered_set>

int main() {
    // Create an unordered set to store unique numbers
    std::unordered_set<int> numbers {10, 20, 30, 40};

    int targetNumber = 30;
    if (numbers.find(targetNumber) != numbers.end()) {
        std::cout << targetNumber << " is in the set." << std::endl;
    } else {
        std::cout << targetNumber << " is not in the set." << std::endl;
    }
    return 0;
}

Here, we created an unordered set to store four numbers and checked if the target number exists within it. If found, it outputs a confirmation message; otherwise, it outputs the opposite message.

3. Custom Hash Functions

For custom types, we can write our own hash functions. For example, in an employee class, we may want to generate a unique key based on the employee ID. First, define the employee class, then overload the hash template in the modern standard library:

#include <iostream>
#include <functional>
#include <iomanip>

class Employee {
public:
    int id;
    std::string name;
    Employee(int _id, const std::string &_name) : id(_id), name(_name) {}
};

// Custom hash function
namespace std {
template <>
struct hash<Employee> {
    size_t operator()(const Employee &employee) const noexcept {
        return hash<int>()(employee.id);
    }
};
}

// Main program to validate custom hash effect:
int main() {
    auto empHshr = std::hash<Employee>{};
    Employee e{101, "Robert Green"};
    size_t hsh = empHshr(e);
    std::cout << "The Hash value of " << e.name << " is " << hsh << std::endl;
    return 0;
}

In this example, we defined a new hashing mechanism for the custom type Employee, allowing us to efficiently index search using a custom ID.

4. Conclusion

This article introduced hash tables and related concepts in C++, including common operations and how to utilize methods provided by the C++ Standard Library to implement these functionalities. It also demonstrated how to extend custom types so that they can be used as keys. This knowledge not only enhances your programming skills but also helps you handle large amounts of data more effectively. In practical development, appropriately using these tools can significantly improve program performance.