C++ STL Set Algorithms: Roll It Up! Use These Algorithms to Elevate Your Code Click the blue text to follow the author

1. Overview of C++ STL Set Algorithms

The C++ Standard Template Library (STL) provides a rich set of algorithms that operate on sorted ranges. These algorithms are located in the <algorithm> header file. They can perform set operations such as union, intersection, difference, and symmetric difference. Using these algorithms can avoid manually writing complex loop logic, improving code readability and efficiency.

Prerequisite: Sorted Ranges.

Very Important: The STL set algorithms require the input ranges to be sorted. If the input ranges are not sorted, the behavior of the algorithms will be undefined and may produce incorrect results. Therefore, ensure that the ranges are sorted according to the specified sorting rules before using the set algorithms. You can use the std::sort algorithm to sort the ranges.

2. std::includes Algorithm

std::includes algorithm is used to check if one sorted range contains another sorted range. In other words, it determines whether all elements in [first2, last2) also exist in [first1, last1) and in the same order.

Prototype:

template <class InputIterator1, class InputIterator2>
bool includes (InputIterator1 first1, InputIterator1 last1,
               InputIterator2 first2, InputIterator2 last2);

template <class InputIterator1, class InputIterator2, class Compare>
bool includes (InputIterator1 first1, InputIterator1 last1,
               InputIterator2 first2, InputIterator2 last2,
               Compare comp);

Return Value: If [first1, last1) contains [first2, last2), it returns true; otherwise, it returns false. If [first2, last2) is an empty range, it always returns true (an empty range is contained in any range).

Time Complexity: O(N + M), where N is the length of [first1, last1) and M is the length of [first2, last2).

std::includes algorithm’s typical implementation is similar to the merge step of merge sort, but it does not actually merge the two ranges; it merely compares them.

Pseudocode:

function includes(first1, last1, first2, last2, comp):
    while first1 != last1 and first2 != last2:
        if comp(*first2, *first1):  // *first2 < *first1
            return false  // Element in first2 does not exist in first1
        else if comp(*first1, *first2): // *first1 < *first2
            first1++  // Continue searching in first1 for *first2
        else:  // *first1 == *first2
            first1++
            first2++  // Match found, move to the next element
    return first2 == last2  // If first2 reaches the end, it means first1 contains first2

Example:

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

int main() {
    std::vector<int> main_set = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<int> subset1 = {2, 4, 6};
    std::vector<int> subset2 = {2, 4, 11}; // 11 does not exist in main_set
    std::vector<int> empty_set = {};

    std::cout << std::boolalpha;

    std::cout << "main_set includes subset1: " << std::includes(main_set.begin(), main_set.end(), subset1.begin(), subset1.end()) << std::endl; // true
    std::cout << "main_set includes subset2: " << std::includes(main_set.begin(), main_set.end(), subset2.begin(), subset2.end()) << std::endl; // false
    std::cout << "main_set includes empty_set: " << std::includes(main_set.begin(), main_set.end(), empty_set.begin(), empty_set.end()) << std::endl; // true

    // Using custom comparison function (strings, case insensitive)
    std::vector<std::string> main_string_set = {"apple", "banana", "cherry", "date"};
    std::vector<std::string> string_subset = {"Banana", "Date"};

    auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
        std::string lower_a = a;
        std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
        std::string lower_b = b;
        std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
        return lower_a < lower_b;
    };

    std::cout << "main_string_set includes string_subset (case-insensitive): "
              << std::includes(main_string_set.begin(), main_string_set.end(), string_subset.begin(), string_subset.end(), case_insensitive_compare)
              << std::endl; // true

    return 0;
}

Note:

std::includes strongly depends on the input ranges [first1, last1) and [first2, last2) being sorted according to the same sorting rules. If the ranges are not sorted, or the sorting rules are inconsistent, the results will be unpredictable. Use std::sort to ensure the ranges are sorted.
If a custom comparison function comp is used, the comparison function for sorting and for includes must be exactly the same. Inconsistent comparison functions will lead to incorrect results.
If the input ranges contain duplicate equal elements, std::includes can still work correctly. The algorithm checks whether each element in [first2, last2) appears in [first1, last1) at least the same number of times.

Alternatives:

If the input ranges are not sorted, you can use loops and std::find or std::any_of to check if one range contains another. However, this method is less efficient, with a time complexity of O(N * M).
If you only need to know whether any element in [first2, last2) exists in [first1, last1), you can use <code>std::any_of and std::binary_search (provided that [first1, last1) is sorted).

3. set_union Algorithm

std::set_union algorithm is used to compute the union of two sorted ranges. The union contains all unique elements from both ranges and is sorted according to the sorting rules. If an element appears in both ranges, it appears only once in the result range.

Prototype:

template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_union (InputIterator1 first1, InputIterator1 last1,
                           InputIterator2 first2, InputIterator2 last2,
                           OutputIterator result);

template <class InputIterator1, class InputIterator2, class OutputIterator, class Compare>
OutputIterator set_union (InputIterator1 first1, InputIterator1 last1,
                           InputIterator2 first2, InputIterator2 last2,
                           OutputIterator result, Compare comp);

result points to the beginning iterator of the target range where the union result will be stored. The target range must have enough space to accommodate all elements of the union. Typically, std::back_inserter is used to automatically expand the size of the target container.

Time Complexity: O(N + M), where N is the length of [first1, last1) and M is the length of [first2, last2).

Implementation Principle (Pseudocode):std::set_union algorithm’s typical implementation is similar to the merge step of merge sort.

function set_union(first1, last1, first2, last2, result, comp):
    while first1 != last1 and first2 != last2:
        if comp(*first1, *first2):  // *first1 < *first2
            *result = *first1;
            first1++;
            result++;
        else if comp(*first2, *first1): // *first2 < *first1
            *result = *first2;
            first2++;
            result++;
        else:  // *first1 == *first2
            *result = *first1;  // or *result = *first2; result is the same
            first1++;
            first2++;
            result++;

    // Copy remaining elements to the result range
    while first1 != last1:
        *result = *first1;
        first1++;
        result++;

    while first2 != last2:
        *result = *first2;
        first2++;
        result++;

    return result;

Usage Example:

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

int main() {
    std::vector<int> v1 = {1, 2, 3, 4, 5};
    std::vector<int> v2 = {3, 5, 6, 7, 8};
    std::vector<int> result;

    std::set_union(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result));

    std::cout << "Union: ";
    for (int x : result) {
        std::cout << x << " ";
    }
    std::cout << std::endl; // Output: Union: 1 2 3 4 5 6 7 8

    // Using custom comparison function (strings, case insensitive)
    std::vector<std::string> v3 = {"apple", "banana", "cherry"};
    std::vector<std::string> v4 = {"Banana", "date", "fig"};
    std::vector<std::string> result2;

    auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
        std::string lower_a = a;
        std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
        std::string lower_b = b;
        std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
        return lower_a < lower_b;
    };

    std::sort(v3.begin(), v3.end(), case_insensitive_compare);
    std::sort(v4.begin(), v4.end(), case_insensitive_compare);

    std::set_union(v3.begin(), v3.end(), v4.begin(), v4.end(), std::back_inserter(result2), case_insensitive_compare);

    std::cout << "Union (Case-insensitive): ";
    for (const std::string& s : result2) {
        std::cout << s << " ";
    }
    std::cout << std::endl; // Output: Union (Case-insensitive): apple banana cherry date fig

    return 0;
}

Note:

std::set_union must be applied to sorted ranges. If the input ranges are not sorted, the results will be unpredictable. Use std::sort to ensure the ranges are sorted and use the same comparison function as set_union.
If a custom comparison function comp is used, the comparison function for sorting and for set_union must be exactly the same.
If the input ranges contain duplicate equivalent elements, std::set_union will keep one copy. The algorithm ensures that the elements in the result range are unique but does not eliminate duplicate elements in the input ranges.

Alternatives:

For unsorted ranges, you can use std::unordered_set or std::unordered_map to compute the union. However, this method is less efficient, with a time complexity of O(N + M), but has a higher constant factor, and the result range is unordered.
If you need to compute the union of multiple ranges, you can repeatedly use the std::set_union algorithm or use more advanced data structures and algorithms, such as divide and conquer.

4. set_intersection Algorithm

std::set_intersection algorithm is used to compute the intersection of two sorted ranges. The intersection contains all unique elements that exist in both ranges and is sorted according to the sorting rules.

Prototype:

template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_intersection (InputIterator1 first1, InputIterator1 last1,
                                 InputIterator2 first2, InputIterator2 last2,
                                 OutputIterator result);

template <class InputIterator1, class InputIterator2, class OutputIterator, class Compare>
OutputIterator set_intersection (InputIterator1 first1, InputIterator1 last1,
                                 InputIterator2 first2, InputIterator2 last2,
                                 OutputIterator result, Compare comp);

Implementation Principle:std::set_intersection algorithm’s typical implementation is also similar to the merge step of merge sort.

Usage Example:

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

int main() {
    std::vector<int> v1 = {1, 2, 3, 4, 5};
    std::vector<int> v2 = {3, 5, 6, 7, 8};
    std::vector<int> result;

    std::set_intersection(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result));

    std::cout << "Intersection: ";
    for (int x : result) {
        std::cout << x << " ";
    }
    std::cout << std::endl; // Output: Intersection: 3 5

    // Using custom comparison function (strings, case insensitive)
    std::vector<std::string> v3 = {"apple", "banana", "cherry"};
    std::vector<std::string> v4 = {"Banana", "date", "cherry"};
    std::vector<std::string> result2;

    auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
        std::string lower_a = a;
        std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
        std::string lower_b = b;
        std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
        return lower_a < lower_b;
    };

    std::sort(v3.begin(), v3.end(), case_insensitive_compare);
    std::sort(v4.begin(), v4.end(), case_insensitive_compare);

    std::set_intersection(v3.begin(), v3.end(), v4.begin(), v4.end(), std::back_inserter(result2), case_insensitive_compare);

    std::cout << "Intersection (Case-insensitive): ";
    for (const std::string& s : result2) {
        std::cout << s << " ";
    }
    std::cout << std::endl; // Output: Intersection (Case-insensitive): banana cherry

    return 0;
}

5. set_difference Algorithm

std::set_difference algorithm is used to compute the difference of two sorted ranges. The difference contains all unique elements that exist in the first range but not in the second range, and is sorted according to the sorting rules. In other words, the elements in the result range are unique to the first range.

Prototype:

template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_difference (InputIterator1 first1, InputIterator1 last1,
                                InputIterator2 first2, InputIterator2 last2,
                                OutputIterator result);

template <class InputIterator1, class InputIterator2, class OutputIterator, class Compare>
OutputIterator set_difference (InputIterator1 first1, InputIterator1 last1,
                                InputIterator2 first2, InputIterator2 last2,
                                OutputIterator result, Compare comp);

set_difference computes the difference of [first1, last1) relative to [first2, last2). If you want to compute the difference of <code>[first2, last2) relative to <code>[first1, last1), you need to swap the parameters of these two ranges.

Usage Example:

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

int main() {
    std::vector<int> v1 = {1, 2, 3, 4, 5};
    std::vector<int> v2 = {3, 5, 6, 7, 8};
    std::vector<int> result;

    std::set_difference(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result));

    std::cout << "Difference (v1 - v2): ";
    for (int x : result) {
        std::cout << x << " ";
    }
    std::cout << std::endl; // Output: Difference (v1 - v2): 1 2 4

    // Compute v2 - v1
    result.clear(); // Clear result vector
    std::set_difference(v2.begin(), v2.end(), v1.begin(), v1.end(), std::back_inserter(result));

    std::cout << "Difference (v2 - v1): ";
    for (int x : result) {
        std::cout << x << " ";
    }
    std::cout << std::endl; // Output: Difference (v2 - v1): 6 7 8

    // Using custom comparison function (strings, case insensitive)
    std::vector<std::string> v3 = {"apple", "banana", "cherry"};
    std::vector<std::string> v4 = {"Banana", "date", "fig"};
    std::vector<std::string> result2;

    auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
        std::string lower_a = a;
        std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
        std::string lower_b = b;
        std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
        return lower_a < lower_b;
    };

    std::sort(v3.begin(), v3.end(), case_insensitive_compare);
    std::sort(v4.begin(), v4.end(), case_insensitive_compare);

    std::set_difference(v3.begin(), v3.end(), v4.begin(), v4.end(), std::back_inserter(result2), case_insensitive_compare);

    std::cout << "Difference (v3 - v4, Case-insensitive): ";
    for (const std::string& s : result2) {
        std::cout << s << " ";
    }
    std::cout << std::endl; // Output: Difference (v3 - v4, Case-insensitive): apple cherry

    return 0;
}

Alternatives:

For unsorted ranges, you can use loops and search operations to compute the difference. However, this method is less efficient, with a time complexity of O(N * M).
You can use std::unordered_set or std::unordered_map to compute the difference of unsorted ranges. This method has an average time complexity of O(N + M), but requires additional memory to store the hash table. Note that the result of unordered_set will be unordered.

6. set_symmetric_difference Algorithm

std::set_symmetric_difference algorithm is used to compute the symmetric difference of two sorted ranges (A Δ B), which is (A – B) ∪ (B – A). The symmetric difference contains all unique elements that exist in one of the ranges but not in both, and is sorted according to the sorting rules. In other words, the elements in the result range are those that belong only to A or only to B, but not those that belong to both A and B.

Prototype:

template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_symmetric_difference (InputIterator1 first1, InputIterator1 last1,
                                         InputIterator2 first2, InputIterator2 last2,
                                         OutputIterator result);

template <class InputIterator1, class InputIterator2, class OutputIterator, class Compare>
OutputIterator set_symmetric_difference (InputIterator1 first1, InputIterator1 last1,
                                         InputIterator2 first2, InputIterator2 last2,
                                         OutputIterator result, Compare comp);

Unlike std::set_difference, the order of symmetric difference operations does not matter. The result of set_symmetric_difference(A, B) is the same as set_symmetric_difference(B, A).

Usage Example:

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

int main() {
    std::vector<int> v1 = {1, 2, 3, 4, 5};
    std::vector<int> v2 = {3, 5, 6, 7, 8};
    std::vector<int> result;

    std::set_symmetric_difference(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result));

    std::cout << "Symmetric Difference: ";
    for (int x : result) {
        std::cout << x << " ";
    }
    std::cout << std::endl; // Output: Symmetric Difference: 1 2 4 6 7 8

    // Using custom comparison function (strings, case insensitive)
    std::vector<std::string> v3 = {"apple", "banana", "cherry"};
    std::vector<std::string> v4 = {"Banana", "date", "fig"};
    std::vector<std::string> result2;

    auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
        std::string lower_a = a;
        std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
        std::string lower_b = b;
        std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
        return lower_a < lower_b;
    };

    std::sort(v3.begin(), v3.end(), case_insensitive_compare);
    std::sort(v4.begin(), v4.end(), case_insensitive_compare);

    std::set_symmetric_difference(v3.begin(), v3.end(), v4.begin(), v4.end(), std::back_inserter(result2), case_insensitive_compare);

    std::cout << "Symmetric Difference (Case-insensitive): ";
    for (const std::string& s : result2) {
        std::cout << s << " ";
    }
    std::cout << std::endl; // Output: Symmetric Difference (Case-insensitive): apple cherry date fig

    return 0;
}

Alternatives:

For unsorted ranges, you can use loops and search operations to compute the symmetric difference. However, this method is less efficient, with a time complexity of O(N * M).
You can use std::unordered_set or std::unordered_map to compute the symmetric difference of unsorted ranges. This method has an average time complexity of O(N + M), but requires additional memory to store the hash table. Note that the result of unordered_set will be unordered.

7. Conclusion

The C++ STL set algorithms provide an efficient and easy-to-use way to perform set operations. By understanding the principles and usage of these algorithms, you can write cleaner, more efficient, and maintainable code.

Remember, these algorithms require the input ranges to be sorted, and you need to provide a sufficiently large target range to store the results, or use std::back_inserter to dynamically expand the target container.

The time complexity of STL set algorithms is typically O(N + M), where N and M are the lengths of the input ranges. Since the algorithms need to read each element of the input ranges, they are usually I/O intensive. For very large datasets, consider using parallel algorithms to improve performance.

Previous reviews of algorithms

C++ STL Search Algorithms: One line of code replaces 10 lines of loops, try it if you don’t believe it!

Arrays

Interviewer: Merge two sorted arrays, what are your thoughts? I: …

C++

STL Sorting Algorithms merge, sort, shuffle, reverse

Building

CMake from scratch: Write your first CMakeLists.txt

C++ STL Set Algorithms: Roll It Up! Use These Algorithms to Elevate Your Code Lion RyanWelcome to follow my public account Learn technology or submit articles

1. Overview of C++ STL Set Algorithms

2. std::includes Algorithm

3. set_union Algorithm

4. set_intersection Algorithm

5. set_difference Algorithm

6. set_symmetric_difference Algorithm

7. Conclusion

Related posts

Leave a Comment Cancel reply