
Click the blue text to follow the author
1. Overview of C++ STL Set Algorithms
The C++ Standard Template Library (STL) provides a rich set of algorithms that operate on sorted ranges. These algorithms are located in the <span><algorithm></span> header file. They can perform set operations such as union, intersection, difference, and symmetric difference. Using these algorithms can avoid manually writing complex loop logic, improving code readability and efficiency.
Prerequisite: Sorted Ranges.
Very Important: The STL set algorithms require the input ranges to be sorted. If the input ranges are not sorted, the behavior of the algorithms will be undefined and may produce incorrect results. Therefore, ensure that the ranges are sorted according to the specified sorting rules before using the set algorithms. You can use the <span>std::sort</span> algorithm to sort the ranges.
2. std::includes Algorithm
<span>std::includes</span> algorithm is used to check if one sorted range contains another sorted range. In other words, it determines whether all elements in <span>[first2, last2)</span> also exist in <span>[first1, last1)</span> and in the same order.
Prototype:
template <class InputIterator1, class InputIterator2>
bool includes (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2);
template <class InputIterator1, class InputIterator2, class Compare>
bool includes (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
Compare comp);
Return Value: If <span>[first1, last1)</span> contains <span>[first2, last2)</span>, it returns <span>true</span>; otherwise, it returns <span>false</span>. If <span>[first2, last2)</span> is an empty range, it always returns <span>true</span> (an empty range is contained in any range).
Time Complexity: O(N + M), where N is the length of <span>[first1, last1)</span> and M is the length of <span>[first2, last2)</span>.
<span>std::includes</span> algorithm’s typical implementation is similar to the merge step of merge sort, but it does not actually merge the two ranges; it merely compares them.
Pseudocode:
function includes(first1, last1, first2, last2, comp):
while first1 != last1 and first2 != last2:
if comp(*first2, *first1): // *first2 < *first1
return false // Element in first2 does not exist in first1
else if comp(*first1, *first2): // *first1 < *first2
first1++ // Continue searching in first1 for *first2
else: // *first1 == *first2
first1++
first2++ // Match found, move to the next element
return first2 == last2 // If first2 reaches the end, it means first1 contains first2
Example:
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
int main() {
std::vector<int> main_set = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
std::vector<int> subset1 = {2, 4, 6};
std::vector<int> subset2 = {2, 4, 11}; // 11 does not exist in main_set
std::vector<int> empty_set = {};
std::cout << std::boolalpha;
std::cout << "main_set includes subset1: " << std::includes(main_set.begin(), main_set.end(), subset1.begin(), subset1.end()) << std::endl; // true
std::cout << "main_set includes subset2: " << std::includes(main_set.begin(), main_set.end(), subset2.begin(), subset2.end()) << std::endl; // false
std::cout << "main_set includes empty_set: " << std::includes(main_set.begin(), main_set.end(), empty_set.begin(), empty_set.end()) << std::endl; // true
// Using custom comparison function (strings, case insensitive)
std::vector<std::string> main_string_set = {"apple", "banana", "cherry", "date"};
std::vector<std::string> string_subset = {"Banana", "Date"};
auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
std::string lower_a = a;
std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
std::string lower_b = b;
std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
return lower_a < lower_b;
};
std::cout << "main_string_set includes string_subset (case-insensitive): "
<< std::includes(main_string_set.begin(), main_string_set.end(), string_subset.begin(), string_subset.end(), case_insensitive_compare)
<< std::endl; // true
return 0;
}
Note:
<span>std::includes</span>strongly depends on the input ranges<span>[first1, last1)</span>and<span>[first2, last2)</span>being sorted according to the same sorting rules. If the ranges are not sorted, or the sorting rules are inconsistent, the results will be unpredictable. Use<span>std::sort</span>to ensure the ranges are sorted.- If a custom comparison function
<span>comp</span>is used, the comparison function for sorting and for<span>includes</span>must be exactly the same. Inconsistent comparison functions will lead to incorrect results. - If the input ranges contain duplicate equal elements,
<span>std::includes</span>can still work correctly. The algorithm checks whether each element in<span>[first2, last2)</span>appears in<span>[first1, last1)</span><span> at least the same number of times.</span>
Alternatives:
- If the input ranges are not sorted, you can use loops and
<span>std::find</span>or<span>std::any_of</span>to check if one range contains another. However, this method is less efficient, with a time complexity of O(N * M). - If you only need to know whether any element in
<span>[first2, last2)</span>exists in<span>[first1, last1)</span><span>, you can use </span><code><span>std::any_of</span>and<span>std::binary_search</span>(provided that<span>[first1, last1)</span><span> is sorted).</span>
3. set_union Algorithm
<span>std::set_union</span> algorithm is used to compute the union of two sorted ranges. The union contains all unique elements from both ranges and is sorted according to the sorting rules. If an element appears in both ranges, it appears only once in the result range.
Prototype:
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_union (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
OutputIterator result);
template <class InputIterator1, class InputIterator2, class OutputIterator, class Compare>
OutputIterator set_union (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
OutputIterator result, Compare comp);
<span>result</span> points to the beginning iterator of the target range where the union result will be stored. The target range must have enough space to accommodate all elements of the union. Typically, <span>std::back_inserter</span> is used to automatically expand the size of the target container.
Time Complexity: O(N + M), where N is the length of <span>[first1, last1)</span> and M is the length of <span>[first2, last2)</span><span>.</span>
Implementation Principle (Pseudocode):<span>std::set_union</span> algorithm’s typical implementation is similar to the merge step of merge sort.
function set_union(first1, last1, first2, last2, result, comp):
while first1 != last1 and first2 != last2:
if comp(*first1, *first2): // *first1 < *first2
*result = *first1;
first1++;
result++;
else if comp(*first2, *first1): // *first2 < *first1
*result = *first2;
first2++;
result++;
else: // *first1 == *first2
*result = *first1; // or *result = *first2; result is the same
first1++;
first2++;
result++;
// Copy remaining elements to the result range
while first1 != last1:
*result = *first1;
first1++;
result++;
while first2 != last2:
*result = *first2;
first2++;
result++;
return result;
Usage Example:
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
int main() {
std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v2 = {3, 5, 6, 7, 8};
std::vector<int> result;
std::set_union(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result));
std::cout << "Union: ";
for (int x : result) {
std::cout << x << " ";
}
std::cout << std::endl; // Output: Union: 1 2 3 4 5 6 7 8
// Using custom comparison function (strings, case insensitive)
std::vector<std::string> v3 = {"apple", "banana", "cherry"};
std::vector<std::string> v4 = {"Banana", "date", "fig"};
std::vector<std::string> result2;
auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
std::string lower_a = a;
std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
std::string lower_b = b;
std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
return lower_a < lower_b;
};
std::sort(v3.begin(), v3.end(), case_insensitive_compare);
std::sort(v4.begin(), v4.end(), case_insensitive_compare);
std::set_union(v3.begin(), v3.end(), v4.begin(), v4.end(), std::back_inserter(result2), case_insensitive_compare);
std::cout << "Union (Case-insensitive): ";
for (const std::string& s : result2) {
std::cout << s << " ";
}
std::cout << std::endl; // Output: Union (Case-insensitive): apple banana cherry date fig
return 0;
}
Note:
<span>std::set_union</span>must be applied to sorted ranges. If the input ranges are not sorted, the results will be unpredictable. Use<span>std::sort</span>to ensure the ranges are sorted and use the same comparison function as<span>set_union</span>.- If a custom comparison function
<span>comp</span>is used, the comparison function for sorting and for<span>set_union</span>must be exactly the same. - If the input ranges contain duplicate equivalent elements,
<span>std::set_union</span>will keep one copy. The algorithm ensures that the elements in the result range are unique but does not eliminate duplicate elements in the input ranges.
Alternatives:
- For unsorted ranges, you can use
<span>std::unordered_set</span>or<span>std::unordered_map</span>to compute the union. However, this method is less efficient, with a time complexity of O(N + M), but has a higher constant factor, and the result range is unordered. - If you need to compute the union of multiple ranges, you can repeatedly use the
<span>std::set_union</span>algorithm or use more advanced data structures and algorithms, such as divide and conquer.
4. set_intersection Algorithm
<span>std::set_intersection</span> algorithm is used to compute the intersection of two sorted ranges. The intersection contains all unique elements that exist in both ranges and is sorted according to the sorting rules.
Prototype:
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_intersection (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
OutputIterator result);
template <class InputIterator1, class InputIterator2, class OutputIterator, class Compare>
OutputIterator set_intersection (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
OutputIterator result, Compare comp);
Implementation Principle:<span>std::set_intersection</span> algorithm’s typical implementation is also similar to the merge step of merge sort.
Usage Example:
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
int main() {
std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v2 = {3, 5, 6, 7, 8};
std::vector<int> result;
std::set_intersection(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result));
std::cout << "Intersection: ";
for (int x : result) {
std::cout << x << " ";
}
std::cout << std::endl; // Output: Intersection: 3 5
// Using custom comparison function (strings, case insensitive)
std::vector<std::string> v3 = {"apple", "banana", "cherry"};
std::vector<std::string> v4 = {"Banana", "date", "cherry"};
std::vector<std::string> result2;
auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
std::string lower_a = a;
std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
std::string lower_b = b;
std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
return lower_a < lower_b;
};
std::sort(v3.begin(), v3.end(), case_insensitive_compare);
std::sort(v4.begin(), v4.end(), case_insensitive_compare);
std::set_intersection(v3.begin(), v3.end(), v4.begin(), v4.end(), std::back_inserter(result2), case_insensitive_compare);
std::cout << "Intersection (Case-insensitive): ";
for (const std::string& s : result2) {
std::cout << s << " ";
}
std::cout << std::endl; // Output: Intersection (Case-insensitive): banana cherry
return 0;
}
5. set_difference Algorithm
<span>std::set_difference</span> algorithm is used to compute the difference of two sorted ranges. The difference contains all unique elements that exist in the first range but not in the second range, and is sorted according to the sorting rules. In other words, the elements in the result range are unique to the first range.
Prototype:
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_difference (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
OutputIterator result);
template <class InputIterator1, class InputIterator2, class OutputIterator, class Compare>
OutputIterator set_difference (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
OutputIterator result, Compare comp);
<span>set_difference</span> computes the difference of <span>[first1, last1)</span> relative to <span>[first2, last2)</span><span>. If you want to compute the difference of </span><code><span>[first2, last2)</span><span> relative to </span><code><span>[first1, last1)</span><span>, you need to swap the parameters of these two ranges.</span>
Usage Example:
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
int main() {
std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v2 = {3, 5, 6, 7, 8};
std::vector<int> result;
std::set_difference(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result));
std::cout << "Difference (v1 - v2): ";
for (int x : result) {
std::cout << x << " ";
}
std::cout << std::endl; // Output: Difference (v1 - v2): 1 2 4
// Compute v2 - v1
result.clear(); // Clear result vector
std::set_difference(v2.begin(), v2.end(), v1.begin(), v1.end(), std::back_inserter(result));
std::cout << "Difference (v2 - v1): ";
for (int x : result) {
std::cout << x << " ";
}
std::cout << std::endl; // Output: Difference (v2 - v1): 6 7 8
// Using custom comparison function (strings, case insensitive)
std::vector<std::string> v3 = {"apple", "banana", "cherry"};
std::vector<std::string> v4 = {"Banana", "date", "fig"};
std::vector<std::string> result2;
auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
std::string lower_a = a;
std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
std::string lower_b = b;
std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
return lower_a < lower_b;
};
std::sort(v3.begin(), v3.end(), case_insensitive_compare);
std::sort(v4.begin(), v4.end(), case_insensitive_compare);
std::set_difference(v3.begin(), v3.end(), v4.begin(), v4.end(), std::back_inserter(result2), case_insensitive_compare);
std::cout << "Difference (v3 - v4, Case-insensitive): ";
for (const std::string& s : result2) {
std::cout << s << " ";
}
std::cout << std::endl; // Output: Difference (v3 - v4, Case-insensitive): apple cherry
return 0;
}
Alternatives:
- For unsorted ranges, you can use loops and search operations to compute the difference. However, this method is less efficient, with a time complexity of O(N * M).
- You can use
<span>std::unordered_set</span>or<span>std::unordered_map</span>to compute the difference of unsorted ranges. This method has an average time complexity of O(N + M), but requires additional memory to store the hash table. Note that the result of<span>unordered_set</span>will be unordered.
6. set_symmetric_difference Algorithm
<span>std::set_symmetric_difference</span> algorithm is used to compute the symmetric difference of two sorted ranges (A Δ B), which is (A – B) ∪ (B – A). The symmetric difference contains all unique elements that exist in one of the ranges but not in both, and is sorted according to the sorting rules. In other words, the elements in the result range are those that belong only to A or only to B, but not those that belong to both A and B.
Prototype:
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_symmetric_difference (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
OutputIterator result);
template <class InputIterator1, class InputIterator2, class OutputIterator, class Compare>
OutputIterator set_symmetric_difference (InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
OutputIterator result, Compare comp);
Unlike <span>std::set_difference</span>, the order of symmetric difference operations does not matter. The result of <span>set_symmetric_difference(A, B)</span> is the same as <span>set_symmetric_difference(B, A)</span>.
Usage Example:
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
int main() {
std::vector<int> v1 = {1, 2, 3, 4, 5};
std::vector<int> v2 = {3, 5, 6, 7, 8};
std::vector<int> result;
std::set_symmetric_difference(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result));
std::cout << "Symmetric Difference: ";
for (int x : result) {
std::cout << x << " ";
}
std::cout << std::endl; // Output: Symmetric Difference: 1 2 4 6 7 8
// Using custom comparison function (strings, case insensitive)
std::vector<std::string> v3 = {"apple", "banana", "cherry"};
std::vector<std::string> v4 = {"Banana", "date", "fig"};
std::vector<std::string> result2;
auto case_insensitive_compare = [](const std::string& a, const std::string& b) {
std::string lower_a = a;
std::transform(lower_a.begin(), lower_a.end(), lower_a.begin(), ::tolower);
std::string lower_b = b;
std::transform(lower_b.begin(), lower_b.end(), lower_b.begin(), ::tolower);
return lower_a < lower_b;
};
std::sort(v3.begin(), v3.end(), case_insensitive_compare);
std::sort(v4.begin(), v4.end(), case_insensitive_compare);
std::set_symmetric_difference(v3.begin(), v3.end(), v4.begin(), v4.end(), std::back_inserter(result2), case_insensitive_compare);
std::cout << "Symmetric Difference (Case-insensitive): ";
for (const std::string& s : result2) {
std::cout << s << " ";
}
std::cout << std::endl; // Output: Symmetric Difference (Case-insensitive): apple cherry date fig
return 0;
}
Alternatives:
- For unsorted ranges, you can use loops and search operations to compute the symmetric difference. However, this method is less efficient, with a time complexity of O(N * M).
- You can use
<span>std::unordered_set</span>or<span>std::unordered_map</span>to compute the symmetric difference of unsorted ranges. This method has an average time complexity of O(N + M), but requires additional memory to store the hash table. Note that the result of<span>unordered_set</span>will be unordered.
7. Conclusion
The C++ STL set algorithms provide an efficient and easy-to-use way to perform set operations. By understanding the principles and usage of these algorithms, you can write cleaner, more efficient, and maintainable code.
Remember, these algorithms require the input ranges to be sorted, and you need to provide a sufficiently large target range to store the results, or use <span>std::back_inserter</span><span> to dynamically expand the target container.</span>
The time complexity of STL set algorithms is typically O(N + M), where N and M are the lengths of the input ranges. Since the algorithms need to read each element of the input ranges, they are usually I/O intensive. For very large datasets, consider using parallel algorithms to improve performance.
Previous reviews of algorithms
C++ STL Search Algorithms: One line of code replaces 10 lines of loops, try it if you don’t believe it!
Arrays
Interviewer: Merge two sorted arrays, what are your thoughts? I: …
C++
STL Sorting Algorithms merge, sort, shuffle, reverse
Building
CMake from scratch: Write your first CMakeLists.txt
Lion RyanWelcome to follow my public account Learn technology or submit articles