【Problem】Given 4 billion unique unsigned int integers that are not sorted. Given a number, how can we quickly determine if this number exists among those 4 billion numbers?
1 billion is 9 zeros, so 4 billion is 4000000000. Our instinctive thought for a solution is to use a for loop to find the equal number, but this method is definitely not feasible. On a 32-bit machine, an unsigned int occupies 4 bytes, so 4 billion integers would require 4000000000 / (1024 * 1024 * 1024) = 14.90116119384765625, which is about 15 GB. Therefore, a conventional linear search cannot meet performance requirements.
Let’s review the basic knowledge of bit operations, bits, and bytes, as these are precious resources due to limited memory.
1 byte = 8 bits, 1 KB = 1024 bytes, 1 MB = 1024 KB
Using the above byte units for conversion, 4000000000 / (8 * 1024 * 1024) = 476.837158203125, which means we only need 476.84 MB of memory. For a 32-bit machine, an int occupies 4 bytes, and for the range of 0 to 31 bits, if it is an int array, as shown in the figure below:
The bits of A[0] are 0 to 31
The bits of A[1] are 32 to 63
The bits of A[2] are 64 to 95
The bits of A[3] are 96 to 127
And so on for A[n]……
One bit can represent one number. We assume the initial value is 0; if a certain bit is set to 1, it indicates that it represents a certain value. Now let’s simulate this: if there are 100 numbers, we need to search among these 100 numbers. 100 bits only require 12.5 bytes (100/8=12.5), which means we only need to define int[4] (since one int has 32 bits, 4 ints have 128 bits, which is just over 100).
The initial value of int[4] is 0. Let’s look at the program:
1. First, calculate which byte position the value is in, and which bit in that byte position corresponds to the array shown above, which bit in A[n].
int arr[4] = {0, 0, 0, 0};
void SetBit(int n)
{
int bytePos = (n >> 5); // Calculate byte position
int bitPos = (n & 0x1f); // Calculate which bit in the byte
arr[bytePos] |= (1 << (bitPos - 1)); // Set the logical bit to 1
}
2. Search if the corresponding bit value is 1; if it is, it indicates that the number is found, and return its value.
int SearchBit(int n)
{
int bytePos = (n >> 5); // Calculate byte position
int bitPos = (n & 0x1f); // Calculate which bit in the byte
return arr[bytePos] & (1 << (bitPos - 1)); // Check if this bit is 1 and return its value
}
3. Clear the bit to 0.
int ClearBit(int n)
{
int bytePos = (n >> 5); // Calculate byte position
int bitPos = (n & 0x1f); // Calculate which bit in the byte
arr[bytePos] &= ~(1 << (bitPos - 1)); // Clear the bit to 0
}
Main function, please refer to:
int main()
{
int a[100] = {1,10,20,30,40,50,60,70,80,90,
2,11,21,31,41,51,61,71,81,91,
3,12,22,32,42,52,62,72,82,92,
4,13,23,33,43,53,63,73,83,93,
5,14,24,34,44,54,64,74,84,94,
6,15,25,35,45,55,65,75,85,95,
7,16,26,36,46,56,66,76,86,96,
8,17,27,37,47,57,67,77,87,97,
9,18,28,38,48,58,68,78,88,98,
19,29,39,49,59,69,79,89,99,103};
for (int i = 0; i < 100; i++) SetBit(a[i]); // Set bit for each value
for (int i = 1; i <= 100; i++)
{
if (SearchBit(i)) printf("%d is exist\n", i);
else printf("%d is not exist\n", i);
}
for (int i = 0; i < 100; i++) ClearBit(a[i]);
return 0;
}
【Running Result】The running result shows that the number 100 does not exist; it is not in the array. This is a typical bitmap problem, allowing for fast data searching while saving memory.