My knowledge is limited, so if you find any errors in this article, I would greatly appreciate it if you pointed them out. Thank you!
Concept of Union
In Chinese, a union is also referred to as a “共用体” (union), “联合” (joint), or “联合体” (joint body). Its definition format is the same as that of a struct, but its meaning is completely different. Below is the definition format of a union:
union union_name
{
    member_list
} union_variable_name;
Since its definition format is the same as that of a struct, what is the difference? The following example illustrates the difference between a struct and a union through nesting:
struct my_struct
{
    int type;
    union my_union
    {
        char *str;
        int number;
    } value;
} Elem_t;
The access method is the same as that of a struct. For example, to access the number variable, you can do so as follows:
Elem_t.value.number = 10;
What is the difference between a union and a struct? In summary, the addresses of the members in a union are the same, while the members in a struct each have their own addresses. The following image shows the storage of Elem_t in memory.
After seeing how the variable is stored in memory, the characteristics of a union become clear. The benefit of this layout is that the program can use variables of different types while occupying the storage of only one, which saves memory. In the example above, both members of the union share the same four bytes, assuming a 32-bit platform where a pointer and an int are each four bytes wide. When the members have different sizes, the union is at least as large as its largest member.
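The shared-address property can be checked directly. The following is a minimal sketch that reuses the struct layout from the example above; the helper names members_share_address and print_sizes are made up for illustration, and the sizes printed depend on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same layout as the article's example (without the Elem_t variable). */
struct my_struct
{
    int type;
    union my_union
    {
        char *str;
        int number;
    } value;
};

/* Hypothetical helper: returns 1 if both union members start at the same address. */
int members_share_address(const struct my_struct *e)
{
    return (const void *)&e->value.str == (const void *)&e->value.number;
}

/* Hypothetical helper: prints the sizes involved; the numbers vary by platform. */
void print_sizes(void)
{
    /* A union is never smaller than its largest member. */
    assert(sizeof(union my_union) >= sizeof(char *));
    assert(sizeof(union my_union) >= sizeof(int));

    printf("sizeof(char *)         = %zu\n", sizeof(char *));
    printf("sizeof(int)            = %zu\n", sizeof(int));
    printf("sizeof(union my_union) = %zu\n", sizeof(union my_union));
}
```

On a 64-bit desktop the pointer is usually eight bytes, so the union there is larger than the four bytes of a typical 32-bit microcontroller.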
Applications of Union
Using Union to Pack Data
When using a union to pack data, it is essential to know whether the current processor is big-endian or little-endian.
- Big-endian: The low-order data is stored in the high memory address, and the high-order data is stored in the low memory address.
- Little-endian: The low-order data is stored in the low memory address, and the high-order data is stored in the high memory address.
Below is an example illustrating the storage format in both big-endian and little-endian.
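A union can also be used to find out at run time which byte order the current machine uses. The sketch below is one possible way to do it; the function names is_little_endian and lowest_address_byte are assumptions made for this illustration and do not come from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 on a little-endian machine, 0 on a big-endian one. */
int is_little_endian(void)
{
    union
    {
        uint16_t word;
        uint8_t bytes[2];
    } probe;

    probe.word = 0x0001;
    /* Little-endian stores the low-order byte at the lowest address. */
    return probe.bytes[0] == 0x01;
}

/* Hypothetical helper: returns the byte stored at the lowest address of a 16-bit value. */
uint8_t lowest_address_byte(uint16_t value)
{
    union
    {
        uint16_t word;
        uint8_t bytes[2];
    } u;

    u.word = value;
    return u.bytes[0];
}
```

For the value 0x4321, lowest_address_byte returns 0x21 on a little-endian machine and 0x43 on a big-endian one.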
With an understanding of big-endian and little-endian, let’s see how a union can pack data. Below is a piece of code:
#include <stdint.h>
#include <stdio.h>
int main(void)
{
    union
    {
        uint16_t word; /* exactly two bytes, so both bytes below cover it fully */
        struct
        {
            uint8_t byte1;
            uint8_t byte2;
        } byte;
    } u1;
    u1.byte.byte1 = 0x21;
    u1.byte.byte2 = 0x43;
    printf("The Value of word is:0x%x\n", (unsigned int)u1.word);
    return 0;
}
The output of the above code depends on the byte order (endianness) of the processor. If it is in little-endian mode:
The Value of word is:0x4321
If it is in big-endian mode:
The Value of word is:0x2143
Of course, using this method to pack data has its drawbacks, as it yields different results depending on the processor's byte order. Therefore, we often use data shifting to achieve this:
uint8_t byte3 = 0x21;
uint8_t byte4 = 0x43;
uint16_t word;
word = (((uint16_t)byte4) << 8) | ((uint16_t)byte3);
The above method is not affected by the processor's byte order and has better portability.
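The shifting approach can be wrapped in small helper functions so that packing and unpacking stay symmetric. This is a minimal sketch; the names pack_u16, high_byte, and low_byte are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two bytes into a 16-bit word: `high` becomes bits 15..8, `low` bits 7..0. */
uint16_t pack_u16(uint8_t high, uint8_t low)
{
    return (uint16_t)(((uint16_t)high << 8) | (uint16_t)low);
}

/* Extract bits 15..8 of the word. */
uint8_t high_byte(uint16_t word)
{
    return (uint8_t)(word >> 8);
}

/* Extract bits 7..0 of the word. */
uint8_t low_byte(uint16_t word)
{
    return (uint8_t)(word & 0xFFu);
}
```

Because shifts operate on values rather than on memory layout, pack_u16(0x43, 0x21) gives 0x4321 on both big-endian and little-endian processors.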
Union in Data Transmission
Background: There are two small cars that need to communicate, namely car A and car B. Sometimes, car A needs to send its current speed to car B, sometimes it needs to send its current position, and sometimes it needs to send its current status.
Analysis: In the above background, we know that the messages sent do not need to send speed, status, and position simultaneously; these three parameters are sent separately. At this point, we can use the characteristics of a union to construct a data structure. The benefit of this approach is that it reduces the memory occupied by variables. For example, if we do not use a union, we would typically use a struct like this:
struct buffer
{
    uint8_t power;   /* Current battery capacity */
    uint8_t op_mode; /* Operating mode */
    uint8_t temp;    /* Current temperature */
    uint16_t x_pos;
    uint16_t y_pos;
    uint16_t vel;    /* Current speed of the car */
} my_buff;
Ignoring memory alignment (alignment would add padding to the struct, which I plan to cover in a separate article), this struct occupies 9 bytes of storage. To optimize our code, we can construct the data we want to transmit as follows:
union
{
    struct
    {
        uint8_t power;
        uint8_t op_mode;
        uint8_t temp;
    } status;
    struct
    {
        uint16_t x_pos;
        uint16_t y_pos;
    } position;
    uint16_t vel;
} msg_union;
In this way, the union occupies only 4 bytes of storage. If we want to encapsulate the data to be sent into a data frame, the above-defined union presents a problem because the receiver does not know which parameter was sent by the sender. Therefore, we need to add a parameter type variable, resulting in the following code:
struct
{
    uint8_t msg_type;
    union
    {
        struct
        {
            uint8_t power;
            uint8_t op_mode;
            uint8_t temp;
        } status;
        struct
        {
            uint16_t x_pos;
            uint16_t y_pos;
        } position;
        uint16_t vel;
    } msg_union;
} message;
With the addition of msg_type, we can parse the data on the receiving end.
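On the receiving side, msg_type tells the code which union member is valid. The sketch below shows one way this dispatch might look. The type codes MSG_STATUS, MSG_POSITION, and MSG_VELOCITY, the struct tag message, and the function describe_message are all assumptions made for illustration; the article does not define them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type codes; the concrete values are chosen only for this sketch. */
enum { MSG_STATUS = 0, MSG_POSITION = 1, MSG_VELOCITY = 2 };

struct message
{
    uint8_t msg_type;
    union
    {
        struct { uint8_t power; uint8_t op_mode; uint8_t temp; } status;
        struct { uint16_t x_pos; uint16_t y_pos; } position;
        uint16_t vel;
    } msg_union;
};

/* Formats the member selected by msg_type into `out`.
   Returns 0 on success and -1 for an unknown message type. */
int describe_message(const struct message *m, char *out, size_t out_size)
{
    switch (m->msg_type)
    {
    case MSG_STATUS:
        snprintf(out, out_size, "status: power=%u mode=%u temp=%u",
                 (unsigned)m->msg_union.status.power,
                 (unsigned)m->msg_union.status.op_mode,
                 (unsigned)m->msg_union.status.temp);
        return 0;
    case MSG_POSITION:
        snprintf(out, out_size, "position: x=%u y=%u",
                 (unsigned)m->msg_union.position.x_pos,
                 (unsigned)m->msg_union.position.y_pos);
        return 0;
    case MSG_VELOCITY:
        snprintf(out, out_size, "velocity: %u", (unsigned)m->msg_union.vel);
        return 0;
    default:
        return -1; /* unknown type: the union contents cannot be interpreted */
    }
}
```

Only the member that matches msg_type is read, so the receiver never interprets position bytes as a status or a speed.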
Summary
The above example shows that, without a union, sending the full struct produces a noticeably larger data packet than sending the union-based message. Using a union not only saves storage space but also saves bandwidth during communication.
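The size difference can be measured with sizeof. The following is a rough sketch; the union tag msg_payload and the two helper functions are hypothetical names, and the exact numbers depend on the compiler's padding rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All parameters stored side by side, as in the struct-based version. */
struct buffer
{
    uint8_t power;
    uint8_t op_mode;
    uint8_t temp;
    uint16_t x_pos;
    uint16_t y_pos;
    uint16_t vel;
};

/* Only one parameter group is stored at a time, as in the union-based version. */
union msg_payload
{
    struct { uint8_t power; uint8_t op_mode; uint8_t temp; } status;
    struct { uint16_t x_pos; uint16_t y_pos; } position;
    uint16_t vel;
};

/* Hypothetical helpers that report the storage each layout needs. */
size_t struct_size(void) { return sizeof(struct buffer); }
size_t union_size(void)  { return sizeof(union msg_payload); }
```

The struct needs at least 9 bytes (often 10 once the compiler inserts a padding byte before x_pos), while the union needs only as much room as its largest member, which is the 4-byte position group.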
Union in Data Parsing
In the previous example, we optimized the code using a union in data transmission. What role does a union play in data parsing? Consider the following piece of code:
typedef union
{
    uint8_t buffer[PACKET_SIZE];
    struct
    {
        uint8_t size;
        uint8_t cmd;
        uint8_t payload[PAYLOAD_SIZE];
        uint8_t crc;
    } fields;
} PACKET_t;
// Function call method: packet_builder(packet.buffer, new_data)
// When storing new data in the buffer, some additional operations are needed
// For example, size should be stored in buffer[0]
// cmd should be stored in buffer[1], and so on
void packet_builder(uint8_t *buffer, uint8_t data)
{
    static uint8_t received_bytes = 0;
    buffer[received_bytes++] = data;
}
void packet_handler(PACKET_t *packet)
{
    if (packet->fields.size > TOO_BIG)
    {
        // Error
    }
    if (packet->fields.cmd == CMD)
    {
        // Process corresponding data
    }
}
To understand this data parsing process, we need to utilize the characteristic of a union where members share the same address. The elements in buffer[PACKET_SIZE] correspond one-to-one with the elements in fields. The following image illustrates this clearly:
Looking at this image, it becomes clear that after writing data into the buffer, we can directly read it from fields.
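To make the correspondence between buffer and fields concrete, here is a small self-contained sketch. The values of PAYLOAD_SIZE and PACKET_SIZE and the function packet_fill are assumptions made for this example; the article leaves the sizes undefined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes chosen for the sketch. */
#define PAYLOAD_SIZE 4
#define PACKET_SIZE (PAYLOAD_SIZE + 3) /* size + cmd + payload + crc */

typedef union
{
    uint8_t buffer[PACKET_SIZE];
    struct
    {
        uint8_t size;
        uint8_t cmd;
        uint8_t payload[PAYLOAD_SIZE];
        uint8_t crc;
    } fields;
} PACKET_t;

/* Hypothetical helper: copies PACKET_SIZE raw bytes (for example, bytes
   received over a serial port) into the packet through the buffer view. */
void packet_fill(PACKET_t *packet, const uint8_t *raw)
{
    /* Every field is a uint8_t, so the struct view has no padding and
       lines up byte for byte with the buffer view. */
    assert(sizeof packet->fields == PACKET_SIZE);
    memcpy(packet->buffer, raw, PACKET_SIZE);
}
```

After packet_fill, buffer[0] is readable as fields.size, buffer[1] as fields.cmd, and the last byte as fields.crc. This one-to-one mapping holds here because all fields are single bytes; with wider fields, padding and byte order would have to be considered.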
Conclusion
Effectively utilizing unions not only saves storage space but also leverages the address-sharing feature to achieve sophisticated effects. I had not used unions much before, but my recent study of unions has made me realize that the road ahead is long and arduous. However, I also quote a saying from Hu Shi: “What is there to fear about the infinite truth? Every inch of progress brings an inch of joy.”
