MQTT Packet Structure
In the MQTT protocol, an MQTT packet consists of three parts: Fixed Header, Variable Header, and Payload.
- (1) Fixed Header. Present in all MQTT packets; it carries the packet type, the flags specific to that type, and the remaining length.
- (2) Variable Header. Present in some MQTT packets; the packet type determines whether the variable header exists and what it contains.
- (3) Payload. Present in some MQTT packets; it carries the actual message content. As with the variable header, some packet types have a payload while others do not.
MQTT Fixed Header
The MQTT protocol defines many packet types, such as CONNECT, PUBLISH, SUBSCRIBE, and PINGREQ. Every packet type must include a Fixed Header.
The fixed header contains two parts: the first byte (Byte1) and the remaining message length (starting from Byte2, occupying up to 4 bytes).
Byte | Bits 7-4 | Bits 3-0 |
---|---|---|
byte 1 | MQTT Control Packet Type | Packet Type Flags |
byte 2… | Remaining Length (uses all 8 bits of each byte) | |
MQTT Control Packet Type
Bits 7-4 of Byte 1 hold the MQTT Control Packet Type. Four bits can represent 16 values, of which 0000 and 1111 are reserved, leaving 14 packet types.
Packet Type | Bit[7-4] Value | Data Direction | Description |
---|---|---|---|
Reserved | 0000 | Forbidden | Reserved |
CONNECT | 0001 | Client —> Server | Client connects to the server |
CONNACK | 0010 | Server —> Client | Connection acknowledgment |
PUBLISH | 0011 | Client <–> Server | Publish message |
PUBACK | 0100 | Client <–> Server | Publish acknowledgment |
PUBREC | 0101 | Client <–> Server | Message received (QoS 2 Phase 1) |
PUBREL | 0110 | Client <–> Server | Message release (QoS 2 Phase 2) |
PUBCOMP | 0111 | Client <–> Server | Publish complete (QoS 2 Phase 3) |
SUBSCRIBE | 1000 | Client —> Server | Client subscription request |
SUBACK | 1001 | Server —> Client | Server subscription acknowledgment |
UNSUBSCRIBE | 1010 | Client —> Server | Client unsubscribe request |
UNSUBACK | 1011 | Server —> Client | Server unsubscribe acknowledgment |
PINGREQ | 1100 | Client —> Server | Client sends heartbeat |
PINGRESP | 1101 | Server —> Client | Server replies heartbeat |
DISCONNECT | 1110 | Client —> Server | Client disconnect request |
Reserved | 1111 | Forbidden | Reserved |
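The table maps naturally onto an enumeration. The following is a minimal sketch, assuming MQTT 3.1.1 values as listed above; the names `mqtt_packet_type`, `mqtt_type_from_byte1`, and `mqtt_type_name` are made up for illustration and do not come from any library.

```c
#include <stdint.h>

/* Packet type values from the table above (names are illustrative). */
enum mqtt_packet_type {
    MQTT_CONNECT = 1, MQTT_CONNACK, MQTT_PUBLISH, MQTT_PUBACK,
    MQTT_PUBREC, MQTT_PUBREL, MQTT_PUBCOMP, MQTT_SUBSCRIBE,
    MQTT_SUBACK, MQTT_UNSUBSCRIBE, MQTT_UNSUBACK, MQTT_PINGREQ,
    MQTT_PINGRESP, MQTT_DISCONNECT
};

/* Bits 7-4 of byte 1 hold the packet type. */
static unsigned mqtt_type_from_byte1(uint8_t byte1) {
    return (unsigned)(byte1 >> 4);
}

/* Returns the packet name, or "Reserved" for the values 0 and 15. */
static const char *mqtt_type_name(unsigned type) {
    static const char *const names[16] = {
        "Reserved", "CONNECT", "CONNACK", "PUBLISH", "PUBACK", "PUBREC",
        "PUBREL", "PUBCOMP", "SUBSCRIBE", "SUBACK", "UNSUBSCRIBE",
        "UNSUBACK", "PINGREQ", "PINGRESP", "DISCONNECT", "Reserved"
    };
    return names[type & 0x0Fu];
}
```

For instance, `mqtt_type_name(mqtt_type_from_byte1(0x30))` yields `"PUBLISH"`, because the high nibble of 0x30 is 0011.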
Flags Specific to Each MQTT Control Packet Type
Bits 3-0 of Byte 1 are flags specific to each MQTT Control Packet type. In fact, only PUBLISH uses them as real control bits; for every other packet type they are reserved and carry the fixed value shown in the following table.
Packet Type | Fixed Header Flag | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|
CONNECT | Reserved | 0 | 0 | 0 | 0 |
CONNACK | Reserved | 0 | 0 | 0 | 0 |
PUBLISH | Used in MQTT 3.1.1 | DUP | QoS | QoS | RETAIN |
PUBACK | Reserved | 0 | 0 | 0 | 0 |
PUBREC | Reserved | 0 | 0 | 0 | 0 |
PUBREL | Reserved | 0 | 0 | 1 | 0 |
PUBCOMP | Reserved | 0 | 0 | 0 | 0 |
SUBSCRIBE | Reserved | 0 | 0 | 1 | 0 |
SUBACK | Reserved | 0 | 0 | 0 | 0 |
UNSUBSCRIBE | Reserved | 0 | 0 | 1 | 0 |
UNSUBACK | Reserved | 0 | 0 | 0 | 0 |
PINGREQ | Reserved | 0 | 0 | 0 | 0 |
PINGRESP | Reserved | 0 | 0 | 0 | 0 |
DISCONNECT | Reserved | 0 | 0 | 0 | 0 |
I will not explain the meaning and usage of the PUBLISH flags yet, because introducing them this early would only get in the way of learning the overall structure.
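Even without knowing what the PUBLISH flags mean, the table is enough to check the low nibble of byte 1 for every other packet type. The sketch below does exactly that; `mqtt_flags_valid` and the `TYPE_*` macros are hypothetical names, and PUBLISH flags are deliberately accepted without inspection.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packet type numbers used below, taken from the packet type table. */
#define TYPE_PUBLISH      3u
#define TYPE_PUBREL       6u
#define TYPE_SUBSCRIBE    8u
#define TYPE_UNSUBSCRIBE 10u

/* Returns true if bits 3-0 of byte 1 match the flag table for the packet type. */
static bool mqtt_flags_valid(uint8_t byte1) {
    unsigned type  = byte1 >> 4;     /* bits 7-4 */
    unsigned flags = byte1 & 0x0Fu;  /* bits 3-0 */

    if (type == 0 || type == 15)
        return false;                /* reserved packet types */
    if (type == TYPE_PUBLISH)
        return true;                 /* DUP/QoS/RETAIN: not checked in this sketch */
    if (type == TYPE_PUBREL || type == TYPE_SUBSCRIBE || type == TYPE_UNSUBSCRIBE)
        return flags == 0x02u;       /* fixed pattern 0010 */
    return flags == 0x00u;           /* every other type: 0000 */
}
```

With this check, a first byte of 0x82 (SUBSCRIBE with flags 0010) is accepted, while 0x80 (SUBSCRIBE with flags 0000) is rejected.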
Remaining Length
Calculating the remaining length is one of the harder parts to understand. Pay particular attention to the following two statements.
Remaining Length is the number of bytes that follow the fixed header, that is, the length of the **Variable Header** + **Payload**.
The remaining length starts from Byte 2 and can occupy up to 4 bytes. That is: the remaining length range is from Byte2 to Byte5.
Calculation: Bytes Occupied by Remaining Length
The MQTT protocol stipulates that bit 7 (the highest bit) of each Remaining Length byte (byte 2 up to byte 5) is a continuation flag: if it is 1, at least one more length byte follows.
Let N be the position of a byte in the packet (2 ≤ N ≤ 4). If bit 7 of byte N is 1, then byte N + 1 is also part of Remaining Length and contributes to the value; if bit 7 of byte N is 0, then byte N is the last length byte and byte N + 1 is not part of Remaining Length. Because at most 4 length bytes are allowed, bit 7 of Byte 5 must be 0.
So each byte contributes only 7 bits to the length, and the largest value a single length byte can carry is 01111111, that is 0x7F, which is 127 in decimal. The MQTT protocol allows a maximum of 4 bytes to represent the remaining length. Therefore, the maximum length is encoded as 0xFF, 0xFF, 0xFF, 0x7F.
Calculation: Length Represented by Remaining Length (in Bytes)
The message length can be simply understood as a base-128 number: 4 bytes give 128 × 128 × 128 × 128 = 268,435,456 possible values, so the largest representable length is 268,435,455 bytes (just under 256 MB). Note: the byte order is somewhat special, with the low-order digit first and the high-order digit last.
The following is the length range of the message:
Bytes Occupied | Minimum Value | Maximum Value |
---|---|---|
1 | 0 (0x00) | 127 (0x7F) |
2 | 128 (0x80, 0x01) | 16 383 (0xFF, 0x7F) |
3 | 16 384 (0x80, 0x80, 0x01) | 2 097 151 (0xFF, 0xFF, 0x7F) |
4 | 2 097 152 (0x80, 0x80, 0x80, 0x01) | 268 435 455 (0xFF, 0xFF, 0xFF, 0x7F) |
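The ranges in the table can be turned into a small helper that tells an encoder how many bytes to reserve for the Remaining Length field. This is only a sketch; the function name `mqtt_remaining_length_size` is an assumption made for this example.

```c
#include <stdint.h>

/*
 * Returns how many bytes (1 to 4) the Remaining Length field needs to
 * represent `length`, or 0 if `length` exceeds the 4-byte maximum of
 * 268,435,455. The thresholds are the maximum values from the table above.
 */
static int mqtt_remaining_length_size(uint32_t length) {
    if (length <= 127u)        return 1;
    if (length <= 16383u)      return 2;
    if (length <= 2097151u)    return 3;
    if (length <= 268435455u)  return 4;
    return 0;
}
```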
Calculating Remaining Length
To help readers understand, let’s take an example and calculate.
Suppose we receive an MQTT packet `0x20 0x02 0xAA 0xBB`, 4 bytes in total.
According to the MQTT packet structure, `0x20` identifies a CONNACK packet.
The Remaining Length starts at the second byte.
Byte 2 (0x02) has bit 7 equal to 0, which indicates that the following `0xAA 0xBB` is not part of the remaining length.
Furthermore, bits 6-0 of Byte 2 (0x02) have the value 2, which indicates that 2 more bytes follow in the packet, namely `0xAA 0xBB`. (We do not care about `0xAA 0xBB` at this moment, because it lies outside the fixed header; it is just an arbitrary example.)
Thus, the fixed header has been fully parsed.
This example is relatively simple; let’s look at a slightly more complex packet.
Suppose we receive an MQTT packet starting with `0x30 0x9B 0x01 ...`:
According to the MQTT packet structure, `0x30` identifies a PUBLISH packet.
The Remaining Length starts at the second byte.
Byte 2 (0x9B) has bit 7 equal to `1`, which indicates that the following `0x01` is also part of the remaining length.
Byte 3 (0x01) has bit 7 equal to `0`, which indicates that the remaining length field ends at this byte.
Knowing that the bytes belonging to the remaining length are `0x9B` and `0x01`, we can calculate the actual value. Note: the low-order digit comes first and the high-order digit last.
Only bits 6-0 of Byte 2 (0x9B) contribute to the length, that is `(0x9B) & ~(0x80) = 0x1B`.
Thus `len = (0x01) * 128 + 0x1B = 155`, meaning 155 bytes remain in the packet. This example also shows that the length is effectively stored in base 128.
Thus, the fixed header has been fully parsed.
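Following this pattern, we can easily determine that if a packet starts with `0x20 0xFF 0xFF 0xFF 0x7E`, then the remaining length is 266338303 bytes:
$$\text{len} = \mathrm{7E}_{16} \times 128^{3} + \mathrm{7F}_{16} \times 128^{2} + \mathrm{7F}_{16} \times 128^{1} + \mathrm{7F}_{16} \times 128^{0} = 266338303$$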
We can even write a piece of “remaining length calculation” C code.
```c
/*
# Copyright from Web, All Rights Reserved
#
# File Name: endecode_for_rl.c
# Created : Mon, Feb 3, 2020 7:47:02 PM
*/
#include <stdio.h>

typedef unsigned int uint32;
typedef unsigned short uint16;
typedef unsigned char uint8;

/*
 * buf: output buffer that receives the encoded Remaining Length field
 *      (must have room for up to 4 bytes)
 * length: the value to encode (0 to 268435455)
 * return value: the number of bytes written to buf
 */
int MQTTPacketSetPacketLenth(uint8 *buf, unsigned long length){
    // ref : https://blog.csdn.net/weixin_42381351/article/details/89397776
    int rc = 0;
    unsigned char d;
    do {
        d = length % 128;
        length /= 128;
        /* if there are more digits to encode, set the top bit of this digit */
        if (length > 0) {
            d |= 0x80;
        }
        buf[rc++] = d;
    } while (length > 0);
    return rc;
}

/*
 * buf: address of the first byte of the Remaining Length field
 * return value: the decoded length
 */
unsigned long MQTTPacketGetPacketLenth(const uint8 *buf){
    // Adapted from the pseudocode in the Chinese document
    unsigned char encodedByte;   /* unsigned, so masking never sees a negative value */
    unsigned long multiplier = 1;
    unsigned long rc = 0;
    int i = 0;
    do {
        encodedByte = buf[i++];
        rc += (encodedByte & 0x7f) * multiplier;
        if (multiplier > 128*128*128)
            break; // malformed Remaining Length (more than 4 bytes); the result is not valid
        else
            multiplier *= 128;
    } while ((encodedByte & 0x80) != 0);
    return rc;
}

int main(void){
    unsigned long rl;
    int length_step;
    uint8 packet[256] = {0x80, 0x80, 0x80, 0x01}; // Only the Remaining Length field; no other parts

    rl = MQTTPacketGetPacketLenth(packet);
    printf("Calculated length: %lu, should be 2097152\n", rl);

    length_step = MQTTPacketSetPacketLenth(packet, 16383);
    rl = MQTTPacketGetPacketLenth(packet);
    printf("Calculated length: %lu (%d bytes), should be 16383\n", rl, length_step);

    length_step = MQTTPacketSetPacketLenth(packet, 2097151);
    rl = MQTTPacketGetPacketLenth(packet);
    printf("Calculated length: %lu (%d bytes), should be 2097151\n", rl, length_step);

    length_step = MQTTPacketSetPacketLenth(packet, 268435455);
    rl = MQTTPacketGetPacketLenth(packet);
    printf("Calculated length: %lu (%d bytes), should be 268435455\n", rl, length_step);

    length_step = MQTTPacketSetPacketLenth(packet, 321);
    rl = MQTTPacketGetPacketLenth(packet);
    printf("Calculated length: %lu (%d bytes), should be 321\n", rl, length_step);
    return 0;
}
```
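The decoder above returns only the value. A parser reading from a socket usually also needs to know how many bytes the field occupied, so it can find the start of the variable header, and it should not read past the data it actually has. The following is one possible bounds-checked variant; the name `mqtt_decode_remaining_length` and its signature are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decodes the Remaining Length field starting at buf[0].
 * avail is the number of readable bytes in buf.
 * On success, stores the value in *value and returns the number of bytes
 * the field occupied (1 to 4). Returns 0 if the field is truncated or if
 * the fourth byte still has its continuation bit set.
 */
static int mqtt_decode_remaining_length(const uint8_t *buf, size_t avail,
                                        uint32_t *value) {
    uint32_t result = 0;
    uint32_t multiplier = 1;

    for (size_t i = 0; i < 4; i++) {
        if (i >= avail)
            return 0;                        /* ran out of input */
        result += (uint32_t)(buf[i] & 0x7Fu) * multiplier;
        if ((buf[i] & 0x80u) == 0) {         /* bit 7 clear: last length byte */
            *value = result;
            return (int)(i + 1);
        }
        multiplier *= 128u;
    }
    return 0;                                /* more than 4 length bytes */
}
```

The return value plus one (for byte 1) gives the offset of the first byte after the fixed header.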
MQTT Variable Header
Variable Header means a header whose content varies. When present, it is located between the fixed header and the payload.
Some packet types contain a variable header, such as PUBLISH, SUBSCRIBE, and CONNECT, and its content depends on the packet type.
The fixed header can be analyzed and calculated byte by byte, but I personally think the variable header is best learned through a complete analysis of specific packet types.
"Variable" does not mean that the header is optional for a given packet type; it means that this part exists in some packet types and not in others, and that its content differs from type to type. One field that appears in many variable headers is the packet identifier:
Bit | 7 6 5 4 3 2 1 0 |
---|---|
byte 1 | Packet Identifier (MSB) |
byte 2 | Packet Identifier (LSB) |
Multi-byte integers use big-endian order (high-order byte before low-order byte). This means a 16-bit word is represented on the network as the most significant byte (MSB) followed by the least significant byte (LSB).
The fields described later use the same byte order, so it is worth emphasizing.
Strings in MQTT are UTF-8 encoded and prefixed with a two-byte length, generally in the following form, which is worth remembering:
Bit | 7 6 5 4 3 2 1 0 |
---|---|
byte 1 | String Length MSB |
byte 2 | String Length LSB |
byte 3 … | Encoded Character Data |
The first two bytes (byte 1 and byte 2) form an unsigned 16-bit number giving the length in bytes of the string data that starts at byte 3.
The next n bytes are the actual content of the string.
A total of 2+n bytes.
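The string layout can be sketched as a small writer function. This is an illustration under assumptions: `mqtt_write_string` is a hypothetical name, the input is taken to be valid UTF-8 already, and no validation of the character data is attempted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Writes a length-prefixed string into out: a 2-byte big-endian length,
 * then the raw bytes of s (assumed to be valid UTF-8 already).
 * Returns the total number of bytes written (2 + n), or 0 if the string
 * is longer than 65535 bytes or does not fit in out_cap.
 */
static size_t mqtt_write_string(uint8_t *out, size_t out_cap, const char *s) {
    size_t n = strlen(s);
    if (n > 0xFFFFu || out_cap < n + 2)
        return 0;
    out[0] = (uint8_t)(n >> 8);      /* String Length MSB */
    out[1] = (uint8_t)(n & 0xFFu);   /* String Length LSB */
    memcpy(out + 2, s, n);           /* Encoded Character Data */
    return n + 2;
}
```

For example, the string "MQTT" is written as the six bytes `00 04 4D 51 54 54`: a length of 4 followed by the four ASCII characters.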
Packet Identifier of Variable Header
Packet Identifier can also be called Message Identifier, and the packet identifier mentioned later in the article refers to Packet Identifier.
The packet identifier is used to distinguish packets, especially to recognize a retransmitted packet as the same packet and, in scenarios requiring acknowledgment, to determine which sent packet a response refers to. The packet identifier field of the variable header occupies 2 bytes and exists in the following packet types: PUBLISH (when QoS > 0), PUBACK, PUBREC, PUBREL, PUBCOMP, SUBSCRIBE, SUBACK, UNSUBSCRIBE, and UNSUBACK.
Put another way: in the MQTT protocol, some packets require a corresponding acknowledgment packet after being sent, and the Packet Identifier is what "binds" each acknowledgment to the packet it answers. Without a Packet Identifier, it would be impossible to handle several packets of the same type in flight at once, because the sender would not know which message had been processed and which had not.
Bit | 7 – 0 |
---|---|
byte 1 | Packet Identifier MSB |
byte 2 | Packet Identifier LSB |
Packet identifiers are commonly allocated starting from 1 (0x0001) and incremented; because the field is 16 bits wide, the maximum is 65535 (0xFFFF).
SUBSCRIBE, UNSUBSCRIBE, and PUBLISH (QoS greater than 0) control packets must contain a non-zero 16-bit packet identifier (Packet Identifier).
- Each time a client sends a new packet of one of these types, it must allocate a currently unused packet identifier.
- If a client retransmits a particular control packet, it must use the same packet identifier in the retransmission.
Once the client has processed the acknowledgment for this packet, the packet identifier is released for reuse.
For example: a QoS 1 PUBLISH is acknowledged by PUBACK, a QoS 2 PUBLISH by PUBCOMP, and SUBSCRIBE or UNSUBSCRIBE by SUBACK or UNSUBACK, respectively.
The same conditions apply to the server when it sends a PUBLISH packet with QoS greater than 0.
PUBLISH packets with QoS equal to 0 must not contain a packet identifier.
PUBACK, PUBREC, and PUBREL packets must include the same packet identifier as the PUBLISH packet that was originally sent. Similarly, SUBACK and UNSUBACK must include the packet identifier used in the corresponding SUBSCRIBE and UNSUBSCRIBE packets.
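The allocation rules above (non-zero, currently unused, released after the final acknowledgment) can be sketched with a bitmap of in-flight identifiers. This is one possible design, not something the protocol prescribes; `packet_id_pool` and the `pool_*` functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Tracks which of the 65535 non-zero identifiers are currently in flight. */
struct packet_id_pool {
    uint8_t  in_use[65536 / 8];  /* one bit per identifier */
    uint16_t last;               /* most recently allocated identifier */
};

static void pool_init(struct packet_id_pool *p) {
    memset(p, 0, sizeof *p);
}

static bool pool_test(const struct packet_id_pool *p, uint16_t id) {
    return (p->in_use[id / 8] >> (id % 8)) & 1u;
}

/* Returns an unused non-zero identifier, or 0 if all 65535 are in flight. */
static uint16_t pool_alloc(struct packet_id_pool *p) {
    uint16_t id = p->last;
    for (uint32_t tries = 0; tries < 65535u; tries++) {
        id = (uint16_t)(id == 0xFFFFu ? 1u : id + 1u);  /* wrap around, skipping 0 */
        if (!pool_test(p, id)) {
            p->in_use[id / 8] |= (uint8_t)(1u << (id % 8));
            p->last = id;
            return id;
        }
    }
    return 0;
}

/* Call once the final acknowledgment for `id` has been processed. */
static void pool_release(struct packet_id_pool *p, uint16_t id) {
    p->in_use[id / 8] &= (uint8_t)~(1u << (id % 8));
}
```

A retransmission reuses the identifier it already holds rather than calling `pool_alloc` again. Continuing from the last allocated value, instead of always picking the lowest free one, is a design choice that delays reuse of recently released identifiers.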
The control packets that require packet identifiers are listed in the table – Control Packets that contain a Packet Identifier.
Control Packet | Packet Identifier Field |
---|---|
PUBLISH | YES (QoS > 0) |
PUBACK | YES |
PUBREC | YES |
PUBREL | YES |
PUBCOMP | YES |
SUBSCRIBE | YES |
SUBACK | YES |
UNSUBSCRIBE | YES |
UNSUBACK | YES |
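A parser can apply this table using only byte 1 of the fixed header. The sketch below assumes the type numbering from the packet type table and that the PUBLISH QoS occupies bits 2-1 of byte 1; `mqtt_has_packet_id` is a name chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a packet whose first byte is byte1 carries a Packet Identifier. */
static bool mqtt_has_packet_id(uint8_t byte1) {
    unsigned type = byte1 >> 4;             /* bits 7-4: packet type */
    unsigned qos  = (byte1 >> 1) & 0x03u;   /* bits 2-1: QoS (PUBLISH only) */

    if (type == 3)                          /* PUBLISH */
        return qos > 0;
    return type >= 4 && type <= 11;         /* PUBACK (4) through UNSUBACK (11) */
}
```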
Clients and servers independently allocate packet identifiers. Therefore, it is possible for a client and server to use the same packet identifier to achieve concurrent message exchange. In other words, if a client sends a PUBLISH packet with identifier 0x1234, it may receive another different PUBLISH packet from the server using the same packet identifier 0x1234 before receiving the PUBACK for that packet.
Client | Server |
---|---|
PUBLISH Packet Identifier=0x1234 ---> | |
| <--- PUBLISH Packet Identifier=0x1234 |
PUBACK Packet Identifier=0x1234 ---> | |
| <--- PUBACK Packet Identifier=0x1234 |
The exchange above shows that the client sends a message to the server using Packet ID 0x1234, and the server sends a message to the client also using Packet ID 0x1234. Then the client replies to the server with a PUBACK, and finally, the client receives the server's PUBACK.
Payload
Not all packet types need to include a Payload.
The following table – Control Packets that contain a Payload lists the control packets that require a payload.
Control Packet | Contains Payload |
---|---|
CONNECT | Required |
CONNACK | Not required |
PUBLISH | Optional |
PUBACK | Not required |
PUBREC | Not required |
PUBREL | Not required |
PUBCOMP | Not required |
SUBSCRIBE | Required |
SUBACK | Required |
UNSUBSCRIBE | Required |
UNSUBACK | Not required |
PINGREQ | Not required |
PINGRESP | Not required |
DISCONNECT | Not required |
Based on the above table, we can see that the Payload, as the third part of the MQTT packet, is included in the CONNECT, SUBSCRIBE, SUBACK, UNSUBSCRIBE, and PUBLISH packet types:
1) CONNECT: the payload mainly contains the client's Client Identifier (ClientID), the Will Topic, the Will Message, and the User Name and Password.
2) SUBSCRIBE: the payload is a list of topics to subscribe to and their requested QoS.
3) SUBACK: the payload is the server's confirmation and reply to the topics and QoS requested by SUBSCRIBE.
4) UNSUBSCRIBE: the payload is the list of topics to unsubscribe from.
5) PUBLISH: the payload is the actual message content (although the payload is optional, PUBLISH packets usually carry one).
Except for the packet types listed above, other packet types do not have Payload.
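As a compact restatement of the payload table, the lookup below maps a packet type number to its payload rule. It is a sketch only; `payload_rule` and `mqtt_payload_rule` are names invented for this example, and the type numbers are the ones from the packet type table.

```c
enum payload_rule { PAYLOAD_NONE, PAYLOAD_OPTIONAL, PAYLOAD_REQUIRED };

/* Maps a packet type (bits 7-4 of byte 1) to the rule in the payload table. */
static enum payload_rule mqtt_payload_rule(unsigned type) {
    switch (type) {
    case 1:   /* CONNECT */
    case 8:   /* SUBSCRIBE */
    case 9:   /* SUBACK */
    case 10:  /* UNSUBSCRIBE */
        return PAYLOAD_REQUIRED;
    case 3:   /* PUBLISH */
        return PAYLOAD_OPTIONAL;
    default:  /* all other packet types carry no payload */
        return PAYLOAD_NONE;
    }
}
```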
Next, we will analyze based on different packet types.