Learn MQTT: A Comprehensive Guide

[Introduction]

What is MQTT?

MQTT (Message Queuing Telemetry Transport) is a lightweight communication protocol based on the Publish-Subscribe model, utilizing a Client-Broker communication model over TCP, and is categorized as an application layer protocol.

Originally developed by IBM in 1999, it has become one of the most popular communication protocols in the field of Internet of Things (IoT).

The Publish-Subscribe mechanism of MQTT easily meets our needs for one-to-one, one-to-many, and many-to-one communication.

What is the Publish-Subscribe Model?

In MQTT, the Publish-Subscribe model refers to the communication method between the message Publisher and the Subscriber.

The difference between the Publish-Subscribe model and the Client-Server model is that there is no need for a direct connection between the Publisher and Subscriber; instead, they communicate through the MQTT Broker, which is responsible for routing and distributing messages.

In a typical MQTT Publish-Subscribe flow, a temperature sensor connects to the MQTT Broker as a client and publishes temperature data to a specific topic (e.g., Temperature). On receiving the message, the MQTT Broker forwards it to every client subscribed to that topic (Temperature).
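The same flow can be sketched in a few lines of Python. This is a toy, in-process stand-in for a Broker, not a real MQTT implementation: the class name InMemoryBroker, its methods, and the two subscriber logs are all made up for illustration, and it only routes on exact topic names.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, str], None]


class InMemoryBroker:
    """Toy stand-in for an MQTT Broker: routes messages by exact topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        """Forward payload to every subscriber of topic; return the fan-out count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


broker = InMemoryBroker()
dashboard_log: List[str] = []
alarm_log: List[str] = []

# Two independent subscribers; neither knows anything about the sensor.
broker.subscribe("Temperature", lambda topic, payload: dashboard_log.append(payload))
broker.subscribe("Temperature", lambda topic, payload: alarm_log.append(payload))

# The sensor only talks to the broker, never to the subscribers directly.
delivered = broker.publish("Temperature", "21.5")
```

The point of the sketch is the decoupling: the publisher call never references a subscriber, and adding a third subscriber requires no change on the sensor side.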

What is the Client-Broker Model?

The Client-Broker model refers to the communication mode between MQTT clients (including Publishers and Subscribers) and the MQTT Broker. Specifically:

A Client can be a Publisher, a Subscriber, or both at the same time.
The Broker acts as the server: it receives the messages that Publishers send to specific topics and forwards them to the corresponding Subscribers.

What is a Topic?

A Topic is a classification or subject of messages, identified by a unique string that signifies the content or type of the message. The MQTT protocol forwards messages based on the topic.

Topics can have a hierarchical structure, separated by slashes (/), similar to a URL path. For example, weather/temperature is a hierarchical topic representing temperature information.

MQTT topics support two types of wildcards: + and #.

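+: Single-level wildcard. It matches exactly one topic level; for example, a/+ matches a/x and a/y, but not a/x/y.
#: Multi-level wildcard. It matches any number of levels and must be the last character of the filter; for example, a/# matches a/x and a/b/c/d.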

Note: Wildcard topics can only be used for subscriptions and not for publishing.
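The wildcard rules above can be expressed as a small matching function. The sketch below is a minimal illustration written for this guide, not code from any broker; the function name topic_matches is hypothetical, and it assumes the filter is already well-formed (for instance, that # only appears as the last level).

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a concrete topic name matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Per the MQTT specification, a filter that starts with a wildcard does
    # not match topics whose first level starts with "$" (such as $SYS/...).
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent level and everything below it.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)
```

For example, topic_matches("a/+", "a/x") is true while topic_matches("a/+", "a/x/y") is false, because + covers exactly one level.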

A topic can have multiple Subscribers, and the Broker will forward messages under that topic to all Subscribers; a topic can also have multiple Publishers, and the Broker will forward messages in the order they arrive.

For example, suppose Client A publishes a message to a topic while Client B and Client C are both subscribed to it. Both Client B and Client C will receive the message. Client A acts as the Publisher and Clients B and C act as Subscribers; they exchange messages simply by using the same topic.

Topics starting with $SYS/ are system topics, used mainly to obtain the MQTT server's operational status, message statistics, client online/offline events, and similar data. The MQTT protocol itself does not define a standard set of $SYS/ topics, but most MQTT servers follow a commonly recommended convention.

For example, the EMQX server supports obtaining cluster status through the following topics.

For more content, please read the original article.

What is QoS?

MQTT defines Quality of Service (QoS) levels to control message reliability across different network environments. There are three QoS levels:

0: The lowest QoS level, “At most once” delivery. The message is sent once and the Publisher receives no acknowledgment, so there is no guarantee that it reaches the Subscriber. Messages may be lost but are never duplicated. If the receiving client is unavailable at that moment, the message is lost.
1: The intermediate QoS level, “At least once” delivery. The sender retransmits the message until it receives an acknowledgment (PUBACK), so the message will not be lost but may be delivered more than once.
2: The highest QoS level, “Exactly once” delivery. A four-packet handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP) ensures that the message is delivered to the receiver exactly once. This level avoids both loss and duplication but requires more network bandwidth and resources.
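The practical difference between the three levels shows up when packets are lost. The following sketch simulates one message over an unreliable link and counts how many copies the receiving application sees. It is a deliberately simplified model: the function name simulate_delivery and its parameters are invented for this example, and QoS 2 is reduced to "remember the packet ID and discard repeats" rather than the full handshake.

```python
from typing import List


def simulate_delivery(qos: int, publish_lost: List[bool], ack_lost: List[bool]) -> int:
    """Count the copies of one message that reach the receiving application.

    publish_lost[i] says whether the i-th transmission of the message is lost;
    ack_lost[i] says whether the acknowledgment for that transmission is lost.
    """
    delivered = 0
    seen_packet_ids = set()
    packet_id = 1  # every retransmission reuses the same packet identifier

    for p_lost, a_lost in zip(publish_lost, ack_lost):
        if not p_lost:
            if qos == 2:
                # Simplified QoS 2: the receiver discards repeats of a known ID.
                if packet_id not in seen_packet_ids:
                    seen_packet_ids.add(packet_id)
                    delivered += 1
            else:
                delivered += 1
        if qos == 0:
            break  # fire and forget: never retransmitted
        if not p_lost and not a_lost:
            break  # the sender received the acknowledgment and stops retrying
    return delivered


lost_at_qos0 = simulate_delivery(0, [True], [False])
duplicated_at_qos1 = simulate_delivery(1, [False, False], [True, False])
exactly_once_at_qos2 = simulate_delivery(2, [False, False], [True, False])
```

In the second and third calls the first acknowledgment is lost, so the sender retransmits: at QoS 1 the application sees the message twice, at QoS 2 only once.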

Advantages and Disadvantages of MQTT Protocol

Advantages	Disadvantages
Lightweight: Simple protocol with low overhead, suitable for use on resource-constrained devices.	Unreliable message delivery: Messages at QoS level 0 may be lost, unsuitable for scenarios requiring strict message delivery reliability.
Easy to implement: Provides client libraries in various programming languages, making application development convenient for developers.	Not suitable for large-scale message transmission: In scenarios with a large number of messages, it may lead to performance issues for the Broker.
Asynchronous communication: Supports asynchronous communication between Publishers and Subscribers, enhancing system flexibility and response speed.	Security limitations: By default, MQTT does not provide message encryption and authentication, requiring additional security measures to protect communication security.

Usage Scenarios

Device communication and data transmission in Internet of Things (IoT) applications, such as mobile-controlled vehicles, shared power banks, shared bicycles, device monitoring, etc.
Smart home systems, for example, controlling lighting, temperature, security systems, etc.
Data collection and monitoring in sensor networks, such as agricultural monitoring, environmental detection, medical monitoring, etc. For example, remote monitoring of patients’ vital signs data, operational status of medical devices, etc., and timely alerts and notifications to medical staff.

Mobile-Controlled Vehicles

In mobile-controlled vehicle scenarios, users send control commands (e.g., start, shut down, lock, unlock) to the vehicle through a mobile application. Taking mobile unlocking of the car as an example, the data flow of the unlock command typically includes the following steps:

User action triggers vehicle control command: After logging into the mobile app, the user clicks the unlock button on the interface.
The mobile application sends the command: The mobile app sends a vehicle unlock request to the TSP (Telematics Service Provider) server via HTTPS. The request may contain information such as the command type (e.g., unlock) and the vehicle identifier (e.g., the vehicle VIN).
The backend server receives the command: After receiving the unlock request, the TSP server authenticates the user.
The server sends the command to the vehicle: After successful authentication, the TSP server acts as a Publisher and publishes an MQTT message to a specific topic (e.g., vin/xxxx/unlock). The message may include the vehicle’s unique identifier (e.g., the vehicle VIN), the command type, command parameters, etc.
The vehicle receives the command and executes it: The vehicle, as a Subscriber, subscribes to the topic (e.g., vin/xxxx/unlock) to receive commands. When the vehicle receives the MQTT message sent by the server, it executes the corresponding operation based on the command type and parameters, such as unlocking the vehicle.
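The last two steps can be sketched as a pair of functions, one for the server and one for the vehicle. Everything specific here is an assumption made for illustration: the JSON payload layout, the field names command, vin, and request_id, and the function names are hypothetical, and the actual MQTT publish and subscribe calls are left out.

```python
import json
from typing import Dict, Tuple


def build_unlock_command(vin: str, request_id: str) -> Tuple[str, bytes]:
    """Server (Publisher) side: build the topic and payload for an unlock command."""
    topic = f"vin/{vin}/unlock"
    payload = json.dumps({"command": "unlock", "vin": vin, "request_id": request_id})
    return topic, payload.encode("utf-8")


def handle_command(own_vin: str, topic: str, payload: bytes) -> str:
    """Vehicle (Subscriber) side: validate an incoming message and act on it."""
    message: Dict[str, str] = json.loads(payload.decode("utf-8"))
    # Ignore anything that is not addressed to this vehicle.
    if topic != f"vin/{own_vin}/unlock" or message.get("vin") != own_vin:
        return "ignored"
    if message.get("command") == "unlock":
        return "unlocked"  # a real vehicle would drive the door actuator here
    return "unsupported"


topic, payload = build_unlock_command("VIN123", "req-1")
result = handle_command("VIN123", topic, payload)
```

Carrying the VIN in both the topic and the payload lets the vehicle cross-check the two before acting, which is a cheap guard against a message published to the wrong topic.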

Vehicle Information Monitoring

In vehicle information monitoring scenarios, the vehicle acts as a Publisher, periodically publishing the status information of the vehicle (e.g., speed, location, fuel consumption, vehicle health status, etc.). The monitoring system acts as an MQTT Subscriber, subscribing to the topic where the vehicle status information resides, and obtaining and displaying the vehicle status information in real-time.

[Data Packet Structure]

The MQTT data packet structure is very simple, consisting of a Fixed Header, Variable Header, and Message Payload.

The Fixed Header consists of the Control Packet Type, Flags, and Remaining Length. The Control Packet Type occupies the high four bits of the first byte and the Flags occupy the low four bits. The Remaining Length follows as a variable-length field of 1 to 4 bytes, so the Fixed Header is 2 to 5 bytes long.
The length of the Variable Header depends on the packet type. Different packet types contain different fields, and some have no Variable Header at all.
The length of the Message Payload depends on the message content. It can be empty or carry anything from a few bytes up to the protocol limit: the Remaining Length field caps the Variable Header and Payload together at 268,435,455 bytes (about 256 MB). In practice, brokers are often configured with a much lower limit.

The following is the basic structure of an MQTT data packet:
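As a rough illustration of this layout, the sketch below assembles a PUBLISH packet at QoS 0 byte by byte: a one-byte type and flags field, the variable-length Remaining Length, the Variable Header (here, the length-prefixed topic name), and the payload. The function names are made up for this example; the encoding itself follows the MQTT 3.1.1 rules, where each Remaining Length byte carries 7 bits of data and the top bit signals that another byte follows.

```python
def encode_remaining_length(length: int) -> bytes:
    """Encode the Remaining Length field (1 to 4 bytes, 7 data bits per byte)."""
    if not 0 <= length <= 268_435_455:
        raise ValueError("Remaining Length must fit in 4 bytes")
    out = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if length == 0:
            return bytes(out)


def build_publish_qos0(topic: str, payload: bytes) -> bytes:
    """Assemble a complete PUBLISH packet with QoS 0 and no flags set."""
    topic_bytes = topic.encode("utf-8")
    # Variable Header for PUBLISH at QoS 0: 2-byte topic length + topic name.
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    # Fixed Header, first byte: packet type 3 (PUBLISH) in the high four bits,
    # flags (DUP, QoS, RETAIN) all zero in the low four bits.
    first_byte = (3 << 4) | 0x0
    remaining = encode_remaining_length(len(variable_header) + len(payload))
    return bytes([first_byte]) + remaining + variable_header + payload


packet = build_publish_qos0("a/b", b"hi")
```

Here the packet is 9 bytes long: 0x30, a Remaining Length of 7, the 5-byte Variable Header, and the 2-byte payload.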

[Using MQTT]

Before using MQTT, you need to deploy an MQTT broker service, then use client tools to test connectivity, and then choose your preferred programming language to implement the desired functionality.

Broker Server Selection

Here are some common MQTT brokers:

MQTT Broker	Download Link	Advantages	Disadvantages
Mosquitto	https://github.com/eclipse/mosquitto	Lightweight, easy to deploy; open-source, free; supports multiple operating systems.	Relatively few features; limited scalability.
HiveMQ	https://www.hivemq.com	High performance, suitable for large-scale IoT applications; provides enterprise-level support.	Commercial license, requires payment; deployment and configuration are relatively complex.
EMQX	https://github.com/emqx/emqx	Supports MQTT, MQTT-SN, CoAP, and HTTP protocols; high availability and scalability.	Community support is relatively limited; deployment and configuration are relatively complex.
RabbitMQ	https://github.com/rabbitmq/rabbitmq-server	Rich in features, supports multiple communication protocols; reliability and high availability.	Steep learning curve; written in Erlang, resource-intensive.
ActiveMQ	https://github.com/apache/activemq	Open-source, rich in features and plugin support; reliability and availability.	Steep learning curve; written in Java, resource-intensive.
VerneMQ	https://github.com/vernemq/vernemq	Open-source, high availability and scalability; focused on IoT and real-time communication.	Community support is relatively limited; deployment and configuration are relatively complex.
Apache ActiveMQ Artemis	https://github.com/apache/activemq-artemis	Open-source, low latency, and high throughput; scalability and reliability.	Deployment and configuration are relatively complex; steep learning curve.

For more details about MQTT Brokers, please read the original article.

Client Tools

After deploying the MQTT broker, you typically need a client tool to test connections and messaging. MQTTX is a common choice; it is available as a desktop application, a command-line tool, and an online web client.

EMQX Usage Tutorial

Broker Server Deployment

EMQX is an open-source MQTT message broker that is popular in IoT and real-time communication applications. You can install it from rpm or deb packages, run it with Docker, or use EMQX Cloud to avoid self-hosting. The official Docker command below starts a container and publishes the default ports: 1883 (MQTT over TCP), 8883 (MQTT over TLS), 8083 (MQTT over WebSocket), 8084 (MQTT over secure WebSocket), and 18083 (the Dashboard):

docker run -d --name emqx -p 1883:1883 -p 8083:8083 -p 8084:8084 -p 8883:8883 -p 18083:18083 emqx/emqx:latest

Simple Usage

Both Publishers and Subscribers need to create an MQTT connection to the broker. First, create a connection for the Publisher using the default settings.

Then create a connection for the Subscriber, also with the default settings, and click to add a subscription to the topic you want to receive messages from.

When the Publisher publishes a message to the topic test, the MQTT broker forwards the message to the corresponding Subscribers.

Programming Usage

The code is too long, please read the original article.

[How to Use MQTT Securely?]

As a lightweight message transport protocol, the security of MQTT depends on the implemented security measures and the design of the protocol itself. Here are some potential vulnerabilities in MQTT:

Unencrypted Communication: If the MQTT connection is not encrypted using TLS/SSL, the communication content may be intercepted by eavesdroppers, posing a risk of information leakage.
Unauthorized Access: Failure to authenticate the MQTT connection or using weak passwords for authentication may allow unauthorized users to connect to the MQTT broker, potentially performing unauthorized publish or subscribe operations.
Denial of Service Attacks: Malicious clients may send a large number of invalid connection requests or messages to exhaust server resources, resulting in denial of service attacks.
Topic Guessing: Attackers may infer the structure and content of sensitive information by monitoring topic subscription patterns, enabling information gathering or other attacks.
Man-in-the-Middle Attacks: If the MQTT connection is unencrypted, attackers can intercept communication traffic and perform man-in-the-middle attacks, altering or fabricating communication content.
Buffer Overflow: Messages that are not properly validated and handled may lead to buffer overflow vulnerabilities, potentially exploited by attackers for remote code execution or denial of service attacks.

Here are some key steps to ensure the security of MQTT usage:

Use TLS/SSL Encryption: Use TLS/SSL to encrypt MQTT connections, ensuring that data is encrypted during transmission. This can prevent eavesdroppers from stealing or tampering with data.
Authentication: Use username and password, client certificates, OAuth 2.0, tokens, etc., for authentication, allowing only authorized users to connect to the MQTT broker.
Access Control Lists (ACL): Use ACLs to restrict client access to topics. Through ACLs, it is possible to control which clients can access which topics, thereby preventing unauthorized access.
Persistent Sessions: Enabling persistent sessions ensures that a client's subscriptions, and any messages queued for it while it was offline, are not lost when it reconnects. This helps ensure reliable data transmission.
Network Isolation: Place the MQTT broker in a secure network environment and take measures to restrict access to it. This can reduce the potential attack surface.
Regular Updates and Monitoring: Regularly update the MQTT broker and related software, and monitor its operational status and network activity. Apply security patches promptly to prevent known vulnerabilities from being exploited.
Security Audits and Logging: Conduct regular security audits to check whether system configurations and security measures comply with best practices, and log important security events and activities for investigation and analysis.
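For the first item in the list, the client side of a TLS setup can be prepared with Python's standard ssl module. This is only a sketch of sensible defaults: the helper name make_tls_context and the optional ca_file parameter are hypothetical, and how the resulting context is handed to an MQTT client depends on the library in use.

```python
import ssl
from typing import Optional


def make_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the broker's certificate.

    ca_file may point to a private CA bundle; when it is None, the system's
    default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


context = make_tls_context()
```

The default context already requires a valid certificate and checks the host name, which is what protects against the man-in-the-middle scenario described earlier. Disabling either check to "make the connection work" removes that protection.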

Configuring Authentication

Taking EMQX as an example, go to Access Control → Client Authentication → Create, where you can choose between the username and password, JWT, and SCRAM authentication methods.

Username and Password Based

You can choose from various data sources, including the built-in database, external databases, LDAP, and HTTP. For example, with the built-in database you configure the account type, the password hashing method, and the salting method.

Click Create to complete the setup.

Then click User Management → Add to configure the username and password.

From this point on, clients must supply the correct username and password; otherwise, the connection is refused.
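To make the hashing and salting options more concrete, here is a general sketch of salted password hashing with SHA-256. It illustrates the idea behind those settings and is not a description of how EMQX stores credentials internally; the function names and the salt_position parameter are assumptions chosen for this example.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes, salt_position: str = "prefix") -> str:
    """Hash a password with SHA-256, placing the salt before or after it."""
    data = password.encode("utf-8")
    if salt_position == "prefix":
        data = salt + data
    elif salt_position == "suffix":
        data = data + salt
    else:
        raise ValueError("salt_position must be 'prefix' or 'suffix'")
    return hashlib.sha256(data).hexdigest()


def verify_password(candidate: str, salt: bytes, stored_hash: str,
                    salt_position: str = "prefix") -> bool:
    """Recompute the hash for a login attempt and compare in constant time."""
    computed = hash_password(candidate, salt, salt_position)
    return hmac.compare_digest(computed, stored_hash)


salt = os.urandom(16)  # a fresh random salt per user
stored = hash_password("s3cret-Pa55", salt)
```

Because the salt differs per user, two users with the same password end up with different stored hashes, which defeats precomputed lookup tables.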

Based on JWT

JWT does not require a data source; its parameters are configured directly. Both JWT and JWKS verification are supported.

The verification method can be hmac-based or public-key, and custom payload claims can be added. After the authenticator is created, the client must sign its JWT with the same secret. For testing, a token can be generated at jwt.io; if the Secret uses Base64 encoding option is enabled in EMQX, also select the secret base64 encoded option there.

Then paste the generated token into the Password field to connect successfully.

For more JWT authentication configurations in EMQX, please refer to: https://www.emqx.io/docs/zh/latest/access-control/authn/jwt.html
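As an alternative to generating a test token at jwt.io, an HMAC-signed (HS256) token can be produced with the Python standard library alone. The sketch below is illustrative: the secret emqxsecret, the username claim, and the expiry value are placeholders invented for this example, and the secret must match whatever is configured in the authenticator.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_jwt(claims: dict, secret: bytes) -> str:
    """Create a compact JWT signed with HMAC-SHA256 (alg HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_jwt(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the signature is valid, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        return None
    return json.loads(_b64url_decode(payload_b64))


# Placeholder secret and claims; "exp" is a far-future Unix timestamp.
token = make_jwt({"username": "test", "exp": 4102444800}, b"emqxsecret")
```

The resulting token string is what goes into the Password field. Note that verify_jwt only checks the signature; a production verifier would also check the exp claim and reject unexpected algorithms.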

Based on SCRAM

SCRAM is an enhanced authentication method that is available only in MQTT 5.0. The supported hash algorithms are sha256 and sha512.

After the authenticator is created, click User Management → Add to configure the username and password.


At the time of writing, MQTTX does not support SCRAM authentication, and EMQX does not provide detailed usage instructions or example code, so a SCRAM client has to be implemented in code.
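A full SCRAM client involves an exchange of MQTT 5.0 AUTH packets, which is beyond a short example, but the key derivation at its core is compact. The sketch below follows the formulas in RFC 5802 (SaltedPassword, ClientKey, StoredKey, ServerKey) using only the standard library. The function name derive_scram_keys is hypothetical, the password normalization step (SASLprep) is skipped, and nothing here is specific to EMQX.

```python
import hashlib
import hmac
from typing import Tuple


def derive_scram_keys(password: str, salt: bytes, iterations: int,
                      algorithm: str = "sha256") -> Tuple[bytes, bytes]:
    """Return (StoredKey, ServerKey) as defined by SCRAM in RFC 5802.

    Simplification: the password is used as-is, without SASLprep normalization.
    """
    # SaltedPassword := Hi(password, salt, i), which is PBKDF2 with HMAC.
    salted_password = hashlib.pbkdf2_hmac(
        algorithm, password.encode("utf-8"), salt, iterations
    )
    # ClientKey := HMAC(SaltedPassword, "Client Key")
    client_key = hmac.new(salted_password, b"Client Key", algorithm).digest()
    # StoredKey := H(ClientKey); the server stores this instead of the password.
    stored_key = hashlib.new(algorithm, client_key).digest()
    # ServerKey := HMAC(SaltedPassword, "Server Key")
    server_key = hmac.new(salted_password, b"Server Key", algorithm).digest()
    return stored_key, server_key


stored_key, server_key = derive_scram_keys("pw", b"fixed-salt", 4096)
```

Passing algorithm="sha512" switches to the other hash option mentioned above and yields 64-byte keys instead of 32-byte ones.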

Additionally, X.509 certificate authentication and PSK authentication are also supported. For more information on EMQX authentication, please refer to: https://www.emqx.io/docs/zh/latest/access-control/authn/authn.html

Configuring Authorization

Taking EMQX as an example, authorization refers to controlling permissions for publish and subscribe operations of MQTT clients. The basic principle of EMQX’s authorization mechanism is that when a client attempts to publish or subscribe, EMQX retrieves the permission data for that client from the data source based on a specific process or user-defined query statement, and matches the permissions with the operation to allow or deny the operation.

Go to Access Control → Client Authorization → Create, where you can choose between a file (ACL file), the built-in database, external databases, and HTTP services.

ACL File

File-based authorization is simple and lightweight, and suits general rules. If you need hundreds of rules or more, or rules targeted at individual clients, other authorization sources are recommended.

Authorization rules are stored in the file as a list of Erlang tuples.

The basic syntax and concepts are as follows:

Tuples are lists enclosed in curly braces, with elements separated by commas.
Each rule must end with a period (.).
Comment lines start with %% and are discarded during parsing.

The ACL file syntax format is: {Permission, Client, Operation, Topic}

The first element is the permission that the rule grants; possible values:

allow: Permit the matching operation.
deny: Reject the matching operation.

The second element specifies the Client to which this rule applies, for example:

{username, "dashboard"}: Client with username dashboard; it can also be written as {user, "dashboard"}.
{username, {re, "^dash"}}: Client with username matching the regular expression ^dash.
{clientid, "dashboard"}: Client with client ID dashboard; it can also be written as {client, "dashboard"}.
{clientid, {re, "^dash"}}: Client with client ID matching the regular expression ^dash.
{ipaddr, "127.0.0.1"}: Client with source address 127.0.0.1; supports CIDR address format. Note: If EMQX is deployed behind a load balancer, it is recommended to enable proxy_protocol configuration for EMQX’s listener; otherwise, EMQX may use the load balancer’s source address.
{ipaddrs, ["127.0.0.1", ..., ]}: Clients from multiple source addresses, separated by ,.
all: Matches all clients.
{'and', [Spec1, Spec2, ...]}: Clients satisfying all specifications in the list.
{'or', [Spec1, Spec2, ...]}: Clients satisfying any specification in the list.

The third element specifies the operation corresponding to this rule:

publish: Publish messages.
subscribe: Subscribe to topics.
all: Publish messages and subscribe to topics.

Starting from version 5.1.1, EMQX also supports checking the QoS and retain flags in publish and subscribe operations. The conditions are added to the third element, and the first element still decides whether a matching operation is allowed or denied. For example, when used in a deny rule:

{publish, [{qos, 1}, {retain, false}]}: Deny publishing non-retained messages with QoS 1.
{publish, {retain, true}}: Deny publishing retained messages.
{subscribe, {qos, 2}}: Deny subscribing to topics with QoS 2.

The fourth element is used to specify the MQTT topic applicable to the current rule, supporting wildcards (topic filters) and using topic placeholders:

"t/${clientid}": Using a topic placeholder, when the client ID is emqx_c, the check will match the topic t/emqx_c.
"$SYS/#": Matches all topics starting with $SYS/ through wildcards, such as $SYS/foo, $SYS/foo/bar.
{eq, "foo/#"}: Exactly matches the topic foo/#; the topic foo/bar will not match; here eq indicates equality comparison.

Additionally, there are two special rules that are typically used at the end of the ACL file as default rules:

{allow, all}: Allow all requests.
{deny, all}: Deny all requests.
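To show how the four elements work together, the sketch below models a handful of rules as Python tuples and evaluates them from top to bottom, returning the permission of the first rule that matches. This is a teaching model written for this guide, not EMQX's implementation: it assumes first-match-wins ordering, leaves out ipaddr, ipaddrs, and the QoS and retain conditions, and falls back to deny when nothing matches.

```python
import re
from typing import Dict, Sequence


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match a topic against a filter containing + and # wildcards."""
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True
        if i >= len(t_levels) or (level != "+" and level != t_levels[i]):
            return False
    return len(f_levels) == len(t_levels)


def client_matches(spec, client: Dict[str, str]) -> bool:
    """Evaluate the second element of a rule against a client's attributes."""
    if spec == "all":
        return True
    kind, value = spec
    if kind in ("username", "clientid"):
        actual = client.get(kind, "")
        if isinstance(value, tuple):  # ("re", pattern)
            return re.search(value[1], actual) is not None
        return actual == value
    if kind == "and":
        return all(client_matches(s, client) for s in value)
    if kind == "or":
        return any(client_matches(s, client) for s in value)
    return False  # ipaddr / ipaddrs are left out of this sketch


def check(rules: Sequence[tuple], client: Dict[str, str], action: str, topic: str) -> str:
    """Return "allow" or "deny" for one publish or subscribe request."""
    for rule in rules:
        if len(rule) == 2:  # catch-all rule: ("allow", "all") or ("deny", "all")
            return rule[0]
        permission, who, operation, topics = rule
        if not client_matches(who, client) or operation not in ("all", action):
            continue
        for entry in topics:
            if isinstance(entry, tuple):  # ("eq", "foo/#"): literal comparison
                matched = topic == entry[1]
            else:
                pattern = entry.replace("${clientid}", client.get("clientid", ""))
                matched = topic_matches(pattern, topic)
            if matched:
                return permission
    return "deny"  # assumption of this sketch: refuse when nothing matches


rules = [
    ("allow", ("username", "dashboard"), "subscribe", ["$SYS/#"]),
    ("allow", "all", "publish", ["t/${clientid}"]),
    ("deny", "all", "subscribe", ["$SYS/#", ("eq", "#")]),
    ("allow", "all"),
]
tester = {"username": "test", "clientid": "emqx_c"}
```

With these rules, user dashboard may subscribe to $SYS/# while every other client is refused, and the final catch-all rule allows everything that no earlier rule decided.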

After adding the file authorization source, click Settings to edit the authorization file.

Built-in Database

Built-in database authorization requires no parameters and can be created directly. Click Permission Management to configure the rules. Rules can be scoped by Client ID, by Username, or to All Users; for example, you can prohibit user test from publishing and subscribing to the $SYS/# topic.

User test will then be unable to subscribe or publish to the $SYS/# topic.

To let only user admin subscribe and publish to specific topics, add a rule that allows admin on those topics and a rule under All Users that denies them.

For more content on EMQX authorization, please refer to: https://www.emqx.io/docs/zh/latest/access-control/authz/authz.html

Connection Jitter

EMQX can automatically ban clients that connect repeatedly within a short period and refuse their connections for a while, so that such clients cannot consume excessive server resources. (EMQX's English documentation calls this feature Flapping Detect.)

The connection jitter feature only bans the Client ID and does not ban usernames and IP addresses, meaning that as long as the machine changes the Client ID, it can continue to connect. However, the Blacklist feature can be used to ban based on Client ID, IP address (segment), username, and expressions.
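The behavior described above can be modeled with a sliding window per Client ID. The sketch below is a simplified illustration rather than EMQX's actual algorithm: the class name FlappingDetector and the parameters max_count, window_seconds, and ban_seconds are chosen for this example, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FlappingDetector:
    """Ban a Client ID that connects too often within a sliding time window."""

    def __init__(self, max_count: int, window_seconds: float, ban_seconds: float) -> None:
        self.max_count = max_count
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)
        self._banned_until: Dict[str, float] = {}

    def allow_connect(self, client_id: str, now: float) -> bool:
        """Record a connection attempt at time `now`; return False if it is refused."""
        if now < self._banned_until.get(client_id, 0.0):
            return False
        events = self._events[client_id]
        events.append(now)
        # Drop attempts that have fallen out of the sliding window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.max_count:
            # Too many attempts inside the window: ban this Client ID for a while.
            self._banned_until[client_id] = now + self.ban_seconds
            events.clear()
            return False
        return True


detector = FlappingDetector(max_count=3, window_seconds=60, ban_seconds=300)
attempts = [detector.allow_connect("c1", t) for t in (0, 1, 2, 100)]
other_client = detector.allow_connect("c2", 100)
```

The third attempt from c1 trips the limit and the fourth is still refused, while c2 connects without trouble. Since the ban is keyed on the Client ID alone, the sketch also reproduces the weakness noted above: the same machine gets through again simply by choosing a new Client ID.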

Go to Access Control → Connection Jitter → Enable Jitter Detection → Save Changes to enable it.

After that, connecting repeatedly in quick succession triggers the ban.

In summary, the blacklist feature is somewhat rigid, and connection jitter detection offers limited protection on its own: if the username and password are too simple, attackers still have a chance of brute-forcing them, so it is essential to set a sufficiently complex password. For more content, please read the original article.