The Trinity of Data Streams: OPC-UA, MQTT, and Apache Kafka

Translator | Julian Chen
Reviewer | Shujuan Sun
This article will introduce how to handle data streams in the Internet of Things (IoT) and Industry 4.0 using Apache Kafka, MQTT, and OPC-UA, as well as explore two use cases from BMW and Audi.
In the world of IoT, MQTT (Message Queuing Telemetry Transport) and OPC-UA (OPC Unified Architecture) have become established as open, platform-independent standards for data exchange in Industrial IoT (IIoT) and Industry 4.0 use cases. Apache Kafka’s data streams are used for real-time integration and processing of massive amounts of data in data centers. This article explores the relationship between Kafka and various IoT protocols, when to use which technology, and why HTTP/REST is sometimes the better choice. Finally, we discuss two use cases from BMW and Audi.
PART 01

Industry 4.0

Data Stream Platforms Enhance Overall Factory Efficiency by Connecting Devices

Industry 4.0 and Industrial IoT (IIoT) require systems to transmit, process, analyze, and serve data in near real time. As a result, data volumes grow daily and manufacturers face increasingly diverse data. Complicating matters further, traditional IT environments persist across many manufacturing facilities, which often limits manufacturers’ ability to integrate data effectively across business units. Most manufacturers therefore need a hybrid strategy of data replication and synchronization. At the same time, they are striving to improve the Overall Equipment Effectiveness (OEE) of their production facilities, from product design and manufacturing through to operations and maintenance.
Meanwhile, the outbreak of COVID-19 and the blockage of the Suez Canal in 2021 highlighted the need to respond immediately to production and supply chain issues. Companies therefore need to keep production lines adaptable through real-time processing and monitoring in the following areas:
  • Just-In-Time (JIT) forecasting
  • Building factory capacity
  • Staffing and shift conditions
  • Fluctuations in raw material and product prices
Typically, data generated by devices must be transformed immediately after it is produced and made available across the enterprise in order to extract the maximum value from it, enhance overall factory efficiency, and avoid severe failures. Today, automotive manufacturers like BMW and Tesla have recognized the potential of data stream platforms and use the Apache Kafka ecosystem to move their data. For data-driven manufacturing companies, the benefits of data streams go beyond digitalization and automation and also include:
  • Making production processes more efficient
  • Making production cheaper and faster
  • Minimizing error rates
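To make the OEE metric mentioned in this section concrete: OEE is conventionally computed as the product of availability, performance, and quality. The sketch below assumes that standard definition. The function name, its parameters, and the shift figures are hypothetical illustrations, not values from any manufacturer discussed here.

```python
def oee(planned_minutes: float, downtime_minutes: float,
        ideal_cycle_seconds: float, total_units: int, good_units: int) -> float:
    """Return OEE as availability * performance * quality (each a ratio)."""
    run_minutes = planned_minutes - downtime_minutes
    if run_minutes <= 0 or total_units <= 0:
        return 0.0
    # Availability: share of the planned time the line was actually running.
    availability = run_minutes / planned_minutes
    # Performance: ideal time for the produced units versus actual run time.
    performance = (ideal_cycle_seconds * total_units) / (run_minutes * 60)
    # Quality: share of produced units that passed inspection.
    quality = good_units / total_units
    return availability * performance * quality


# Hypothetical shift: 480 planned minutes, 60 minutes of downtime,
# 1.0 s ideal cycle time, 21,000 units produced, 20,000 of them good.
shift_oee = oee(480, 60, 1.0, 21_000, 20_000)
```

With streaming data, the inputs to such a calculation can be updated continuously instead of once per shift.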
PART 02

When to Use Kafka, MQTT, and OPC-UA

As mentioned earlier, Kafka is an excellent data stream platform for large-scale real-time messaging, storage, data integration, and processing. However, Kafka is not a panacea. In particular, Kafka is not:
  • A proxy for millions of clients (e.g., mobile applications)
  • An API management platform
  • A database for complex queries and batch analysis
  • An IoT platform with device management and similar features
  • A technology for hard real-time applications
For these reasons, Kafka is best complemented by MQTT and OPC-UA. Used together, they allow large amounts of data to be processed and exchanged in near real time across factories, across companies, and globally.

Global Apache Kafka and Event Stream Use Cases
As shown in the figure above, Apache Kafka typically acts as a node in a distributed streaming data mesh. It can integrate a wide range of systems, including edge and IoT devices and business software, and runs independently of the underlying infrastructure (e.g., edge, on-premises, public cloud, multi-cloud, and hybrid cloud). An open, scalable, and flexible architecture is therefore crucial both for integrating with legacy environments and for leveraging modern cloud-native applications.
Event-driven data stream platforms like Apache Kafka meet these needs well. They collect telemetry from sensors together with data from IT systems and process it while it is in transit. This is the concept of “data in motion,” as opposed to “data at rest,” where events are first stored in a database and examined later. In IoT use cases, processing only data at rest is often considered an “outdated architecture.”
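The idea of processing events while they are in transit can be sketched in a few lines. The example below is a plain-Python illustration and does not use Kafka. The function name, window size, threshold, and temperature values are all hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_average_alerts(readings: Iterable[float], window: int,
                           threshold: float) -> Iterator[Tuple[int, float]]:
    """Yield (index, average) as soon as the rolling average exceeds the threshold."""
    recent = deque(maxlen=window)  # only the last `window` readings are kept
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                # The alert is produced while the stream is still flowing,
                # not after the data has been stored and queried later.
                yield index, average


# Hypothetical temperature readings arriving one by one.
temperatures = [70.0, 71.0, 72.0, 90.0, 95.0, 73.0]
alerts = list(rolling_average_alerts(temperatures, window=3, threshold=80.0))
```

Because the function is a generator, each alert becomes available as soon as the reading that triggers it arrives. That is the essential difference from loading all readings into a database first.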
PART 03

Using Domain-Driven Design and True Decoupling for Separation

In fact, it does not matter what kind of infrastructure the factory’s IT environment is built on. What matters is that new and old systems can integrate data in real time while data flow and message storage remain continuous and decoupled. Compared to other messaging systems (e.g., RabbitMQ in IT or MQTT in IoT), Apache Kafka achieves true decoupling through backpressure handling and data replayability, which supports domain-driven design (DDD). It also provides the high availability and fault tolerance that are critical in production environments.
Kafka Domain-Driven Design for Industrial IoT MQTT and OPC UA
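Decoupling through replayability can be illustrated with a toy model: an append-only log in which every consumer group keeps its own read position. This is a simplified in-memory sketch and not Kafka's client API. The class `EventLog`, its methods, and the group names are hypothetical.

```python
from typing import Any, Dict, List


class EventLog:
    """Append-only in-memory log; each consumer group tracks its own offset."""

    def __init__(self) -> None:
        self._records: List[Any] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: Any) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group: str, max_records: int = 10) -> List[Any]:
        """Return the next records for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Move a group's offset, for example back to 0 to replay old records."""
        self._offsets[group] = max(0, min(offset, len(self._records)))


log = EventLog()
for reading in ({"sensor": "press-4", "temp": 71}, {"sensor": "press-4", "temp": 93}):
    log.append(reading)

realtime = log.poll("alerting")    # a fast consumer reads both records at once
batch = log.poll("reporting", 1)   # a slow consumer pulls one record at its own pace
log.seek("alerting", 0)            # rewind to replay from the beginning
replayed = log.poll("alerting")
```

Because consumers pull records at their own pace and the log retains them, a slow consumer never blocks the producer or other consumers, and any consumer can reprocess history.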
PART 04

OPC-UA, MQTT, HTTP, and Others

Currently, there are three common standards for open and standardized IoT architectures: the IoT-specific protocols OPC-UA (Open Platform Communications Unified Architecture) and MQTT (Message Queuing Telemetry Transport), and plain REST/HTTP. There are also proprietary protocols in the industry, such as Skkynet’s DataHub Transport Protocol (DHTP, https://skkynet.com/iiot-protocol-comparison), and open standard alternatives like AMQP. The table below compares their features:
Comparison of Industrial IoT Protocols
PART 05

Decision Tree for Evaluating IoT Protocols

So which one is more worth choosing? First, it is important to emphasize that such discussions are only meaningful if you have a choice. If you purchase and install a new machine or PLC in the workshop that only provides specific interfaces, you have no choice but to use it.
Of course, people in different fields may have different preferences for choices. Some may even consider proprietary solutions to be better from the perspective of TCO and ROI.
Overall, for different IoT protocols, my advice is to use open standards as much as possible, and even combine them as needed.
Now, let’s look at a simple decision tree for choosing between OPC-UA, MQTT, HTTP, and other proprietary industrial IoT protocols:

Decision Tree for Industrial IoT: MQTT, OPC UA, and HTTP REST
Let’s discuss the decision tree above:
  • HTTP/REST is well suited to simple use cases (keep it as simple as possible). HTTP is easy to understand and use and fits almost any scenario. It requires no additional tools, APIs, or middleware, and its communication is synchronous request-response. If you can use the standard HTTP(S) ports 80 or 443 instead of other TCP ports, it is also easier to get approval from the security team. However, HTTP is one-way communication: a connected car, for example, would need to run its own HTTP server to receive data pushed from the cloud. In such cases, publish/subscribe is the better fit.
  • MQTT is well suited to scenarios with tens or hundreds of thousands of devices on limited-bandwidth, intermittent networks (e.g., connected car infrastructure). Communication is asynchronous publish/subscribe with an MQTT broker as the intermediary. MQTT does not define a standard data format, but developers can add one with Sparkplug. MQTT is very lightweight, and its Quality of Service (QoS) and last will features meet many IoT requirements out of the box. MQTT is also well suited to two-way communication (e.g., between connected cars and the cloud) and to other IT use cases. In addition, LoRaWAN and other low-power wide-area networks pair well with MQTT.
  • OPC-UA is well suited to industrial automation (e.g., machines on the production line). Communication today is usually client/server, but publish/subscribe is also supported. It uses standard data formats and provides a rich, powerful, and complex set of functions, components, and industry-specific data formats. OPC-UA is a good fit wherever OT and IT are integrated. OPC UA over TSN (Time-Sensitive Networking) is an optional component; TSN is an Ethernet communication standard that provides open, deterministic, hard real-time communication.
  • Proprietary protocols are suitable for specific problems that standardized implementations cannot solve. They have their own pros and cons: they often deliver high performance but are also expensive and come with limitations.
As mentioned earlier, choosing between OPC-UA, MQTT, and other protocols is not an either/or decision. In many industrial settings, modern applications use OPC-UA and MQTT side by side, leveraging the strengths of each, while proprietary middleware integrates legacy applications, proprietary SCADA systems, and other data historians.
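The considerations in this section can be condensed into a small helper function. This is only a sketch: the parameter names and the order of the checks are assumptions derived from the bullet points above, not a reproduction of the original decision tree figure.

```python
def suggest_protocol(*, vendor_interface_fixed: bool = False,
                     industrial_automation: bool = False,
                     many_devices_or_unreliable_network: bool = False,
                     needs_server_push: bool = False) -> str:
    """Suggest a starting point for protocol selection (illustrative only)."""
    if vendor_interface_fixed:
        # A machine or PLC that offers only one interface leaves no choice.
        return "vendor-provided interface"
    if industrial_automation:
        # Production-line machines and OT/IT integration.
        return "OPC-UA"
    if many_devices_or_unreliable_network or needs_server_push:
        # Large device fleets, poor networks, or two-way communication.
        return "MQTT"
    # Simple request-response use cases: keep it as simple as possible.
    return "HTTP/REST"


connected_car = suggest_protocol(many_devices_or_unreliable_network=True,
                                 needs_server_push=True)
production_line = suggest_protocol(industrial_automation=True)
simple_upload = suggest_protocol()
```

Since the protocols can be combined, a real project might call such a helper once per integration point and not once per factory.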
PART 06

Integration of MQTT, OPC-UA, and Kafka

When integrating MQTT, OPC-UA, and Kafka, the following options are typically available:
  • Kafka Connect connectors: provide native Kafka integration at the protocol level. Confluent Hub (https://www.confluent.io/hub/) lists several alternatives, and some companies build their own custom Kafka Connect connectors.
  • Custom integration: integration through a low-level MQTT/OPC-UA API (e.g., using Kafka’s HTTP/REST proxy) or a Kafka client (e.g., .NET or C++ for Windows environments).
  • Open third-party IoT middleware: general-purpose open-source integration middleware (e.g., Apache Camel with IoT connectors), IoT-specific frameworks (e.g., Apache PLC4X or Eclipse Ditto), or proprietary third-party IoT middleware based on standard APIs.
  • Commercial IoT platforms: suitable for existing legacy deployments, where they provide the glue code for machine protocols (e.g., Modbus, Siemens S7). Environments built on traditional data historians, proprietary protocols, monolithic architectures, limited scalability, and batch ETL can adopt a commercial IoT platform to bridge the gap between on-premises and cloud services.
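Whichever integration option is chosen, an MQTT message has to be mapped to a Kafka record with a topic, key, and value at some point. The sketch below shows one possible mapping as a pure function without any client library. The MQTT topic layout `<site>/<line>/<device>/<metric>`, the `KafkaRecord` type, and the naming scheme are assumptions for illustration only.

```python
import json
from typing import NamedTuple


class KafkaRecord(NamedTuple):
    topic: str
    key: bytes
    value: bytes


def mqtt_to_kafka(mqtt_topic: str, payload: bytes) -> KafkaRecord:
    """Map an MQTT message to a Kafka record.

    Assumes the hypothetical topic layout <site>/<line>/<device>/<metric>
    and a JSON payload; raises ValueError for any other topic shape.
    """
    site, line, device, metric = mqtt_topic.split("/")
    # Kafka topic names cannot contain "/", so the hierarchy is flattened with a dot.
    kafka_topic = f"{site}.{metric}"
    # With key-based partitioning, records sharing a key land in the same
    # partition, which preserves the order of readings per device.
    key = f"{line}/{device}".encode("utf-8")
    value = json.dumps({
        "site": site, "line": line, "device": device, "metric": metric,
        "reading": json.loads(payload),
    }).encode("utf-8")
    return KafkaRecord(kafka_topic, key, value)


record = mqtt_to_kafka("plant1/line1/press4/temperature", b"71.5")
```

Keeping the mapping in a pure function like this makes it easy to test independently of the broker and to reuse in a custom connector or bridge.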
PART 07

Using OPC-UA or MQTT to Connect Machines and Devices

Although OPC UA and MQTT are not designed for data processing and integration, their advantages lie in establishing real-time two-way “last mile” communication between devices, machines, PLCs, IoT gateways, or vehicles.
As mentioned above, the two standards have different focuses and can be combined. Almost all modern machines, PLCs, and IoT gateways in smart factories support OPC-UA, while MQTT is mainly used over unreliable networks and in scenarios with very large numbers of devices.
Data usually flows into a data stream platform through connectors. Such a platform can be deployed alongside IoT platforms at the edge or combined with them in hybrid cloud scenarios. As a flexible data hub, the data stream platform integrates and processes data between OT and IT applications. In addition to OPC-UA and MQTT on the OT side, IT applications such as MES, ERP, CRM, data warehouses, and data lakes are connected in real time, whether at the edge, on-premises, or in the cloud.

Apache Kafka as an Open and Scalable Data Historian with MQTT and OPC UA
(https://www.kai-waehner.de/blog/2020/04/21/apache-kafka-as-data-historian-an-iiot-industry-4-0-real-time-data-lake/)
PART 08

Digital Twins for Development and Predictive Simulation

By continuously streaming data and processing and integrating sensor data, data stream platforms can create an open, scalable, and highly available infrastructure for deploying digital twins.
Digital twins combine IoT, artificial intelligence, machine learning, and other technologies, aiming to create virtual simulations of physical components, devices, and processes. By referencing historical data, digital twins can also self-update immediately when changes occur in the data generated by their physical counterparts.
Typically, Kafka can be combined with other technologies to build digital twins. For example, Eclipse Ditto is a project that combines Kafka with IoT protocols. Some teams may also use Kafka and databases like MongoDB to customize digital twins.
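At its core, a custom digital twin is an object whose state is kept in sync with a stream of device events while the history is retained for analysis. The following sketch shows only that core idea. The class `DigitalTwin`, its fields, and the sample events are hypothetical, and no Kafka or database client is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DigitalTwin:
    """Virtual counterpart of one physical device."""
    device_id: str
    state: Dict[str, float] = field(default_factory=dict)
    history: List[Tuple[int, str, float]] = field(default_factory=list)

    def apply(self, timestamp: int, metric: str, value: float) -> None:
        """Update the current state and keep the event for later analysis."""
        self.state[metric] = value
        self.history.append((timestamp, metric, value))

    def values(self, metric: str) -> List[float]:
        """Return all recorded values of one metric in arrival order."""
        return [v for _, m, v in self.history if m == metric]


twin = DigitalTwin("press-4")
# Hypothetical events as they might arrive from a stream: (timestamp, metric, value).
for event in [(1, "temp", 71.0), (2, "vibration", 0.2), (3, "temp", 74.5)]:
    twin.apply(*event)
```

In a setup like the one described above, the events would come from a Kafka consumer and the state would be persisted in a database such as MongoDB. Both are left out here to keep the sketch self-contained.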

Apache Kafka Powers Digital Twins in Industry 4.0 and Industrial IoT
As shown in the figure above, in Industry 4.0, machine operators can gain detailed insights into the lifecycle of the elements they simulate or monitor through digital twins, continuously optimizing products and processes, testing the functionality and performance of individual components or entire systems, and predicting energy consumption and wear.
PART 09

Condition Monitoring and Predictive Maintenance

In modern maintenance, machine operators often need to know in a timely manner: Are all devices functioning as expected? How long can these devices typically run before maintenance is required? What are the causes of anomalies and errors?
On one hand, they need to rely on a reliable and scalable infrastructure to support data stream processing, analysis, and integration, thereby detecting critical indicators such as severe temperature fluctuations or vibrations in real-time to take action and ensure factory productivity. On the other hand, digital twins can identify the causes of failures by correlating the current data captured by sensors with historical data, facilitating predictive maintenance measures. Most importantly, by ensuring that equipment and facilities are only maintained when necessary, they can implement predictive maintenance plans more effectively, saving valuable resources for manufacturing companies and avoiding costly downtime.

Using ksqlDB and TensorFlow with Apache Kafka for Condition Monitoring and Predictive Maintenance
(https://www.kai-waehner.de/blog/2021/10/25/apache-kafka-condition-monitoring-predictive-maintenance-industrial-iot-digital-twin/)
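Detecting severe fluctuations in a sensor stream can be approximated with a rolling z-score, which compares each new reading with the mean and standard deviation of the readings just before it. This is a deliberately simple statistical stand-in, not the ksqlDB or TensorFlow approach named in the caption above. The function name, window size, threshold, and vibration values are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def detect_anomalies(readings: Iterable[float], window: int = 5,
                     z_threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the mean of the previous window."""
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                yield index, value
        recent.append(value)


# Hypothetical vibration readings with one obvious spike.
vibration = [0.20, 0.21, 0.19, 0.20, 0.22, 0.21, 0.95, 0.20]
anomalies = list(detect_anomalies(vibration))
```

One limitation of this sketch is that a detected spike still enters the window and inflates the standard deviation for the following readings. A production system would typically exclude flagged values or use a trained model instead.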
PART 10

Connected Cars and Streaming Machine Learning

Connected cars are vehicles that can communicate bidirectionally with other systems outside the vehicle. They enable sharing of access and data over the internet between various devices and applications inside and outside the car. The previously mentioned combination of MQTT and Kafka can serve the use cases for connected cars and their infrastructure.
The following diagram shows how Kafka integrates with tens or even hundreds of thousands of IoT devices and processes their data in real time. In a connected car infrastructure, this pipeline can automate predictive maintenance (i.e., anomaly detection) and predict engine failures:

Kappa Architecture, Kafka MQTT Kubernetes, and TensorFlow for Streaming Machine Learning
PART 11

BMW Case Study

Implementing Manufacturing 4.0 with Smart Factories and Cloud Services

Let’s explore from a technical perspective how BMW uses Kafka and OPC-UA as a real-time data hub between edge devices and cloud applications. BMW set out to achieve the following goals:
  • Acquire IoT data without impacting other services and transmit it to the correct location
  • Collect once, process and consume multiple times (different consumers at different times using different communication paradigms, such as real-time, batch processing, request-response)
  • Achieve scalable real-time processing and shorten time to market
The BMW team connected workloads from its smart factories worldwide to Confluent Cloud on Azure using an OPC-UA connector, enabling real-time replication to the public cloud. Here, Kafka, as the messaging platform, provides true decoupling and transparency between different interfaces and supports digital innovation. Confluent’s products and expertise also help keep the manufacturing systems stable. Furthermore, the real-time supply chain management solution provides accurate inventory information, both for physical stock and within the ERP system.
PART 12

Audi Case Study

Connected Cars with Swarm Intelligence

Audi built the infrastructure for connected cars using Apache Kafka. They shared and discussed their use cases and architecture at the Kafka Summit.
They built a complete process for real-time data analysis, swarm intelligence, partner collaboration, and predictive AI, achieving real-time processing and storage of all sensor data from connected cars for historical analysis and real-time reporting.
PART 13

Aside

If the network connectivity of your infrastructure allows it, you can use “serverless Kafka” in your IoT projects. As in the BMW case, Confluent Cloud can then process and integrate data streams for smart factories around the globe. With serverless data streams, you can focus on your IoT business applications and on improving Overall Equipment Effectiveness (OEE).
Original link:
https://dzone.com/articles/opc-ua-mqtt-and-apache-kafka-the-trinity-of-data-s
Translator’s Introduction
Julian Chen, editor of 51CTO Community, has over ten years of experience in IT project implementation, skilled in managing internal and external resources and risks, focusing on disseminating knowledge and experience in network and information security; continuously shares cutting-edge technologies and new knowledge through blogs, themes, and translations; often conducts information security training and lectures online and offline.
