Vulnerability Analysis of Industrial Distributed Control Systems

Distributed Control Systems (DCS), one type of industrial control system, are widely used to control critical infrastructure in the oil, chemical, metallurgy, cement, and water industries, serving as the “brain” of process operations. Their importance is self-evident.

In the mid-1970s, decentralized control systems based on microprocessors emerged. After more than 30 years of development, DCS has integrated computer, communication, display, and control technologies, but it also faces increasing security risks.

Here is a recap of the talk “Vulnerability Analysis of Industrial Distributed Control Systems,” presented at the 2019 Kanxue Security Developers Conference (SDC).

Editor’s Note

Crownless: As industrial production enters the era of informatization and digitalization, its security has received increasing attention. If a system is compromised, production may be forced to stop, and explosions or other accidents may follow. Industrial control security is not technically very difficult, but these systems are very fragile, which is why we need to invest more resources in protecting them.

Guest Introduction

Jian Sitong holds a Master of Software Engineering from Fudan University. He is a Senior Industrial Control Security Researcher at Shadow Security Team, a Permanent Director of the China Automation Association, a KCon 2018 speaker, and the developer of an EtherNet/IP device sniffing tool.

In the talk, the speaker covered the architecture of industrial distributed control systems, industrial network topology, and vulnerability analysis, offering a new perspective on risk control.

Talk Content

The following is the full transcript:

Hello everyone! I currently work at an industrial automation vendor, where I am responsible for industrial control penetration testing and defense, and I also do industrial control research for Shadow Security, Dawn Security, and Mister Security.

The topic I will share with you is “Vulnerability Analysis of Industrial Distributed Systems”.

There are many types of industrial control systems. Besides PLC-based systems, a major type is the industrial Distributed Control System (DCS). How do the two differ in architecture and application, and what vulnerabilities in this field can make a system unstable? Let me briefly share what we found.

One of our customers runs a DCS. An unauthorized USB drive brought in by a vendor inadvertently infected the customer’s host computer with a WannaCry variant, causing it to crash, and the customer asked our team to help clean up the infection.

What happened next? After we finished protecting the HMI, we also checked whether the DCS on the industrial control network had related vulnerabilities, and that work led to today’s topic.

The diagram shows the standard DCS architecture. The gray part at the bottom is the controller, which in a DCS is called the “Unit Controller.” What does it control? It reads sensors and drives valves and other actuators in the chemical, oil, and water industries.

Automation was originally designed to replace manual work, removing the need for people to be directly involved in industrial production, and it does so with various types of controllers. One level up, connected over Ethernet, is the monitoring system.

As you can see, the controller is a box-like device connected to field sensors and actuators through signal lines. How do the operators in the control room know the field status in real time, and through what interface? Through the layer above, which we call the monitoring system.

One level further up is the management system, which collects information from the monitoring layer and analyzes production conditions. The lower layers handle monitoring and control only, while the top layer handles analysis. This is the standard DCS architecture.

How does this differ from a PLC control system? A PLC system centers on the controllers (the gray boxes below), and the monitoring software above and the controllers below can come from two different vendors, or the monitoring application can even be custom software developed for the PLC. In a DCS this generally does not happen: the whole system comes from a single vendor, and the protocols it uses are usually proprietary.

Our testing focused on the area marked in red, the safety instrumented system (SIS). Since many of you may not work in manufacturing and may find this hard to picture, let me describe a scenario.

Take the gasoline in your car: how is it produced? The three major oil companies extract crude oil from underground, which is then refined and cracked into the fuel we use. The process involves high temperatures and high-speed centrifuges, both of which are very dangerous.

What does the red system do? Suppose the temperature rises too high, say to 1200 degrees; or the liquid level in a tank drops below 10%; or a centrifuge spins too fast, reaching tens of thousands of revolutions per minute. To keep things safe, this controller is configured so that once any of these set limits is exceeded, it shuts the process down immediately to protect equipment and personnel.
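The trip behaviour just described can be sketched as a scan function that compares each reading against a fixed limit and shuts down if any limit is crossed. This is a minimal illustration, not the logic of any real SIS: the limits reuse the talk’s example numbers where it gives them, and the overspeed limit and all names (`ProcessReading`, `trip_reasons`, `sis_scan`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical trip limits. 1200 degrees and 10% come from the talk's
# example; the overspeed limit is an assumption. Real SIS setpoints come
# from a plant's hazard analysis.
MAX_TEMPERATURE = 1200.0   # degrees
MIN_LEVEL_PERCENT = 10.0   # tank level low-low limit
MAX_SPEED_RPM = 10_000.0   # assumed centrifuge overspeed limit


@dataclass
class ProcessReading:
    temperature: float
    level_percent: float
    speed_rpm: float


def trip_reasons(r: ProcessReading) -> list:
    """Return every limit the reading violates; an empty list means no trip."""
    reasons = []
    if r.temperature > MAX_TEMPERATURE:
        reasons.append("high temperature")
    if r.level_percent < MIN_LEVEL_PERCENT:
        reasons.append("low level")
    if r.speed_rpm > MAX_SPEED_RPM:
        reasons.append("overspeed")
    return reasons


def sis_scan(r: ProcessReading) -> str:
    # A single violated limit is enough: the SIS errs on the side of stopping.
    return "SHUTDOWN" if trip_reasons(r) else "RUN"
```

For example, `sis_scan(ProcessReading(1250.0, 50.0, 3000.0))` returns `"SHUTDOWN"` because only the temperature limit is exceeded. Returning all reasons, not just the first, mirrors how operators usually want to see every tripped condition.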

This is the essential difference between traditional information security and industrial security. Traditional information security protects knowledge and information assets. Industrial control systems hold plenty of information assets too, but what matters more is the safety of personnel and equipment.

Here is a case we once encountered, involving a boiler. In northern China, winter heating is supplied by heating companies, which run dedicated boilers that generate steam and send it to the radiators. If the water level in a boiler drops too low and heating continues, the boiler can explode; this controller ensures that the boiler never boils dry.

It is like boiling water at home, except that in a boiler system, once the water boils dry, the boiler can explode because the steam inside cannot be released. This controller protects both the equipment and the personnel.

The test environment consisted of two Cisco 2960 layer 2 switches forming the network, two DCS controllers, two servers running Windows Server 2003, and four clients running Windows XP SP3. I marked these two devices gray because we still needed to patch and harden them, so the customer asked us to exclude them from testing. We also used one Kali machine.

This diagram shows the network architecture: the two devices in the middle are the 2960 switches, the two below are the redundant controllers, the two above are the Server 2003 servers, and there are four clients. Each machine has two links, one yellow and one green, forming a dual network so that the failure of a single network does not interrupt communication.

Full redundancy goes beyond dual networks, though: the design also ensures that the failure of one switch does not affect operation. This creates a topology that depends on STP (Spanning Tree Protocol).

Look closely and you will find a loop here. The DCS vendor must solve this at the communication layer, keeping the links redundant while preventing loops. It is like a residential area with many routes between two points: the vendor has to make sure traffic does not go round in circles.

Each computer has an interface like this: one network card with two ports. The vendor configures the two ports in bridge mode. Traditionally the ports would sit on two different subnets, but here both are on the same subnet, and that is what creates the loop.

How is this loop broken so that it does not cause interference? More than ten years ago the vendor developed a layer 2 fault-tolerant Ethernet protocol, and it is an excellent redundant Ethernet protocol.

The left image shows that the vendor adds its own driver software at the protocol layer of the two-port network card to turn it into a bridge. In this bridge mode, the driver also shields the host from the STP loop detection, which is a very clever touch. At the same time, it enables routing functionality.

Further up is the protocol layer, which runs the vendor’s industrial protocols; we will see later which protocols run here. This diagram shows a highly reliable layer 2 network, and the other image shows whether each node’s status is normal. Every node has four available paths, and the failure of any one of them does not affect the system, although only one path is used at a time.

In the diagram below, although there are four links, two of them end up blocked, and the system itself calculates which path is best.
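The idea of four candidate paths with only one in use can be illustrated with a small selection function. This is a hypothetical model, not the vendor’s protocol, whose algorithm is proprietary: each node’s card has port A (to switch A) and port B (to switch B), the switches share an inter-switch link, and the link names and costs are assumptions.

```python
from typing import Optional

# Hypothetical topology: each node's NIC has port A (to switch A) and port B
# (to switch B), and the two switches share an inter-switch link ("isl").
# Each candidate path lists the links it depends on and an assumed cost.
PATHS = {
    "A-A": ({"src-A", "dst-A"}, 1),         # stays on switch A
    "B-B": ({"src-B", "dst-B"}, 1),         # stays on switch B
    "A-B": ({"src-A", "isl", "dst-B"}, 2),  # crosses from switch A to switch B
    "B-A": ({"src-B", "isl", "dst-A"}, 2),  # crosses from switch B to switch A
}


def select_path(failed_links: set) -> Optional[str]:
    """Pick the cheapest path whose links are all up; None if every path is broken."""
    candidates = [
        (cost, name)
        for name, (links, cost) in PATHS.items()
        if not links & failed_links
    ]
    # Ties on cost are broken by name so the choice is deterministic.
    return min(candidates)[1] if candidates else None
```

With no failures this picks `"A-A"`; if the sender’s port A and the receiver’s port B both fail, only the crossing path `"B-A"` survives. That case is why the cross paths are worth keeping even though normally just one path carries traffic.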

When our team received the penetration testing task, we looked at the 2960 switches first, searching for information about the 2960 and the vendor. We found the vendor’s configuration file. The section marked in red enables MSTP (Multiple Spanning Tree Protocol) to prevent loops in the DCS network, so that communication follows a single active path while multiple redundant paths remain available.

So what are the characteristics of STP? It is a protocol designed to break loops. An STP port can be in one of five states: disabled, blocking (to prevent loops), listening, learning, and forwarding, and each switch port moves through these states as the topology changes.

Each state transition takes roughly 5 to 15 seconds, during which the port forwards no data, so traffic is interrupted. Another diagram shows that when the switches form a triangle, that is, a loop, STP blocks one port, so although the topology looks like a loop, it is logically a line.
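To make the outage window concrete, here is a sketch of how long a blocked port takes to start forwarding under classic IEEE 802.1D default timers (max age 20 s, forward delay 15 s). It assumes plain 802.1D; the MSTP configured here uses rapid, RSTP-style transitions and normally converges much faster, so treat these numbers as a legacy-STP worst case rather than a measurement of this DCS. The function names are illustrative.

```python
from enum import Enum


class PortState(Enum):
    DISABLED = "disabled"
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


# IEEE 802.1D default timers in seconds (configurable on real switches).
MAX_AGE = 20
FORWARD_DELAY = 15


def transitions_to_forwarding(wait_max_age: bool) -> list:
    """(state, seconds spent in it) for a blocked port that must start forwarding.

    wait_max_age=True models an indirect failure: the port first waits for
    the stored BPDU to age out before it starts listening.
    """
    steps = []
    if wait_max_age:
        steps.append((PortState.BLOCKING, MAX_AGE))
    steps.append((PortState.LISTENING, FORWARD_DELAY))
    steps.append((PortState.LEARNING, FORWARD_DELAY))
    return steps


def outage_seconds(wait_max_age: bool) -> int:
    # The port forwards no user traffic until it reaches FORWARDING.
    return sum(seconds for _, seconds in transitions_to_forwarding(wait_max_age))
```

Under these defaults `outage_seconds(False)` is 30 and `outage_seconds(True)` is 50, which is why repeatedly forcing reconvergence can keep a legacy STP network from forwarding for long stretches.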

What tools did we use for this destructive experiment? We wanted to see what would happen if we attacked STP, so we used a tool designed specifically to attack Ethernet layer 2 protocols, which is included in older versions of Kali.

We chose to send BPDUs (Bridge Protocol Data Units) continuously. What does that mean? When the three switches we saw earlier form a loop, one of them always acts as the manager, the root bridge, which determines which ports are blocked and which ports forward. How is it chosen? The switch with the lowest bridge ID becomes the root: the lowest priority value wins, with the MAC address as the tie-breaker.
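Root bridge election can be sketched as taking the minimum bridge ID, where the ID compares priority first and MAC address second. The switch names and addresses below are hypothetical. The last case shows why a forged BPDU advertising a lower priority can seize the root role, which is the behaviour that switch features such as BPDU guard and root guard exist to block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # 0-65535; 32768 is the usual default
    mac: str       # "aa:bb:cc:dd:ee:ff"

    def bridge_id(self) -> tuple:
        # Compare priority first, then the MAC address as a 48-bit number.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges) -> Bridge:
    """The bridge with the lowest bridge ID becomes the root."""
    return min(bridges, key=Bridge.bridge_id)


switches = [
    Bridge("sw1", 32768, "00:1a:2b:00:00:03"),
    Bridge("sw2", 32768, "00:1a:2b:00:00:01"),
    Bridge("sw3", 32768, "00:1a:2b:00:00:02"),
]
```

With equal default priorities, `elect_root(switches)` picks `sw2` on MAC address alone; adding `Bridge("rogue", 0, "ff:ff:ff:ff:ff:ff")` makes the rogue device the root despite its high MAC, because priority is compared first.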

We wanted to see whether sending a large number of spoofed MAC addresses to this switch would make the links oscillate. This video shows the experiment from the DCS host computer, where you can follow the attack. The operator logs in with a username and password and opens the online monitoring view, which shows the customer’s live production status. I then switched to the attack machine and spoofed the MAC address of that 2960 switch, targeting the switch’s own MAC rather than a port MAC, since this is a layer 2 attack.

We first sent a BPDU to trigger an oscillation, and the switch ports went back into the learning state. We then kept sending BPDUs with random MAC addresses to keep the network oscillating. When we switched back to the monitoring platform, the screen was still displayed, but the data was no longer refreshing. Why was the screen still there? The host has a reconnection window with its controller, defined as up to three attempts, so at this point everything still appears to be working.

But the data had already stopped refreshing. When we returned to the interface, it reported a “timeout” error, and the connection to the controller was completely lost. That was our demonstration: we caused the DCS host system to disconnect from its controller.

The key characteristic of a DCS here is that once the host computer disconnects, the controllers below automatically enter a protective state and stop accepting changes. In effect the controller goes into a hold state, in which all outputs keep their last values.
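The retry-then-hold behaviour described in the last few paragraphs can be sketched as a small watchdog. Only the three-attempt window comes from the talk; the class name, the latched hold, and the explicit `acknowledge()` step are assumptions made for illustration, since real controllers implement this in vendor firmware.

```python
class ControllerWatchdog:
    """Hypothetical host-communication watchdog on the controller side."""

    MAX_ATTEMPTS = 3  # from the talk: reconnection within three attempts

    def __init__(self, outputs: dict):
        self.outputs = dict(outputs)  # last commanded output values
        self.missed = 0
        self.hold = False

    def on_poll(self, host_replied: bool) -> str:
        if host_replied:
            self.missed = 0
            return "hold" if self.hold else "online"
        self.missed += 1
        if self.missed >= self.MAX_ATTEMPTS:
            self.hold = True  # latched: stays in hold until acknowledged
            return "timeout"
        return "retrying"  # the HMI still shows the last data it received

    def acknowledge(self) -> None:
        # Assumed operator action that releases the hold.
        self.hold = False

    def write_output(self, tag: str, value) -> bool:
        if self.hold:
            return False  # outputs stay frozen at their last values
        self.outputs[tag] = value
        return True
```

After three missed polls the watchdog reports `"timeout"` and refuses writes, so every output keeps its last value, which matches the “screen still up, data not refreshing, then timeout” sequence seen in the demonstration.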

Having found this, we next wanted to see what protocol the controller was running. Does anyone here know CVE-2018-0171? It is a vulnerability in the Smart Install feature of Cisco switches, reported by a security research team: sending a malformed packet to TCP port 4786 can crash the switch or even let an attacker execute arbitrary code and take control of it.

We targeted the switch with this CVE in order to set up port mirroring and capture the traffic between the host and the controller. Unlike a hub, a switch does not forward packets to every port, so otherwise we could not see the traffic between them.

This segment is our attack code. The final packet, with special characters placed in the tv1 and tv2 fields, crashes the 2960 switch. We were aiming for root access, but unfortunately we crashed the switch outright.

Moreover, the customer had given us only one day, so we had no time to debug the code and obtain root access. Instead we took a more brute-force approach, MAC flooding. The MAC address table of the 2960 holds about 4K entries, and our tests showed that once it exceeded 4K entries, the switch behaved like a hub.

The flood of fake addresses crowds the legitimate entries out of the MAC address table. When a frame then arrives for a destination the switch can no longer find in the table, it has no port to forward it to, so it floods the frame out of every port. While it is flooding, we can capture the plaintext traffic from the host to the controller.
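A toy model of a switch’s MAC (CAM) table shows why flooding the table turns the switch into a hub. It assumes oldest-entry eviction when the table is full, which mirrors the talk’s description; real switches usually stop learning new addresses and let existing entries age out, but either way traffic to an address no longer in the table is flooded out of every port. The class, port numbers, and addresses are hypothetical.

```python
from collections import OrderedDict


class LearningSwitch:
    """Toy layer 2 switch with a fixed-size MAC address table."""

    def __init__(self, ports, capacity: int):
        self.ports = list(ports)
        self.capacity = capacity
        self.table = OrderedDict()  # MAC -> port, oldest entry first

    def learn(self, src_mac: str, in_port: int) -> None:
        if src_mac in self.table:
            self.table.move_to_end(src_mac)  # refresh an existing entry
        elif len(self.table) >= self.capacity:
            self.table.popitem(last=False)   # table full: evict the oldest entry
        self.table[src_mac] = in_port

    def forward(self, src_mac: str, dst_mac: str, in_port: int) -> list:
        """Return the ports a frame is sent out of."""
        self.learn(src_mac, in_port)
        out = self.table.get(dst_mac)
        if out is not None:
            return [out]
        # Unknown unicast: flood to every port except the one it arrived on.
        return [p for p in self.ports if p != in_port]
```

Once `host` (port 1) and `ctrl` (port 2) are learned, host-to-controller frames go only to port 2. After an attacker on port 4 sends frames from enough fake source addresses to fill the table, the next host-to-controller frame is flooded and also reaches port 4.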

The traffic was Modbus-TCP, a very common industrial Ethernet protocol. Besides Modbus-TCP, the system also ran a multicast protocol on the 224.x.x.x multicast range: by design, the communication service runs over multicast while the data runs over TCP. We captured this traffic through MAC flooding as well.
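When sorting a capture like this one, a first pass can separate the multicast service traffic from the Modbus-TCP data using only addresses and ports. This is a rough sketch; the function name and the sample ports other than 502 are assumptions, and a real analysis would also decode payloads.

```python
import ipaddress

MODBUS_TCP_PORT = 502


def classify_flow(dst_ip: str, protocol: str, src_port: int, dst_port: int) -> str:
    """Rough label for a captured flow, based only on addresses and ports."""
    if ipaddress.ip_address(dst_ip).is_multicast:  # 224.0.0.0/4 for IPv4
        return "multicast service"
    if protocol == "tcp" and MODBUS_TCP_PORT in (src_port, dst_port):
        return "modbus-tcp data"  # requests go to 502, responses come from it
    return "other"
```

Checking both ports matters because Modbus-TCP requests target port 502 on the controller, while the replies come back from port 502 to an ephemeral client port.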

Modbus-TCP uses port 502. Who uses Modbus-TCP? Many domestic DCS products on the market provide this interface, and the protocol has inherent weaknesses.

So why is Modbus-TCP so widely used? It is the first-generation industrial Ethernet protocol, developed before anyone discussed security in industrial environments. It was concerned only with functionality and implementing the business requirements, not with security.

This is its message format. Modbus-TCP is an application-layer (layer 7) protocol. The frame contains a Modbus station address (the unit identifier), which is unrelated to the TCP/IP address; it is numbered from 0, with one address per device at the station. There is also a function code that defines what the message is supposed to do.

Next come the data addresses, followed by the data itself. There is no checksum field: Modbus-TCP relies on the TCP checksum, so the protocol carries none of its own. There are many function codes. The common ones are 1 through 6 (read coils, read discrete inputs, read holding registers, read input registers, write single coil, write single register) plus 15 and 16 (write multiple coils, write multiple registers), covering reading, writing, batch reading, and batch writing. However, the protocol has no authentication at all, so replayed and unauthorized requests are accepted as valid.
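The frame layout just described can be decoded in a few lines: a 7-byte MBAP header (transaction ID, protocol ID, length, unit ID) followed by the function code and data. This is a minimal parsing sketch with illustrative names, not a complete Modbus implementation. Notice that nothing in the structure identifies or authenticates the sender.

```python
import struct
from dataclasses import dataclass

# Common public Modbus function codes.
FUNCTION_NAMES = {
    1: "Read Coils",
    2: "Read Discrete Inputs",
    3: "Read Holding Registers",
    4: "Read Input Registers",
    5: "Write Single Coil",
    6: "Write Single Register",
    15: "Write Multiple Coils",
    16: "Write Multiple Registers",
}


@dataclass
class ModbusTcpFrame:
    transaction_id: int
    protocol_id: int  # always 0 for Modbus
    length: int       # bytes that follow: unit ID plus PDU
    unit_id: int      # the "station address", unrelated to the IP address
    function_code: int
    data: bytes


def parse_frame(raw: bytes) -> ModbusTcpFrame:
    if len(raw) < 8:
        raise ValueError("frame too short for MBAP header plus function code")
    tid, pid, length, unit = struct.unpack(">HHHB", raw[:7])
    if pid != 0:
        raise ValueError("protocol identifier must be 0 for Modbus")
    if len(raw) != 6 + length:
        raise ValueError("length field does not match frame size")
    # No checksum and no credentials: TCP provides the integrity check.
    return ModbusTcpFrame(tid, pid, length, unit, raw[7], raw[8:])
```

For instance, the 12-byte request `000100000006010300000002` (hex) parses as transaction 1, unit 1, function 3 (Read Holding Registers), asking for 2 registers starting at address 0.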

We wrote code that exploits these flaws, targeting 10.1.1.35 on port 502. It flips the coil at address 000 from 0 to 1, which to an information security professional is just a changed number and seems insignificant.

In the industrial world, it is not like that. If this bit controls a closed 24 V steam valve, a single flip can send, say, 40 tons of steam surging through the pipeline, and anyone nearby could be killed by steam that is invisible or shows only as a white mist. In one earlier incident, a programming error opened a 40-ton steam line, and the on-site engineer lost everything.

The complete code works like this: the first few bytes are fixed header fields, followed by the computed length field, then the function code to use, and finally the value, FF00 for 1 and 0000 for 0. A delay is inserted between the two writes, so the flag changes from 0 to 1 and then back. I presented this demonstration to the customer.
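The byte layout described above (fixed header fields, computed length, function code, then FF00 or 0000) can be reproduced as a pure encoding function, which is handy for building test fixtures for the detection rules discussed later. This sketch only produces bytes and does no networking; the function name and the unit ID of 1 in the examples are assumptions.

```python
import struct

WRITE_SINGLE_COIL = 0x05
COIL_ON = 0xFF00   # value field for "1"
COIL_OFF = 0x0000  # value field for "0"


def write_single_coil(transaction_id: int, unit_id: int, address: int, on: bool) -> bytes:
    """Encode a Modbus-TCP Write Single Coil (function 5) request."""
    pdu = struct.pack(">BHH", WRITE_SINGLE_COIL, address, COIL_ON if on else COIL_OFF)
    # MBAP header: transaction ID, protocol ID (0), length of unit ID plus PDU, unit ID.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu
```

`write_single_coil(1, 1, 0, True).hex()` gives `00010000000601050000ff00`: seven header bytes, then function 05, coil address 0000, and value FF00. Every field is predictable, and none of them proves who sent the request.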

This video shows the code running. On one side is a simulated field controller; watch the first digit, “0.” Once the program runs, it flips.

This code is very simple: it toggles a single address. Where is it? At address “1.” You can see it change to 1 and then back to 0. With this one action you can manipulate one bit or every bit, making the controller run in states the original program never intended. All of this is possible.

We have covered a lot, and this slide is the key one: how do we protect against these issues?

First, switches must enforce port security policies. Today, industrial users’ switch ports are typically left open or in their default configuration, which is unacceptable. Unused ports must be shut down.
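A per-port policy like the one recommended above can be sketched as follows: unused ports start disabled, each access port accepts a limited number of source MAC addresses, and a port facing hosts shuts itself down if it ever receives a BPDU. The behaviour is modelled loosely on common switch features (port security, BPDU guard); the class and parameters are hypothetical and do not represent any vendor’s configuration syntax.

```python
class SecurePort:
    """Hypothetical access-port policy: MAC limit plus BPDU guard."""

    def __init__(self, max_macs: int = 1, edge: bool = True, enabled: bool = True):
        self.max_macs = max_macs
        self.edge = edge        # edge ports face hosts and should never see BPDUs
        self.enabled = enabled  # unused ports should be left disabled
        self.allowed = set()
        self.err_disabled = False

    def receive(self, src_mac: str, is_bpdu: bool = False) -> bool:
        """Return True if the frame is accepted."""
        if not self.enabled or self.err_disabled:
            return False
        if is_bpdu and self.edge:
            self.err_disabled = True  # BPDU guard: shut the port down
            return False
        if src_mac in self.allowed:
            return True
        if len(self.allowed) < self.max_macs:
            self.allowed.add(src_mac)  # remember the first MAC(s) seen
            return True
        self.err_disabled = True  # violation: too many MACs on one port
        return False
```

With `max_macs=1`, the first station on the port works normally, but the first frame from a second source address shuts the port down, which stops MAC flooding at its entry point. A BPDU arriving on an edge port has the same effect, which blocks the spanning tree attack shown earlier.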

Second, add isolation with industrial firewalls. Since the protocol itself is insecure, how do we block unauthorized access? With filtering firewalls that drop traffic from unauthorized MAC addresses and known attack vectors. In addition, the key hosts on the network must be monitored.
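One common form of this filtering is deep packet inspection on Modbus-TCP: check that the frame is well formed, then allow a function code only if the source host is entitled to it. The sketch below is a hypothetical policy, not a particular product’s rule language; the two source addresses and their roles are assumptions.

```python
import struct

READ_CODES = {1, 2, 3, 4}
WRITE_CODES = {5, 6, 15, 16}

# Hypothetical policy: only the engineering station may write; the HMI may only read.
POLICY = {
    "10.1.1.10": READ_CODES | WRITE_CODES,  # engineering station (assumed address)
    "10.1.1.20": READ_CODES,                # operator HMI (assumed address)
}


def allow(src_ip: str, payload: bytes) -> bool:
    """Decide whether a Modbus-TCP request from src_ip may reach the controller."""
    if len(payload) < 8:
        return False  # too short to hold a header and function code
    _, protocol_id, length, _ = struct.unpack(">HHHB", payload[:7])
    if protocol_id != 0 or len(payload) != 6 + length:
        return False  # malformed frame
    function_code = payload[7]
    # Unknown hosts get an empty set, so they are denied by default.
    return function_code in POLICY.get(src_ip, set())
```

A Write Single Coil request from the HMI is dropped while the same request from the engineering station passes. This does not add authentication to Modbus, but it narrows which machines can issue the kind of write demonstrated earlier.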

Third, protect the hosts. The customer’s computers originally had neither a firewall nor anti-virus software. This time we helped them install anti-virus software, enable the firewall, establish a security baseline, and set up whitelist protection so that unapproved programs, including malware, cannot run. Overall, that is the protection we provided in this area.
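Whitelist protection of this kind is often based on file hashes: record the hash of every approved executable on a clean baseline, then refuse anything whose hash is not on the list. The sketch below shows that idea with SHA-256; the function names are hypothetical, and a real whitelisting product enforces the check inside the operating system rather than in a script.

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    """SHA-256 of raw bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def file_digest(path) -> str:
    """SHA-256 of a file, read in chunks so large executables are not loaded at once."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_allowlist(approved_files) -> set:
    """Hash every approved executable once, e.g. from a clean baseline install."""
    return {file_digest(p) for p in approved_files}


def may_execute(path, allowlist: set) -> bool:
    # Anything not on the list, including a modified copy of an approved
    # program, is refused.
    return file_digest(path) in allowlist
```

Because the check is on content rather than file name, malware dropped from a USB stick under a trusted-looking name is still refused.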

Thank you very much. This slide has my contact information: a QR code, my email, and the three teams, Shadow, Dawn, and Mister. If you are interested in public safety, feel free to contact me privately.

Industrial control security is not technically difficult, but these systems are very fragile. The state has made infrastructure security a high priority because it covers personnel, equipment, and assets together. An attack here is not simply a loss of information; it can damage equipment and cause casualties.

Note: Click the original link at the end of the article to view the complete presentation PPT for this topic. Other topic presentations will be released gradually after obtaining the lecturers’ consent, so please keep an eye on the Kanxue Forum and the Kanxue Academy WeChat public account!


1. The 2019 Kanxue Security Developers Conference concluded successfully! Highlights recap on-site.

2. 2019 SDC Topic Review | New Threat Responses: TSCM Technology Anti-Eavesdropping

3. 2019 SDC Topic Review | Building EDR Security Capabilities on macOS from a Security Research Perspective

4. 2019 SDC Topic Review | Android Containers and Virtualization

5. 2019 SDC Topic Review | Cloud Data-Based Forensic Technology

6. 2019 SDC Topic Review | Design and Implementation of Android Vulnerability Detection Sandbox

7. 2019 SDC Topic Review | Who Pushed Open My “Window”: Analysis of iOS App Interface Security


Official WeChat ID: ikanxue

Official Weibo: Kanxue Security
