Research on VxWorks Firmware System

This article analyzes real-world VxWorks firmware. It first introduces the VxWorks system, then walks through a real device from firmware extraction to determining the load address and firmware format, and finally uses the VxWorks source code to recover symbols for some commonly used functions.

01
Overview of VxWorks

VxWorks is a real-time operating system (RTOS) developed by Wind River (USA) in 1983. It is widely used in embedded and IoT devices, as well as in industrial control, communications, military, aerospace, satellite communication, and other high-tech fields. Unlike a general-purpose Linux system, VxWorks implements its own inter-process communication, task scheduling, memory management, and interrupt handling. Typically, the VxWorks file system is compiled together with the kernel into a single firmware image.

Regarding the security research of the VxWorks system, the main difficulties lie in the following aspects:

1. VxWorks is closed source and can only be built with Wind River's commercial toolchain, so neither the source code nor the build parameters are available; public resources are scarce, and reverse engineering takes significant effort;

2. Wind River uses its own compiler, which differs from the GNU gcc distribution, so the generated code and its assembly patterns differ from those of a typical Linux gcc build;

3. Accurately identifying and recovering function names in stripped (symbol-less) VxWorks firmware is quite challenging;

4. Manufacturers often customize the firmware for each device, so firmware images can differ significantly from one another;

5. VxWorks system simulation and debugging are difficult.

Because of these challenges, very few research results have been published, in China or elsewhere.

02
Case Analysis

This section analyzes the VxWorks system of a real router, recording both the firmware extraction and the subsequent firmware analysis.

2.1 Extracting Firmware

Existing methods for extracting device firmware generally include the following:

1. Obtaining a shell directly through the UART debug interface for debugging;

2. Using a hot air gun to remove the chip and read it offline through a programmer;

3. Connecting wires to read the firmware using a programmer;

Next, I will describe some problems encountered and conclusions drawn while attempting these methods.

1) Obtaining a shell through the UART debug interface

After removing the device casing and locating the UART header on the PCB, I connected to it with a USB-TTL serial adapter (RX, TX, and GND). Many tutorials cover this step, so I won't go into detail. Since the baud rate was unknown, I tried several rates; at 57600 the output was no longer garbled, and I obtained the following router boot log:

The log shows that U-Boot offers four options at startup, none of them related to debugging. In practice, sending an interrupt to stop the boot and pick an option did not work either, so this method was a dead end. Possible explanations include:

1. The system does not provide a debugging interface;

2. There may be a debugging method known only to developers.

With the debug interface ruled out, two other approaches remain for analyzing the firmware:

1. Re-flashing U-Boot to gain a debug shell, which is risky and can easily brick the device;

2. Extracting device firmware for analysis, which is less risky but requires extensive reverse engineering work.
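
The trial-and-error search for the baud rate in the UART step can be partly automated. The sketch below assumes you have already captured a few hundred bytes of boot output at each candidate rate (for example 9600, 19200, 38400, 57600, and 115200, read with pyserial's `serial.Serial(port, baudrate=rate, timeout=1).read(512)`, which is not shown). It simply picks the rate whose output has the highest share of printable ASCII; the function names are illustrative.

```python
import string
from typing import Dict

# Byte values treated as "readable" console output: ASCII letters, digits,
# punctuation and whitespace (including \r and \n).
PRINTABLE = frozenset(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes in `data` that are printable ASCII."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


def best_baud_rate(samples: Dict[int, bytes]) -> int:
    """Pick the rate whose captured boot output looks most like text.

    `samples` maps a baud rate to the raw bytes read from the serial port
    while the device booted at that rate.
    """
    if not samples:
        raise ValueError("no samples captured")
    return max(samples, key=lambda rate: printable_ratio(samples[rate]))
```

For example, `best_baud_rate({115200: b"\x80\xf8\x00\xfe", 57600: b"Booting...\r\n"})` returns 57600. Bytes read at the wrong rate are mostly non-printable, so this crude score is usually enough to separate the candidates.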

2) Reading by removing the chip

The PCB carries only one flash chip, marked QH16B-104H1P. Because of constraints on the device, and because I wanted to keep the router as intact as possible, I did not desolder the chip and kept this method as a last resort.

3) Reading firmware by connecting wires to a programmer

Having ruled out the two methods above, I chose the relatively low-risk option of clipping onto the chip and reading it with a programmer. I looked up the chip's markings in the Half-Bridge app to get its details:

This is an 8-pin NOR flash, which supports execute-in-place: code can run directly from the flash without first being copied into RAM. The other common type is NAND flash, and the two are often used together. With the pinout in hand, I considered two ways of reading the firmware over wires:

1. Reading via a programmer;

2. Reading via a Raspberry Pi;

I tried the programmer first, but my main machine is a Mac without USB 2.0 ports, and the programmer would not work over USB 3.0, so it could not read the flash. A USB 2.0 adapter was hard to come by, so I switched to a Windows desktop for the read.

The firmware read with the board powered off contains no VxWorks information at all; binwalk reports the following:

Normally one component of the firmware, the kernel, is much larger than the others, but none is found here. This raised two questions: is the kernel unreadable while the board is powered off? Does U-Boot have to run before the kernel can be read?

To test these guesses, I read the chip again with the board powered on.

Note that the test clip grips all 8 pins of the chip, so attaching the programmer while the board is powered would also connect the programmer's VCC to the board's supply, which could cause a short circuit with unpredictable consequences. To be safe, I did not use the programmer on the powered board and read the firmware with a Raspberry Pi instead, leaving VCC unconnected. Even so, reading a powered board carries some risk and could damage the Raspberry Pi.

The Raspberry Pi has many pins; they must be wired to the chip according to the following pinout:

For this 8-pin chip, the following connections suffice (VCC is left unconnected):

I connected the Raspberry Pi to the host over Ethernet, put both on the same subnet, logged into the Pi over SSH, identified the SPI device, and dumped the flash with flashrom. The extraction process is as follows:

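
A minimal sketch of the Raspberry Pi read, assuming SPI is enabled on the Pi, the chip is wired to SPI0 with CE0 so that it appears as /dev/spidev0.0, and flashrom is installed. flashrom's linux_spi programmer takes the device path and a clock rate in kHz; a low rate is more forgiving of long clip wires. The helper names are hypothetical.

```python
import hashlib
import subprocess
from typing import List


def flashrom_read_cmd(out_path: str, spidev: str = "/dev/spidev0.0",
                      spispeed_khz: int = 1000) -> List[str]:
    """Build a flashrom command that dumps the SPI flash into `out_path`."""
    programmer = f"linux_spi:dev={spidev},spispeed={spispeed_khz}"
    return ["flashrom", "-p", programmer, "-r", out_path]


def dump_flash(out_path: str, **kwargs) -> None:
    """Run flashrom; raises CalledProcessError if the read fails."""
    subprocess.run(flashrom_read_cmd(out_path, **kwargs), check=True)


def sha256_file(path: str) -> str:
    """Hash a file in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def dumps_match(path_a: str, path_b: str) -> bool:
    """Two reads of the same chip should be byte-identical."""
    return sha256_file(path_a) == sha256_file(path_b)
```

Reading the chip twice and checking `dumps_match("a.bin", "b.bin")` is a cheap way to confirm the wiring is stable before trusting the dump, which matters all the more when the board is powered.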

This time, the extracted firmware contains the VxWorks kernel file:

Carving out the first large component and running binwalk on it reveals the following VxWorks information:

At this point, the firmware system has been successfully extracted from the device, and the next step is to analyze the firmware.

2.2 Preliminary Analysis

With the firmware in hand, I first tried `binwalk -e` to extract its contents directly. This produces a large number of files full of noise, because binwalk handles RTOS and encrypted firmware poorly. The following image shows an example:

Some HTML tags are visible in the text, so it is tentatively an HTML file. However, the first line of the extracted content contains invisible characters, which suggests that binwalk's extraction is inaccurate and that the firmware may be encrypted. binwalk's entropy plot of the firmware is shown below:

The entropy suggests that the beginning of the firmware, where U-Boot resides, is probably not encrypted, so analysis can start there by following how U-Boot loads the rest of the image. Whether the high-entropy middle section is encrypted or merely compressed cannot be determined yet and needs further analysis.
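
For reference, the entropy curve can be reproduced with a few lines of Python. This sketch computes Shannon entropy in bits per byte over fixed-size blocks; binwalk plots entropy on a 0 to 1 scale, so divide by 8 to compare. The block size is an arbitrary choice.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def entropy_profile(data: bytes, block_size: int = 4096) -> List[Tuple[int, float]]:
    """(offset, entropy) for each block of the image."""
    return [(off, shannon_entropy(data[off:off + block_size]))
            for off in range(0, len(data), block_size)]
```

Values close to 8 bits per byte indicate compressed or encrypted data, while code and text sit noticeably lower. Entropy alone cannot distinguish compression from encryption, which is why the middle section could not be classified from the plot.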

In binwalk's output, the largest component is usually the kernel:

This section can be carved out manually and then decompressed with lzma:

After decompression, binwalk clearly identifies it as a VxWorks system. Unfortunately, there is no symbol table, which indicates that the firmware is stripped:

binwalk identifies the firmware as VxWorks 5.5.1, MIPS little-endian. I loaded this VxWorks file into IDA. The load address is still unknown, but there is no need to supply one yet: the immediate goal is not full reverse engineering but determining whether the firmware is encrypted, which could explain binwalk's inaccurate parsing.

In a preliminary analysis, one of the easiest entry points is to search for strings and then follow their references to the functions that use them. Fortunately, the following strings turned up:

These are clearly request-related strings, which shows that the VxWorks file is not encrypted, only lzma-compressed. Searching for references to these strings failed, because the load address had not been determined yet. I also searched for the names of VxWorks initialization functions such as bzero, usrInit, and bfill without success, further confirming that the firmware carries no symbols at all.

Given this conclusion, and since the files all appear to be lzma-compressed, the next step is to study the lzma format, manually carve out one lzma stream, and decompress it to see whether garbled characters remain.

Some knowledge of the lzma format is needed here. The 13-byte header of a .lzma file consists of a properties byte (often used as a de facto magic number), the dictionary size, and the uncompressed size, occupying a byte, a dword, and a qword respectively, all little-endian. Part of the firmware looks like this:

Parts 1, 2, and 3 form the header, and part 4 is the compressed data, which ends with four 00 bytes. Part 1, the properties byte, encodes lc, lp, and pb as (pb * 5 + lp) * 9 + lc, which gives 0x5A here; the lzmainfo command displays the decoded values. Part 2 is the dictionary size and part 3 is the uncompressed size. Following this pattern, I used a hex editor to locate the start and end of the compressed VxWorks image in the firmware: 0x15200 and 0x11C240, a total size of 0x107040, which matches the size of the section carved out using binwalk.
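
Decoding the header by hand is easy to script. This sketch unpacks the 13 bytes described above and splits the properties byte back into lc, lp, and pb using props = (pb * 5 + lp) * 9 + lc. For 0x5A that gives lc = 0, lp = 0, pb = 2, whereas the common default 0x5D corresponds to lc = 3, lp = 0, pb = 2. The function name is illustrative.

```python
import struct
from typing import NamedTuple


class LzmaHeader(NamedTuple):
    lc: int                  # literal context bits
    lp: int                  # literal position bits
    pb: int                  # position bits
    dict_size: int
    uncompressed_size: int   # 0xFFFFFFFFFFFFFFFF means "unknown, end marker used"


def parse_lzma_header(buf: bytes, offset: int = 0) -> LzmaHeader:
    """Decode the 13-byte header of a classic .lzma (LZMA-Alone) stream."""
    # <BIQ = 1-byte properties, 4-byte dictionary size, 8-byte size, little-endian
    props, dict_size, usize = struct.unpack_from("<BIQ", buf, offset)
    if props >= 9 * 5 * 5:
        raise ValueError(f"invalid properties byte 0x{props:02x}")
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return LzmaHeader(lc, lp, pb, dict_size, usize)
```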

The end position is slightly off: just above it is a MINIFS string preceded by 0xFF padding, and another lzma header follows immediately below. So the stream does not end at the four 00 bytes but before the 0xFF padding, which can be adjusted manually during parsing.

Next, I carved out the lzma stream from 0x11C240 to 0x122DE4 and decompressed it manually, obtaining the following content:

The beginning still contains invisible characters, so manual extraction and binwalk produce the same result.
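
The carve-and-decompress step can also be done with Python's standard lzma module, whose FORMAT_ALONE mode reads exactly this 13-byte-header format. A useful property: the decompressor stops at the end of the stream and returns any trailing bytes separately, so an end offset that overshoots into the 0xFF padding does no harm, while one that cuts the stream short is detected. The helper names are hypothetical.

```python
import lzma
from typing import Tuple


def carve(image: bytes, start: int, end: int) -> bytes:
    """Slice [start, end) out of the flash image; end may overshoot into padding."""
    return image[start:end]


def decompress_alone(chunk: bytes) -> Tuple[bytes, bytes]:
    """Decompress one .lzma (LZMA-Alone) stream.

    Returns (data, leftover), where leftover is whatever follows the end of
    the stream, e.g. 0xFF padding or the next header.
    """
    d = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    data = d.decompress(chunk)
    if not d.eof:
        # The decoder consumed everything without reaching the end of stream.
        raise ValueError("stream truncated: end offset is too small")
    return data, d.unused_data
```

With the offsets above, `decompress_alone(carve(fw, 0x11C240, 0x122DE4))` would yield the decompressed member plus any trailing bytes.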

While parsing the lzma streams by hand, I noticed something interesting: the last few bytes of each lzma stream store the size of the next one. In other words, the lzma files are chained together, as shown below:

In the image above, part 3 is the lzma header (the properties byte 0x5A, computed from lc, lp, and pb), marking where an lzma stream begins. Part 4 is the uncompressed size in little-endian and part 1 is the same uncompressed size in big-endian; the two values are identical. Part 2 marks the end of the previous file, and the compressed content of the next lzma stream follows immediately.

Searching the whole firmware turned up four occurrences of the string MINIFS, each preceded by 0xFF padding:

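
The global search for MINIFS, including the check for 0xFF padding in front of each hit, can be scripted as follows. This is a sketch; the 16-byte padding window is an assumption and may need adjusting for other images.

```python
from typing import List, Tuple


def find_minifs(image: bytes, magic: bytes = b"MINIFS",
                pad_len: int = 16) -> List[Tuple[int, bool]]:
    """Return (offset, padded) for every occurrence of `magic`.

    `padded` is True when the `pad_len` bytes right before the magic are all
    0xFF, i.e. erased flash used as padding.
    """
    hits = []
    off = image.find(magic)
    while off != -1:
        before = image[max(0, off - pad_len):off]
        padded = len(before) == pad_len and before == b"\xff" * pad_len
        hits.append((off, padded))
        off = image.find(magic, off + 1)
    return hits
```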

This leads to a bold guess: MINIFS is a container that manages multiple lzma files, chaining them together so that each file can be decompressed when needed. If so, most likely every file stored in this firmware is lzma-compressed. The design makes sense: it saves storage space on embedded devices at the cost of some speed.
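
One way to test this guess is to scan the whole image for streams that start with the 0x5A properties byte and see which of them decompress cleanly. The sketch below applies cheap sanity checks to the dictionary and size fields before attempting a full decode; the thresholds are assumptions, and trying every candidate is slow on large images but simple.

```python
import lzma
import struct
from typing import List, Tuple

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # header value meaning "size unknown"


def scan_lzma_streams(image: bytes, props_byte: int = 0x5A,
                      max_dict: int = 1 << 26,
                      max_usize: int = 64 << 20) -> List[Tuple[int, int]]:
    """Return (offset, decompressed_length) for every offset where a complete
    .lzma stream starting with `props_byte` decodes cleanly."""
    found = []
    view = memoryview(image)          # avoid copying the image per candidate
    needle = bytes([props_byte])
    off = image.find(needle)
    while off != -1 and off + 13 <= len(image):
        _, dict_size, usize = struct.unpack_from("<BIQ", image, off)
        # Assumption: encoders here write a power-of-two dictionary size.
        sane_dict = 4096 <= dict_size <= max_dict and (dict_size & (dict_size - 1)) == 0
        sane_size = usize <= max_usize or usize == UNKNOWN_SIZE
        if sane_dict and sane_size:
            d = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
            try:
                data = d.decompress(view[off:])
                if d.eof:             # reached a proper end of stream
                    found.append((off, len(data)))
            except lzma.LZMAError:
                pass                  # false positive: not a real stream
        off = image.find(needle, off + 1)
    return found
```

If nearly every stretch between MINIFS markers yields a decodable stream, that supports the idea that all stored files are lzma-compressed.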

2.3 MINIFS File Format

Here, the MINIFS format I found differs from what is described online. According to those sources, the file path table begins 0x18 bytes after the MINIFS string, and each table entry occupies 88 bytes. Each entry begins with a dword offset, followed by the plaintext file path up to the end of the string. The documentation gives the following example:

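
To check a firmware image against the layout described in those sources, a parser can be written with the offsets as parameters. Everything here is an assumption drawn from that description: the table starting 0x18 bytes after MINIFS, 88-byte entries, a little-endian dword offset, and a NUL-terminated path. On a firmware that does not follow this layout, it simply returns an empty list. The function name is hypothetical.

```python
import struct
from typing import List, Tuple


def parse_minifs_paths(image: bytes, minifs_off: int, table_off: int = 0x18,
                       entry_size: int = 88, endian: str = "<",
                       max_entries: int = 4096) -> List[Tuple[int, str]]:
    """Parse (offset, path) entries under the assumed documented layout."""
    entries = []
    pos = minifs_off + table_off
    while len(entries) < max_entries and pos + entry_size <= len(image):
        raw = image[pos:pos + entry_size]
        (file_off,) = struct.unpack_from(endian + "I", raw, 0)
        path = raw[4:].split(b"\x00", 1)[0]
        # Stop at an empty or non-ASCII path: end of table, or layout mismatch.
        if not path or not all(0x20 <= b < 0x7F for b in path):
            break
        entries.append((file_off, path.decode("ascii")))
        pos += entry_size
    return entries
```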

However, mine looks like this:

What this data contains is unknown; my guess is that VxWorks encrypts it, which calls for reverse engineering the VxWorks image itself.

2.4 Loading Address Analysis

Analyzing the VxWorks system first requires knowing the firmware's load address. Packing the file system into the kernel image makes RTOS firmware less convenient to reverse, but it also means such systems cannot use protections like Linux's PIE, so the load address is fixed. Once it is known, all functions can be resolved.

Existing methods for analyzing loading addresses include:

1. Locating symbolized functions such as bzero, usrInit, and bfill and reading the value assigned to the sp register, which reveals the load address;

2. Using string reference addresses to guess the firmware loading address;

3. Analyzing the firmware system to obtain the VxWorks loading address.
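
Method 2 can be sketched as a voting scheme: for each candidate base, count how many strings would be referenced by a 32-bit little-endian word somewhere in the image if the firmware were loaded there. This is a simplified illustration, not any particular tool's algorithm, and the function names are hypothetical. On MIPS, code usually builds addresses with lui/addiu pairs rather than literal words, so hits mostly come from pointer tables in data; a more thorough version would also reconstruct those instruction pairs.

```python
import re
import struct
from typing import Dict, Iterable, List, Tuple


def string_offsets(image: bytes, min_len: int = 6) -> List[int]:
    """File offsets of NUL-terminated printable ASCII strings."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}\x00" % min_len)
    return [m.start() for m in pattern.finditer(image)]


def guess_load_base(image: bytes, candidates: Iterable[int],
                    min_len: int = 6) -> Tuple[int, Dict[int, int]]:
    """Score each candidate base by how many strings it makes 'referenced'.

    A string at file offset s counts for base b if the little-endian word
    b + s occurs at some 4-byte-aligned position in the image.
    """
    strs = string_offsets(image, min_len)
    words = {struct.unpack_from("<I", image, i)[0]
             for i in range(0, len(image) - 3, 4)}
    scores = {b: sum((b + s) in words for s in strs) for b in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

For this firmware one might try `guess_load_base(vxworks, range(0x80000000, 0x80100000, 0x1000))`; the candidate range is a guess based on the usual MIPS kseg0 region.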

Since there is no function symbol information, the first method can be ruled out.

The second method is rather tedious, so I went straight to the third. The idea is to find what looks like a uImage header: search the firmware for the string "MyFirmware" and locate the start of the header just above it. Two address fields in the header hold 0x80001000, the same load address used by many similar routers. In the firmware file this is at 0x15000 plus 0x18 bytes, so 0x80001000 is worth trying as the load address:

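
If the header is a standard U-Boot legacy image header, it can be decoded directly, and the two 0x80001000 values would then be its load address and entry point. This sketch assumes that layout: 64 bytes, big-endian, magic 0x27051956, load address at offset 0x10, entry point at 0x14, and the image name at 0x20. A vendor-specific header would fail the magic check, which is itself a useful signal. The helper names are illustrative.

```python
import struct
from typing import NamedTuple

IH_MAGIC = 0x27051956      # legacy U-Boot image magic
HEADER_FMT = ">7I4B32s"    # 64-byte big-endian legacy image header
NAME_OFFSET = 0x20         # the image name field starts 32 bytes in


class UImageHeader(NamedTuple):
    data_size: int
    load_addr: int
    entry_point: int
    name: str


def find_header_by_name(image: bytes, name: bytes) -> int:
    """Offset of the header whose name field holds `name`, or -1."""
    off = image.find(name)
    return off - NAME_OFFSET if off >= NAME_OFFSET else -1


def parse_uimage_header(image: bytes, offset: int) -> UImageHeader:
    """Decode the header fields that matter for picking a load address."""
    (magic, _hcrc, _time, size, load, entry, _dcrc,
     _os, _arch, _type, _comp, raw_name) = struct.unpack_from(HEADER_FMT, image, offset)
    if magic != IH_MAGIC:
        raise ValueError(f"no legacy uImage magic at 0x{offset:x}")
    name = raw_name.split(b"\x00", 1)[0].decode("ascii", "replace")
    return UImageHeader(size, load, entry, name)
```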

After the base address is set, Ghidra automatically identifies more than 8,000 functions, whereas IDA recognizes only about 4,200 and leaves much of the code unanalyzed. By creating functions in IDA manually or with a script, I eventually identified more than 8,000 functions there as well, though all of them are unnamed.

Since the files extracted by binwalk contain many garbled bytes, I analyzed those bytes:

The image above shows part of the file content, and the following conclusions can be drawn:

* The red part may be 32-bit encrypted file names or path information, which requires further analysis;

* The green part represents the current file size, indicating how many bytes this file occupies;

* The pink part indicates the file’s starting address, with the value plus the file size equaling the end position of the file.

Another situation is shown in the following image:

Many 0d 0a pairs appear in the image; these are Windows line breaks (CR LF), which always appear together. ef bb bf is the UTF-8 byte order mark (BOM), which a quick search confirms. The remaining fields have the same meaning as in the first image.
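
Spotting these markers in a carved or decompressed blob is easy to automate. This sketch reports where a UTF-8 BOM occurs, how many CRLF and bare LF line endings there are, and whether an HTML tag appears; bytes before the BOM would be the record header fields described above. The function name is illustrative.

```python
from typing import Dict

UTF8_BOM = b"\xef\xbb\xbf"


def describe_blob(data: bytes) -> Dict[str, object]:
    """Quick text-format hints for a carved MINIFS record."""
    crlf = data.count(b"\r\n")
    return {
        "bom_offset": data.find(UTF8_BOM),          # -1 if there is no BOM
        "crlf_lines": crlf,                         # Windows line endings
        "bare_lf_lines": data.count(b"\n") - crlf,  # Unix line endings
        "looks_like_html": b"<html" in data.lower(),
    }
```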

Next, I will conduct reverse analysis on VxWorks functions to determine whether VxWorks has encrypted MINIFS, while also identifying some basic functions.

2.5 VxWorks Reverse Analysis

Starting from the boot log captured earlier over UART:

I then searched for string references and obtained the following results:

printf had no symbol in the binary; I named it manually based on its characteristic behavior. The image shows that argument a2 is a file path string, so sub_80014d28 is likely the module loading function, and it contains a file decryption routine. Its logic is quite complex, but it clearly calls the lzma decompression function:

I have not yet fully reversed the decryption process, but I have restored symbols for functions such as strcpy using the leaked VxWorks 5.5.1 source code. strcpy is defined in the leaked source as follows:

This is a very simple strcpy. The EOS macro is not defined anywhere in the leaked source, and several other macros are missing too, so some header files are evidently absent. A quick web search shows that EOS is simply the end-of-string character '\0', the NUL terminator, which plays a role similar to NULL.

Next, I located strcpy in the VxWorks image and examined it in IDA. The MIPS assembly is as follows:

The pseudo code is as follows:

By comparing with the source code, we can rename the functions in IDA accordingly. This method can also be used to find the strncpy function, as shown in the image below:

strncpy shares code with strcpy: the byte-copying part is identical, as the red box in the image below shows:

The pseudo code is also similar to the source code:

03
Symbol Recovery Methods

With the information above, vulnerability research can begin. To study the VxWorks system thoroughly, however, symbols still need to be recovered. The main methods for recovering symbols in stripped VxWorks firmware are:

1. The open-source tool lscan uses the FLIRT signature (.sig) files shipped by Hex-Rays to identify statically linked library functions. FLIRT matches functions against byte patterns generated from libraries built with specific compilers. However, lscan cannot generate .sig files automatically and cannot identify functions that no available signature covers; although Hex-Rays ships many .sig files, none cover Wind River's libraries, so those signatures would have to be built;

2. The Rizzo plugin, which uses heuristic function fingerprinting and can recognize more functions than FLIRT, but needs a reference binary with symbols to build its signatures; finding such a binary is the hard part;

3. IDA's FLAIR tools, which build FLIRT signatures from library files;

4. Alibaba's open-source Finger function recognition plugin, which relies on a signature library built by Alibaba and can be used alongside lscan;

5. Recovering function names by running BinDiff between a symbolized and a stripped VxWorks image.
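
The idea behind FLIRT-style matching can be illustrated with a toy matcher: a signature is a run of fixed bytes with wildcards (written `..`, as in FLIRT .pat files) for bytes that relocations change. This is not the .sig format or how lscan works internally, only a sketch of the principle; the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def compile_pattern(pattern: str) -> "re.Pattern[bytes]":
    """Turn hex text such as 'e8ffbd27 ....' ('..' = any byte) into a regex."""
    hexstr = pattern.replace(" ", "")
    if len(hexstr) % 2:
        raise ValueError("pattern must have an even number of hex digits")
    parts = []
    for i in range(0, len(hexstr), 2):
        pair = hexstr[i:i + 2]
        parts.append(b"." if pair == ".." else re.escape(bytes.fromhex(pair)))
    return re.compile(b"".join(parts), re.DOTALL)


def match_signatures(code: bytes, sigs: Dict[str, str], base: int = 0,
                     align: int = 4) -> List[Tuple[int, str]]:
    """Return (address, name) for every aligned offset where a signature matches."""
    compiled = [(name, compile_pattern(p)) for name, p in sigs.items()]
    hits = []
    for off in range(0, len(code), align):   # MIPS instructions are 4-byte aligned
        for name, rx in compiled:
            if rx.match(code, off):
                hits.append((base + off, name))
    return hits
```

For example, `match_signatures(code, {"demo_func": "e8ffbd27 ........ 1400bfaf"}, base=0x80001000)` reports the address of every aligned match. Here `e8ffbd27` and `1400bfaf` are the little-endian encodings of `addiu sp, sp, -24` and `sw ra, 20(sp)`, a typical MIPS prologue; the name is made up, and real signatures need many more fixed bytes to avoid false positives.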

In summary, when the libraries a program uses are known, they can be compiled first and then matched with FLIRT, Rizzo, or BinDiff. When they are unknown, lscan can help identify candidate libraries, which can then be loaded and matched. This is one of the more labor-intensive steps in VxWorks research, and we are still working on automated symbol recovery.

04
Other Technical Issues to Address

1. Extraction of the VxWorks file system is not yet precise: file names need to be recovered, and the integrity of the extracted contents needs to be verified;

2. Automated recovery of kernel function names must also be accurate, which remains an open research challenge;

3. Debugging VxWorks requires firmware emulation and debugging support, which still needs to be developed.

05
Conclusion

This article records VxWorks research on a real device, covering identification of the load address, splitting and decompressing the firmware, function name recovery, and related topics. The conclusions come from the author's own exploration, so some content may lack precise references, and they have not been tested against a large number of firmware samples. Their accuracy therefore cannot be guaranteed. This article is meant only to share ideas for discussion; corrections are welcome.

References
https://github.com/kv59piaoxue/VxWork551
https://hex-rays.com/products/ida/tech/flirt/in_depth/
https://github.com/aliyunav/Finger/blob/master/finger_plugin.py
https://github.com/naim94a/lumen
https://iot-security.wiki/hardware-security/firmware/extraction.html
https://www.mrskye.cn/archives/501bc7a2/#树莓派引脚定义
https://www.cnblogs.com/iriczhao/p/12128451.html
https://www.secpulse.com/archives/75635.html
https://bbs.pediy.com/thread-230095.htm
https://github.com/cleanwrt/u-boot_mt7620
