A Simple Note on Python Reverse Engineering

Introduction

Over the past decade, a growing share of malware has been written in interpreted languages such as Python, often with effective evasion techniques. In day-to-day work, malicious files reported from the front line have to be analyzed to extract relevant information, and manual reverse engineering inevitably runs into malicious Python files. I used publicly available analysis tools to work through the specific cases I encountered, and recorded the process here as brief notes.

Most of the malicious Python files we have found are packaged with py2exe or PyInstaller. For PyInstaller-packed samples, the first step in manual analysis is to use pyinstxtractor.py to extract the .pyc files. It is best to run it in an analysis environment that has the same Python version that was used to build the malicious exe file; in this example, that is Python 3.8. Extraction yields .pyc files, .pyd files, and some .dll files. We focus on the .pyc files and work toward the file that holds the core code, which in this case is Alfre.pyc.

https://github.com/extremecoders-re/pyinstxtractor
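
When the build version is not known in advance, the header of an extracted .pyc file gives a hint. The following is a minimal sketch, assuming the file still carries a standard CPython header. The function name pyc_python_version and the lookup table are illustrative, and the table only lists a handful of release magic numbers.

```python
import struct

# CPython starts every .pyc file with a 2-byte little-endian magic number
# followed by b"\r\n". This table is deliberately incomplete and only
# covers the final release value of a few versions.
KNOWN_MAGIC = {
    3394: "3.7",
    3413: "3.8",
    3425: "3.9",
    3439: "3.10",
    3495: "3.11",
    3531: "3.12",
}


def pyc_python_version(path):
    """Return the CPython version implied by a .pyc header, or None if unknown."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4 or header[2:4] != b"\r\n":
        raise ValueError(f"{path} does not look like a .pyc file")
    (magic,) = struct.unpack("<H", header[:2])
    return KNOWN_MAGIC.get(magic)
```

Calling pyc_python_version("Alfre.pyc") on the file from this example would be expected to return "3.8" if the header is intact. A result of None only means the magic number is not in the table.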

A .pyc file is a binary file compiled from a Python script. It contains Python bytecode, which is not suitable for direct human reading. To give security personnel source-level content to analyze and judge, the .pyc file needs to be decompiled back into a .py file. The pycdc tool can be used for this; it is available on GitHub and has to be compiled from source.

Overall, compiling the tool is not complicated and goes fairly smoothly.

https://github.com/zrax/pycdc
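
Before turning to a decompiler, the bytecode inside a .pyc file can also be inspected with the standard library alone. The sketch below assumes a .pyc written by Python 3.7 or later (16-byte header) and an interpreter of the same version as the one that produced the file, since the marshal format is version-specific. The helper names load_code_object and module_level_names are illustrative.

```python
import dis
import marshal


def load_code_object(pyc_path):
    """Load the top-level code object from a .pyc written by Python 3.7+.

    Only works when the running interpreter matches the version that
    produced the file, because the marshal format is version-specific.
    """
    with open(pyc_path, "rb") as f:
        f.read(16)  # skip the header: magic, flags, mtime or hash, source size
        return marshal.load(f)


def module_level_names(code):
    """Return the names referenced by the module body (imports, globals, calls)."""
    return list(code.co_names)


# Typical use, with a path of your own:
#   code = load_code_object("Alfre.pyc")
#   print(module_level_names(code))
#   dis.dis(code)  # prints the bytecode listing
```

Loading a code object with marshal does not execute it, but the listing is still raw bytecode, which is why a decompiler is the more practical route for reading larger samples.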

Compiling pycdc in Windows Environment

pycdc can be compiled successfully on Windows, but in practice the resulting build failed at the step of decompiling the .pyc file into a .py file. The failed attempt is recorded here for reference.

In the Windows environment, first prepare the CMake tool by directly downloading the MSI installation package.

https://cmake.org/download/

When installing, remember to check “Add CMake to the system PATH for all users” so that you do not have to configure the environment variable yourself.

Install and configure MinGW. The download link is as follows:

https://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win64/Personal%20Builds/mingw-builds/8.1.0/threads-win32/seh/x86_64-8.1.0-release-win32-seh-rt_v6-rev0.7z/download

Add the MinGW bin directory to the PATH environment variable. In this setup the path is C:\mingw64\bin.

In the bin folder, find mingw32-make.exe, make a copy of it, and rename the copy to make.exe (keep it in the bin folder).

Verify whether the configuration is successful by entering the following in cmd.

gcc -v

make -v
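
The same check can be scripted so that it reports every missing tool at once. This is a minimal sketch using only the standard library; the function name missing_tools and the default tool list are assumptions chosen to match the steps in these notes.

```python
import shutil


def missing_tools(tools=("cmake", "gcc", "g++", "make")):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Typical use:
#   print(missing_tools())  # an empty list means everything was found
```

An empty result only confirms that the executables are on PATH, not that the versions are suitable for building pycdc.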

Use the command line for the CMake build steps as follows:

First, manually create the build folder: mkdir build

Then enter the build folder: cd build

Execute the command cmake -G "MinGW Makefiles" ..

Finally, execute the command make

In the end, the Windows build could not decompile the sample, so we switch to a Linux environment.

Compiling pycdc in Ubuntu Environment

My test environment is Ubuntu 22.04, and the steps below compile pycdc there.

In the Ubuntu environment, execute the following commands:

git clone https://github.com/zrax/pycdc

cd pycdc

cmake .

make

./pycdc -h

If cmake reports the following error, the C++ compiler is missing and needs to be installed.

onion@onionsec:~/Desktop/pycdc$ cmake .

-- The C compiler identification is GNU 11.4.0

-- The CXX compiler identification is unknown

-- Detecting C compiler ABI info

-- Detecting C compiler ABI info - done

-- Check for working C compiler: /usr/bin/cc - skipped

-- Detecting C compile features

-- Detecting C compile features - done

CMake Error at CMakeLists.txt:2 (project):

No CMAKE_CXX_COMPILER could be found.

Tell CMake where to find the compiler by setting either the environment

variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path

to the compiler, or to the compiler name if it is in the PATH.

-- Configuring incomplete, errors occurred!

See also "/home/onion/Desktop/pycdc/CMakeFiles/CMakeOutput.log".

See also "/home/onion/Desktop/pycdc/CMakeFiles/CMakeError.log".

On Debian-based systems (like Ubuntu), you can install g++ using the following commands:

sudo apt-get update

sudo apt-get install g++

Once the C++ compiler is installed, running the cmake and make commands above again completes the compilation, and the compiled binary is placed in the current directory.

Decompiling a real sample with pycdc completed without errors. For a security analyst, reading the resulting Python code is enough to see that the file is very likely malicious.

For the malicious Python files collected from the hw red team in 2023, however, decompilation reported errors about unsupported opcodes, which shows that the tool does not yet cover every opcode. It is still sufficient for daily manual analysis.
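
To see at a glance which opcodes a given sample trips over, the decompiler can be wrapped in a small script. The sketch below rests on two assumptions: that the pycdc binary sits at ./pycdc, and that it reports unknown instructions on stderr in the form "Unsupported opcode: NAME". Adjust the path and the pattern if your build behaves differently. The function names are illustrative.

```python
import re
import subprocess

# Assumption: pycdc writes "Unsupported opcode: NAME" to stderr for each
# instruction it cannot handle. Change the pattern if your build differs.
UNSUPPORTED = re.compile(r"Unsupported opcode:\s*(\S+)")


def unsupported_opcodes(stderr_text):
    """Return the sorted, de-duplicated opcode names found in the error text."""
    return sorted(set(UNSUPPORTED.findall(stderr_text)))


def decompile(pyc_path, pycdc_bin="./pycdc"):
    """Run pycdc on one file and return (source_text, unsupported_opcode_names)."""
    proc = subprocess.run(
        [pycdc_bin, pyc_path],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate undecodable bytes in the output
    )
    return proc.stdout, unsupported_opcodes(proc.stderr)
```

Even when the second return value is not empty, the first one usually still holds the partially decompiled source, which is often enough for a first reading.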

In another case, the Python bytecode extracted from a quasarrat sample also failed during decompilation with unsupported-opcode errors. This has little impact on judging whether the code is anomalous (malicious): after reading the partial decompiled output, a security analyst may notice the anomalies immediately.

Manual analysis is the first step and confronts the problem directly. From a defensive perspective, however, the final question remains: how can Python trojans be detected in batches? This has become a real problem in practice.
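
As one possible first triage step, and not a detection method in itself, a directory of samples can be scanned for the marker bytes of PyInstaller's embedded archive, the same marker that pyinstxtractor searches for. The sketch below only flags PyInstaller-packed files; whether a flagged file is malicious still requires the analysis described above. The function names and the 1 MiB tail size are assumptions made for illustration.

```python
import os

# Marker bytes of PyInstaller's embedded archive, which normally sits
# near the end of the packed executable.
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def looks_like_pyinstaller(path, tail_bytes=1 << 20):
    """Check the last tail_bytes of a file for the PyInstaller marker."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        return PYINSTALLER_MAGIC in f.read()


def triage(directory):
    """Return the sorted paths under directory that carry the marker."""
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                if looks_like_pyinstaller(path):
                    hits.append(path)
            except OSError:
                continue  # unreadable file, skip it
    return sorted(hits)
```

Files flagged this way could then be fed to pyinstxtractor and pycdc automatically. Samples whose marker has been altered, or packed with other tools such as py2exe, will not be caught by this check.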

Reference Article

https://corgi.rip/blog/pyinstaller-reverse-engineering/
