RCTF 2021 – Complete Reproduction and In-Depth Analysis of ASM Crypto Encryption Algorithm

Introduction

This article is a complete technical reproduction of the ASM Crypto problem from RCTF 2021. The problem provides an encryption program written in x86 assembly and the encrypted ciphertext, requiring reverse analysis to restore the plaintext.

This article will adopt the forward analysis + reverse solving + code verification approach, ensuring that each step has theoretical basis and practical verification, allowing beginners to fully understand the entire process.

Problem Materials

After extracting the attachment, we obtain:

asm/
├── crypto.asm    # MASM32 assembly source code (encryption program)
└── flag.enc      # Encrypted ciphertext (256 hexadecimal characters)

Quick observation of flag.enc:

Length is 256 characters
Contains only [0-9a-f] (lowercase hexadecimal)
Assumption: 128 bytes of binary data encoded in hexadecimal

Step 1: Static Analysis of Assembly Source Code

Main Program Flow

start:
    ; 1. Read input
    invoke  ReadConsole,hStdin,offset buffer,bufferSize+2,offset cLen,NULL
    sub     cLen,1          ; Only subtract 1 character (\n), keep \r

    ; 2. Four-step encryption process
    invoke  mInit,addr buffer,cLen
    invoke  Encode,addr buffer,bufferSize
    invoke  c2w,addr buffer,addr buffer2,bufferSize
    mov     tt,0
    invoke  dfs,addr buffer2,0

    ; 3. Output ciphertext
    invoke  WriteConsole,hStdout,offset buffer3,bufferSize*2+1,offset dwNum,NULL

Key Points:

sub cLen,1 only subtracts the newline character\n
In the Windows console, input usually ends with\r\n
Therefore, the plaintext will retain the\r character (this detail is very important!)

Encryption process: Input → mInit → Encode → c2w → dfs → Output

Function 1: mInit – Expansion and XOR

mInit proc s,l
    ; First stage: loop to fill to 128 bytes
    mov @temp,1        ; Increment value starts from 1
    .while ecx<bufferSize
        mov dl,[eax+ebx]       ; Read buffer[eax]
        add edx,@temp          ; Add increment value
        mov [ecx+ebx],dl       ; Write to buffer[ecx]
        inc ecx
        inc eax
        .if eax==l             ; Finished one round of original input
            xor eax,eax        ;   Reset read position
            add @temp,1        ;   Increment value +1
        .endif
    .endw

    ; Second stage: XOR all bytes with length
    mov eax,l
    .while ecx<bufferSize
        mov dl,[ebx+ecx]
        xor dl,al              ; buffer[ecx] ^= length
        mov [ebx+ecx],dl
        inc ecx
    .endw
    ret
mInit endp

Mathematical Description

Let the input plaintext be P[0..L-1], output S[0..127]:

Stage 1 – Loop Filling:

Init[i] = (P[i mod L] + floor(i / L)) mod 256

Explanation:

Value at position i = Original character P[i mod L] + Current round of filling floor(i / L)
This keeps the first L bytes unchanged, while the subsequent bytes are cyclically copied with an increment offset

Example (L=3, plaintext “ABC”):

Position i	i mod L	i // L	Calculation	Result
0-2	0-2	0	A,B,C	A,B,C
3-5	0-2	1	A+1,B+1,C+1	A+1,B+1,C+1
6-8	0-2	2	A+2,B+2,C+2	A+2,B+2,C+2
…	…	…	…	…

Stage 2 – XOR with Length:

S[i] = Init[i] XOR L

All 128 bytes are XORed with the input length.

Function 2: Encode – In-Place Rolling XOR

Encode proc s,l
    mov @k,8
    .while ecx<l
        xor edx,edx
        mov eax,ecx
        div @k              ; eax = ecx/8, edx = ecx%8

        .if edx!=0          ; If i % 8 != 0
            mov dl,[ebx+ecx-1]     ; Read left neighbor buffer[i-1]
        .endif
        .if eax!=0          ; If i / 8 != 0
            mov al,[ebx+ecx-8]     ; Read upper neighbor buffer[i-8]
        .endif

        mov ah,[ebx+ecx]    ; Read current value
        xor ah,al           ; XOR upper neighbor
        xor ah,dl           ; XOR left neighbor
        mov [ebx+ecx],ah    ; ⚠️ Write back in place!
        inc ecx
    .endw
    ret
Encode endp

Key Understanding: In-Place Update

This is the easiest place to make mistakes! The code is in-place modification, meaning:

E[i] = S[i] ⊕ (i≥8 ? E[i-8] : 0) ⊕ (i%8≠0 ? E[i-1] : 0)

Note:

E[i-1] and E[i-8] use the already updated values, not the original S[i-1]!
Processing order is from 0 to 127 (forward)
Subsequent bytes depend on previously processed bytes

Dependency Graph:

Position:  0  1  2  3  4  5  6  7  8    9   10  ...
Dependency: None 0  1  2  3  4  5  6  0    1+8  2+9 ...

This design creates a strong avalanche effect: a change in one byte will affect all subsequent bytes.

Function 3: c2w – Binary to Hexadecimal Conversion

func1 proc var1
    mov eax,var1
    mov ah,al
    and al,00001111b        ; Low 4 bits
    and ah,11110000b        ; High 4 bits
    shr ah,4
    lea ebx,ca              ; ca = '0123456789abcdef'
    movzx ecx,al
    mov al,[ebx+ecx]        ; Low 4 bits → character
    movzx ecx,ah
    mov ah,[ebx+ecx]        ; High 4 bits → character
    ret
func1 endp

c2w proc s1,s2,l
    .while dword ptr ecx<l
        mov dl,[ebx+ecx]
        invoke func1,edx
        mov edx,s2
        mov byte ptr [ecx*2+edx],ah
        mov byte ptr [ecx*2+edx+1],al
        inc ecx
    .endw
    ret
c2w endp

Function: Convert 128 bytes to 256 hexadecimal characters

Example:

Input byte:0x4A
Output characters:'4' (0x34) and 'a' (0x61)
Result:"4a"

Function 4: dfs – In-Order Traversal of Complete Binary Tree

dfs proc s,t
    mov ecx,t
    .if ecx>=bufferSize*2       ; If node>=256, return
        jmp dfs_jmp
    .endif

    mov eax,t
    add eax,eax
    inc eax                     ; eax = 2*t+1 (left child)
    invoke dfs,s,eax            ; Recursively visit left subtree

    mov dl,[ebx+ecx]            ; Visit current node
    mov ecx,tt
    lea ebx,buffer3
    mov [ebx+ecx],dl            ; Output to buffer3
    inc ecx
    mov tt,ecx

    inc eax                     ; eax = 2*t+2 (right child)
    invoke dfs,s,eax            ; Recursively visit right subtree
dfs_jmp:
    ret
dfs endp

Complete Binary Tree and In-Order Traversal

Consider the 256 characters as nodes of a complete binary tree:

              0
          /       \
        1           2
       / \         / \
      3   4       5   6
     /\   /\     /\   /\
    7  8 9 10  11 12 13 14
   ...

Node Relationships:

Parent node:(i-1) // 2
Left child:2*i + 1
Right child:2*i + 2

In-Order Traversal (left→root→right):

Access order: [255, 127, 63, 128, 31, 129, 64, 130, ...]

Effect: Completely disrupts the data order, with the data at position 0 being that of node 255!

Step 2: Mathematical Modeling (Forward Process)

Let the plaintext be P[0..L-1], the complete encryption process:

Step 1: mInit Expansion

Init[i] = (P[i mod L] + floor(i/L)) mod 256
S[i] = Init[i] XOR L

Step 2: Encode Dependency Encoding

E[i] = S[i] ⊕ (i≥8 ? E[i-8] : 0) ⊕ (i%8≠0 ? E[i-1] : 0)

Step 3: c2w Hexadecimal

H[2*i] = hex_high(E[i])
H[2*i+1] = hex_low(E[i])

Step 4: dfs Tree Traversal

C = inorder_traverse(H)  # Rearranged by in-order traversal

Step 3: Reverse Decryption Scheme

Overview of Reverse Steps

Decrypt: Ciphertext C → Reverse dfs → Reverse c2w → Reverse Encode → Reverse mInit → Plaintext P

Step A: Restore In-Order Traversal Permutation

Principle: Generate the index sequence of in-order traversal to establish a reverse mapping

# Generate in-order traversal sequence
order = []
def dfs(t):
    if t >= 256: return
    dfs(2*t + 1)      # Left subtree
    order.append(t)    # Record node
    dfs(2*t + 2)      # Right subtree
dfs(0)

# order[i] indicates: the node number accessed at the i-th visit
# Reverse: Ciphertext C[i] should be placed back at H[order[i]] position

H = [''] * 256
for i in range(256):
    H[order[i]] = C[i]

Correctness: In-order traversal is deterministic, and the reverse mapping exists uniquely.

Step B: Hexadecimal to Binary

E = bytes.fromhex(''.join(H))  # 256 characters → 128 bytes

Direct conversion, no ambiguity.

Step C: Reverse Encode – Key Insight!

Review the formula for Encode:

E[i] = S[i] ⊕ E[i-8] ⊕ E[i-1]

Due to the self-inverse property of XOR (A ⊕ B ⊕ B = A), we have:

S[i] = E[i] ⊕ E[i-8] ⊕ E[i-1]

Key Points:

E[i], E[i-1], E[i-8] are all known (in array E)
No need to know S[i-1] or S[i-8]
Can be calculated directly from front to back!

def decode_encode(E):
    S = bytearray(128)
    for i in range(128):  # Process from front to back
        v = E[i]
        if i >= 8:
            v ^= E[i-8]
        if i % 8 != 0:
            v ^= E[i-1]
        S[i] = v
    return bytes(S)

Why can it be processed forward? Because Encode is an “in-place update”, meaning that when updating E[i], the already updated values of <code>E[i-1] and E[i-8] are used. During the reverse process, we know the final E array and can directly use the same index relationship to backtrack S.

This is different from the ordinary “calculate first and then update”:

# If it were like this (non-in-place):
E_new[i] = S[i] ^ S[i-1] ^ S[i-8]  # Using old values of S

# Then the reverse would need to be processed from back to front:
for i in range(127, -1, -1):
    S[i] = E[i] ^ S[i-1] ^ S[i-8]

But the assembly code is:

mov ah,[ebx+ecx]    # Read current
xor ah,al           # XOR (using updated value)
mov [ebx+ecx],ah    # Write back to the same position

This is in-place operation, so the reverse can also use the same formula.

Step D: Reverse mInit – Brute Force Length

Restoring Init from S is simple:

Init[i] = S[i] XOR L

But we do not know L! We need to utilize the regularity of the filling.

Constraints:

Init[i] = P[i mod L] + floor(i / L)  (mod 256)

For the same original character P[j], it will present an arithmetic sequence at different positions:

Init[j]      = P[j] + 0
Init[j+L]    = P[j] + 1
Init[j+2L]   = P[j] + 2
...

Brute Force Method:

for L in range(1, 129):
    # Validate whether each position satisfies the arithmetic increment
    valid = True
    P = bytearray(L)

    for j in range(L):
        x0 = (S[j] ^ L) & 0xFF  # Init[j]
        P[j] = x0

        # Check subsequent positions
        m = 1
        while j + m*L < 128:
            expected = (x0 + m) & 0xFF
            actual = (S[j + m*L] ^ L) & 0xFF
            if expected != actual:
                valid = False
                break
            m += 1

        if not valid:
            break

    if valid:
        return L, bytes(P)

Uniqueness: Under the global constraint of 128 bytes, only the true L can satisfy the arithmetic relationship at all positions.

Step 4: Complete Decryption Script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
RCTF 2021 ASM Crypto - Complete Decryption Script
"""

import binascii

# Read ciphertext
with open('flag.enc', 'r', encoding='ascii') as f:
    enc_hex = f.read().strip()

print(f"[*] Ciphertext length: {len(enc_hex)}")

# ========== Step A: Restore In-Order Traversal Permutation ==========
print("[*] Step 1: Restore In-Order Traversal...")

N = 256
order = []

def dfs(t):
    if t >= N:
        return
    dfs(2*t + 1)
    order.append(t)
    dfs(2*t + 2)

dfs(0)

# Reverse permutation
buf3 = list(enc_hex)     # Ciphertext sequence
buf2 = [''] * N          # Original order
for k, idx in enumerate(order):
    buf2[idx] = buf3[k]

hex_str = ''.join(buf2)
print(f"    Restored: {hex_str[:60]}...")

# ========== Step B: Hexadecimal to Binary ==========
print("[*] Step 2: Hexadecimal to Binary...")

E = bytearray(binascii.unhexlify(hex_str))
print(f"    Obtained {len(E)} bytes")

# ========== Step C: Reverse Encode ==========
print("[*] Step 3: Reverse Encode (Dependency Encoding)...")

S = bytearray(128)
for i in range(128):
    v = E[i]
    if i >= 8:
        v ^= E[i-8]
    if i % 8 != 0:
        v ^= E[i-1]
    S[i] = v

print(f"    Completed")

# ========== Step D: Reverse mInit (Brute Force Original Length) ==========
print("[*] Step 4: Brute Force Original Length...")

def recover_plain(M):
    for L in range(1, 129):
        P = bytearray(L)
        ok = True

        for j in range(L):
            # Get Init[j]
            x0 = (M[j] ^ L) & 0xFF
            P[j] = x0

            # Validate arithmetic increment
            m = 1
            while j + m*L < 128:
                expected = (x0 + m) & 0xFF
                actual = (M[j + m*L] ^ L) & 0xFF
                if expected != actual:
                    ok = False
                    break
                m += 1

            if not ok:
                break

        if ok:
            return L, bytes(P)

    return None, None

L, P = recover_plain(S)

if L is None:
    print("    [!] No valid length found")
else:
    print(f"    [+] Found length: L = {L}")
    print(f"\n{'='*70}")
    print(f"[FLAG] Original bytes: {P}")
    print(f"[FLAG] Visible text: {P.rstrip(b'\r').decode('latin1')}")
    print(f"{'='*70}")

# ========== Step E: Forward Verification ==========
print("\n[*] Step 5: Forward Encryption Verification...")

def mInit(P):
    l = len(P)
    buf = bytearray(128)
    buf[:l] = P
    i, t, eax = l, 1, 0
    while i < 128:
        buf[i] = (buf[eax] + t) & 0xFF
        i += 1
        eax += 1
        if eax == l:
            eax = 0
            t += 1
    al = l & 0xFF
    for i in range(128):
        buf[i] ^= al
    return bytes(buf)

def encode_forward(s):
    s = bytearray(s)
    for i in range(128):
        dl = s[i-1] if (i % 8) != 0 else 0
        al = s[i-8] if i >= 8 else 0
        s[i] = s[i] ^ al ^ dl
    return bytes(s)

def c2w(b):
    ca = b'0123456789abcdef'
    out = bytearray(256)
    for i, x in enumerate(b):
        out[2*i]   = ca[(x >> 4) & 0xF]
        out[2*i+1] = ca[x & 0xF]
    return bytes(out)

def inorder_permute(buf2):
    o = []
    def dfs(t):
        if t >= len(buf2):
            return
        dfs(2*t+1)
        o.append(t)
        dfs(2*t+2)
    dfs(0)
    return ''.join(buf2[i] for i in o)

# Re-encrypt
re_enc = inorder_permute(c2w(encode_forward(mInit(P))).decode())

if re_enc == enc_hex:
    print("    [✓] Verification successful! Encryption result matches exactly")
else:
    print("    [✗] Verification failed")
    print(f"    Expected: {enc_hex[:60]}...")
    print(f"    Actual: {re_enc[:60]}...")

print("\n[*] Decryption completed!")

Step 5: Running Results

Save the script as solve.py and place it in the same directory as flag.enc, then run:

$ python3 solve.py

Output:

[*] Ciphertext length: 256
[*] Step 1: Restore In-Order Traversal...
    Restored: 473c1e38740b4b0714640c4652333461546e7c7c544657452a030c22...
[*] Step 2: Hexadecimal to Binary...
    Obtained 128 bytes
[*] Step 3: Reverse Encode (Dependency Encoding)...
    Completed
[*] Step 4: Brute Force Original Length...
    [+] Found length: L = 19

======================================================================
[FLAG] Original bytes: b'Th15_lS_@_easy_ASM\r'
[FLAG] Visible text: Th15_lS_@_easy_ASM
======================================================================

[*] Step 5: Forward Encryption Verification...
    [✓] Verification successful! Encryption result matches exactly

[*] Decryption completed!

In-Depth Analysis of Key Technical Points

1. Why is there a `\r` at the end of the plaintext?

In the Windows console:

After user input and pressing enter, the system reads ...\r\n
The assembly code only executes sub cLen,1, subtracting \n
Therefore, \r is retained in the input as part of the plaintext

This is a detail, but it affects the final byte count and content.

2. How to understand the “in-place update” of Encode?

Key Difference:

Method A (Non-In-Place):

temp = S[i] ^ S[i-1] ^ S[i-8]  # Using original values of S
E[i] = temp                     # Write to new array

This situation requires reverse traversal for decryption.

Method B (In-Place, Assembly Implementation):

mov ah,[ebx+ecx]     # ah = buffer[i] (may have been modified)
xor ah,al            # Use buffer[i-8] (may have been modified)
xor ah,dl            # Use buffer[i-1] (has been modified)
mov [ebx+ecx],ah     # Write back to buffer[i]

This is a typical in-place algorithm, where subsequent values depend on previously updated values.

During the reverse: Due to the properties of XOR A ⊕ B ⊕ B = A, use the same formula:

S[i] = E[i] ^ E[i-8] ^ E[i-1]

Where E[i-8] and E[i-1] are known values in the ciphertext, allowing direct calculation.

3. Mathematical Basis for Length Brute Force

Filling Regularity:

Init[j]      = P[j] + 0
Init[j+L]    = P[j] + 1
Init[j+2L]   = P[j] + 2
...
Init[j+kL]   = P[j] + k  (mod 256)

For incorrect L':

Different j positions will have different “base values”
It is difficult for all 128 positions to form a coherent arithmetic sequence
Especially under modulo 256 operations, incorrect L' is almost impossible to satisfy global constraints

Experimental Verification:

# Try L=18, 19, 20
for L in [18, 19, 20]:
    valid_positions = 0
    for j in range(L):
        x0 = (S[j] ^ L) & 0xFF
        m, ok = 1, True
        while j + m*L < 128:
            if (S[j+m*L] ^ L) != ((x0 + m) & 0xFF):
                ok = False
                break
            m += 1
        if ok:
            valid_positions += 1
    print(f"L={L}: {valid_positions}/{L} positions satisfy constraints")

Only when L=19 do all positions satisfy.

4. Application of In-Order Traversal of Complete Binary Tree

Traversal Order Characteristics:

The leftmost leaf node (255) is accessed first
The root node (0) is accessed in the middle position
Completely disrupts the linear order

Role in Cryptography:

Data permutation
Increases analysis difficulty
Combined with linear transformations to form SP network structure

Common Errors and Debugging Tips

Error 1: Incorrect Reverse Direction of Encode

Symptoms: Decrypted data is completely garbled

Cause: Misunderstanding that it needs to be processed from back to front

Correct Approach: Understand the meaning of “in-place update”, process from front to back using the same formula

Error 2: Ignoring `\r` Character

Symptoms: Length never matches, finding 18 instead of 19

Cause: Did not consider the input characteristics of the Windows console

Solution: Check the assembly code’s sub cLen,1, which only subtracts one character

Error 3: Insufficient Verification of Length Brute Force

Symptoms: Multiple “possible” lengths found

Cause: Only checked partial constraint conditions

Correct Approach: Check the arithmetic relationship of all 128 bytes

Debugging Tips

1. Step-by-Step Verification:

# Print intermediate results at each step
print(f"Step 1: {data[:20]}")
print(f"Step 2: {data[:20]}")
...

2. Forward Comparison:

# Encrypt known plaintext and compare intermediate steps
 test_plain = b"test"
 step1 = mInit(test_plain)
 # Compare with the same step of the decryption process

3. Boundary Conditions:

# Check special positions
# i=0: Does not depend on previous
# i=7: Only depends on left neighbor
# i=8: Only depends on upper neighbor
# i=9: Depends on both left and upper neighbors

Security Analysis

Advantages

Multi-layer Transformation: Combines padding, XOR, encoding, and permutation operations
Dependency Relationships: Encode establishes dependencies between bytes, creating an avalanche effect
Non-linear Permutation: Tree traversal disrupts order

Disadvantages

Algorithm Disclosure: Once the source code is provided, it becomes a pure mathematical problem
No Key: All operations are deterministic, with no key involved
Fixed Length: 128-byte fixed length may leak information
Strong Reversibility: Each step is a simple reversible operation

Improvement Suggestions

If it is to be transformed into a practical encryption algorithm:

Introduce a Key:

Use the key to control the increment value of the padding
Derive XOR values from the key instead of using the length directly

Increase Rounds:

Multiple rounds of Encode, each using different dependency patterns
Similar to AES’s multi-round structure

Dynamic Permutation:

Select different permutation methods based on the key
Do not fix the use of in-order traversal

Add S-Boxes:

Add non-linear S-box substitution after Encode
Enhance resistance to differential and linear analysis

Conclusion

This problem is an excellent CTF cryptography challenge, covering:

Technical Points:

Reading and reversing x86 assembly
Mathematical modeling and algorithm analysis
Data structures (complete binary tree)
Basic concepts of cryptography

Problem-Solving Approach:

Static analysis: Understand encryption logic function by function
Mathematical modeling: Describe each step with formulas
Reverse design: Design verifiable reverse algorithms for each step
Code implementation: Python implementation of the complete process
Forward verification: Use plaintext encryption to verify decryption correctness

Key Insights:

The “in-place update” feature of Encode determines the reverse method
The arithmetic regularity of the padding provides uniqueness constraints
Details of console I/O affect plaintext content

Final Answer:

Plaintext (including carriage return): b'Th15_lS_@_easy_ASM\r'
Plaintext (visible):   Th15_lS_@_easy_ASM
Length:          19 bytes

This article ensures that each step has theoretical support and practical verification through the complete process of analysis → modeling → reverse → implementation → verification. I hope it helps readers gain a deeper understanding of the intricacies of this problem!

References:

MASM32 Programming
Complete Binary Trees and Tree Traversal
XOR Cryptography Applications
CTF Cryptography Problem Type Analysis

Author’s Note: All code and conclusions in this article can be reproduced and verified locally, and readers are welcome to practice!