Assembly Language Day 06

0x00

This series shares my daily study notes to help everyone learn assembly language. Why learn assembly language? In red team/blue team exercises, our tools are often detected and removed by AV/EDR products, so we need techniques for evading them. Learning evasion techniques has to start from the basics. In the future, I may also share notes on C++, the PE file structure, and reverse engineering.

0x01

6. More Flexible Methods for Locating Memory Addresses

Previously, we used the methods [0] and [bx] to locate the addresses of memory units in memory access instructions.

AND and OR Instructions

AND instruction: logical AND instruction, performs bitwise AND operation.

For example, the instruction:

mov al,01100011B
and al,00111011B

After execution, al=00100011B. With a suitable mask, this instruction sets the chosen bits of the operand to 0 while keeping the other bits unchanged. For example (bits are numbered from 0, starting at the lowest bit):

  • The instruction to set bit 6 of al to 0 is and al,10111111B
  • The instruction to set bit 7 of al to 0 is and al,01111111B
  • The instruction to set bit 0 of al to 0 is and al,11111110B
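
The AND masks can be checked with a short Python sketch. It only models the bitwise arithmetic on an 8-bit value and is not 8086 code. The helper name clear_bit is hypothetical, and bits are counted from 0 at the least significant end.

```python
def clear_bit(value, bit):
    """Return the 8-bit value with the given bit (0 = lowest) cleared."""
    mask = 0xFF & ~(1 << bit)    # bit 6 gives the mask 10111111B
    return value & mask

al = 0b01100011
worked_example = al & 0b00111011          # models: and al,00111011B
print(format(worked_example, "08b"))      # 00100011
print(format(clear_bit(0xFF, 6), "08b"))  # 10111111
print(format(clear_bit(0xFF, 7), "08b"))  # 01111111
print(format(clear_bit(0xFF, 0), "08b"))  # 11111110
```

Starting from 0xFF makes each mask visible directly: the result is the mask itself.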

OR instruction: logical OR instruction, performs bitwise OR operation.

For example, the instruction:

mov al,01100011B
or  al,00111011B

After execution, al=01111011B. With a suitable mask, this instruction sets the chosen bits of the operand to 1 while keeping the other bits unchanged. For example (bits are numbered from 0, starting at the lowest bit):

  • The instruction to set bit 6 of al to 1 is or al,01000000B
  • The instruction to set bit 7 of al to 1 is or al,10000000B
  • The instruction to set bit 0 of al to 1 is or al,00000001B
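
The OR masks can be checked the same way. This is a minimal Python sketch of the arithmetic only. The helper name set_bit is hypothetical, and bits are counted from 0 at the least significant end.

```python
def set_bit(value, bit):
    """Return the 8-bit value with the given bit (0 = lowest) set to 1."""
    mask = 1 << bit              # bit 6 gives the mask 01000000B
    return (value | mask) & 0xFF

al = 0b01100011
worked_example = al | 0b00111011          # models: or al,00111011B
print(format(worked_example, "08b"))      # 01111011
print(format(set_bit(0x00, 6), "08b"))    # 01000000
print(format(set_bit(0x00, 7), "08b"))    # 10000000
print(format(set_bit(0x00, 0), "08b"))    # 00000001
```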

About ASCII Code

A text editor relies on the ASCII encoding rules both when it stores characters and when it displays them. When we press the 'a' key on the keyboard, the keystroke is encoded according to the ASCII rules and stored as 61H in a specified location in memory. The text editing software reads 61H from memory and sends it to video memory, and the graphics card interprets that value according to the ASCII rules as the character 'a' and displays it on the screen.

Data Given in Character Form

In an assembly program, we can give data in character form by enclosing it in single quotes ('...'). The assembler converts the characters into their ASCII codes, as in the following program:

assume cs:code,ds:data

data segment
  db 'unIX'
  db 'foRK'
data ends

code segment

start:
  mov al,'a'
  mov bl,'b'
  mov ax,4c00h
  int 21h
code ends
end start

In the above source program, db 'unIX' is equivalent to db 75h,6eh,49h,58h, mov al,'a' is equivalent to mov al,61h, and mov bl,'b' is equivalent to mov bl,62h.
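
To double-check the byte values, here is a small Python sketch that lists the ASCII code of each character in a quoted string. The helper name db_bytes is hypothetical. It only mimics what the assembler does for a db directive holding plain ASCII text.

```python
def db_bytes(text):
    """Return the ASCII codes of text, formatted like assembler hex literals."""
    return [format(code, "02x") + "h" for code in text.encode("ascii")]

print(db_bytes("unIX"))   # ['75h', '6eh', '49h', '58h']
print(db_bytes("foRK"))   # ['66h', '6fh', '52h', '4bh']
print(db_bytes("ab"))     # ['61h', '62h']
```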

When the program is loaded into debug, ds=0B2DH. The first 256 bytes (10H paragraphs) starting at that segment are used by DOS, so the program itself starts at segment 0B3DH. Since the data segment is the first segment in the program, its segment address is 0B3DH.

When we view the data segment with the d command, debug displays its contents both in hexadecimal and as ASCII characters.
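
The step from 0B2DH to 0B3DH is plain segment arithmetic, sketched below in Python. It assumes the usual DOS layout, in which a 256-byte area (the PSP) sits at the segment held in ds when the program is loaded and the program follows it. The variable names are hypothetical.

```python
PSP_BYTES = 0x100                  # 256 bytes reserved by DOS ahead of the program
PSP_PARAGRAPHS = PSP_BYTES // 16   # one segment step covers 16 bytes

ds_at_load = 0x0B2D
program_segment = ds_at_load + PSP_PARAGRAPHS
print(format(program_segment, "04X") + "H")   # 0B3DH
```

In debug, the data could then be viewed with a command such as d 0b3d:0.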

Case Conversion Issues

Consider the following problem: fill in the code in codesg to convert the first string in datasg to uppercase and the second string to lowercase.

assume cs:codesg,ds:datasg

datasg segment
  db 'BaSiC'
  db 'iNfOrMaTiOn'
datasg ends

codesg segment
  start:
    mov ax,4c00h
    int 21h
codesg ends
end start

The ASCII codes for the same letter in uppercase and lowercase are different, with the ASCII value of lowercase letters being 20H greater than that of uppercase letters.
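
The 20H difference can be confirmed for every letter with a few lines of Python. This is a quick check rather than part of the assembly program.

```python
import string

# Difference between each lowercase letter and its uppercase form
diffs = {
    ord(lower) - ord(upper)
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
}
print({hex(d) for d in diffs})   # {'0x20'}
```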

However, to convert a character by adding or subtracting 20H, the program first needs to determine which case the character is in. Taking BaSiC as an example:

assume cs:codesg,ds:datasg

datasg segment
  db 'BaSiC'
  db 'iNfOrMaTiOn'
datasg ends

codesg segment
  start:
    mov ax,datasg
    mov ds,ax
    mov bx,0
    mov cx,5
  s:
    mov al,[bx]
    ; if (al) >= 61h, al holds the ASCII code of a lowercase letter, so: sub al,20h
    mov [bx],al
    inc bx
    loop s
    mov ax,4c00h
    int 21h
codesg ends
end start

Making this decision requires instructions that we have not yet learned, so we look for another method.

Looking at the binary form of the ASCII codes, the uppercase and lowercase forms of a letter differ only in bit 5 (counting from 0): it is 0 in uppercase and 1 in lowercase. Therefore, we can clear or set bit 5 directly without first testing the case.

assume cs:codesg,ds:datasg

datasg segment
  db 'BaSiC'
  db 'iNfOrMaTiOn'
datasg ends

codesg segment
  start:
    mov ax,datasg
    mov ds,ax
    mov bx,0
    mov cx,5            ; 5 characters in the first string
  s:
    mov al,[bx]
    and al,11011111B    ; Clear bit 5 to convert to uppercase
    mov [bx],al
    inc bx
    loop s

    mov bx,5
    mov cx,11           ; 11 characters in the second string
  s0:
    mov al,[bx]
    or al,00100000B     ; Set bit 5 to 1 to convert to lowercase
    mov [bx],al
    inc bx
    loop s0

    mov ax,4c00h
    int 21h
codesg ends
end start
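
The effect of the two loops can be modeled in Python on a byte array that stands in for the data segment. This is a sketch of the masking logic only. It assumes the second string is the word "information" in mixed case, and the variable names are hypothetical.

```python
# Stand-in for datasg: 5 bytes for the first string, 11 for the second
data = bytearray(b"BaSiC" + b"iNfOrMaTiOn")

for bx in range(0, 5):        # first loop: offsets 0..4
    data[bx] &= 0b11011111    # clear bit 5 -> uppercase

for bx in range(5, 16):       # second loop: offsets 5..15
    data[bx] |= 0b00100000    # set bit 5 -> lowercase

print(data.decode("ascii"))   # BASICinformation
```

The masks are only meaningful for letters. Applied to other characters they can produce unrelated codes: for example, '0' (30h) with bit 5 cleared becomes 10h.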

[bx+idata]

Previously, we used [bx] to indicate a memory unit. We can also use [bx+idata], which indicates a memory unit whose offset address is (bx)+idata.

For example, the instruction mov ax,[bx+200] sends the content of a memory unit to ax. The memory unit is 2 bytes long and holds a word, its offset address is the value in bx plus 200, and its segment address is in ds. Described numerically: (ax) = ((ds)*16+(bx)+200).

This instruction can also be written as mov ax,[200+bx], mov ax,200[bx], or mov ax,[bx].200
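
The address arithmetic for mov ax,[bx+200] can be sketched in Python as follows. The function names and the register and memory values are hypothetical and chosen only for illustration. The sketch assumes the 8086 behavior of 16-bit offsets, 20-bit physical addresses, and words stored low byte first.

```python
def physical_address(ds, bx, idata):
    """Model (ds)*16 + (bx) + idata for the operand [bx+idata]."""
    offset = (bx + idata) & 0xFFFF          # offsets are 16 bits wide
    return ((ds << 4) + offset) & 0xFFFFF   # physical addresses are 20 bits wide

def read_word(memory, address):
    """Read a 2-byte word stored low byte first."""
    return memory[address] | (memory[address + 1] << 8)

ds, bx = 0x2000, 0x1000                      # illustrative register values
address = physical_address(ds, bx, 200)
memory = {address: 0x34, address + 1: 0x12}  # illustrative memory contents
ax = read_word(memory, address)
print(hex(address), hex(ax))                 # 0x210c8 0x1234
```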

Processing Arrays Using [bx+idata]

Requirement: Convert the first string defined in datasg to uppercase and the second string to lowercase.

assume cs:codesg,ds:datasg
datasg segment
   db 'BaSiC'
   db 'MinIX'
datasg ends
codesg segment
   start: 
     mov ax,datasg
     mov ds,ax
     mov bx,0
     mov cx,5
   s:  
     mov al,[bx]
     and al,11011111b
     mov [bx],al
     mov al,[5+bx]
     or al,00100000b
     mov [5+bx],al
     inc bx
     loop s

     mov ax,4c00h
     int 21h
 codesg ends
 end start
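
The idea of this program is that [bx] and [5+bx] address the same position in two arrays that start at offsets 0 and 5, much like a[i] and b[i] in a high-level language. A minimal Python model of the loop follows. The variable names are hypothetical and the byte array stands in for the data segment.

```python
data = bytearray(b"BaSiC" + b"MinIX")   # two 5-byte strings, back to back

for bx in range(5):                  # one pass handles both strings
    data[bx] &= 0b11011111           # models [bx]:   first string to uppercase
    data[5 + bx] |= 0b00100000       # models [5+bx]: second string to lowercase

print(data.decode("ascii"))          # BASICminix
```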

SI and DI

SI and DI are 8086 registers that are similar in function to bx, except that they cannot be divided into two 8-bit registers. The following three sets of instructions achieve the same functionality:

mov bx,0
mov ax,[bx]

mov si,0
mov ax,[si]

mov di,0
mov ax,[di]

The following three sets of instructions also achieve the same functionality:

mov bx,0
mov ax,[bx+123]

mov si,0
mov ax,[si+123]

mov di,0
mov ax,[di+123]

[bx+si] and [bx+di]

Previously, we used [bx], [si], or [di], optionally with an added idata, to indicate a memory unit. We can also use a more flexible method: [bx+si] and [bx+di].

[bx+si] indicates a memory unit whose offset address is (bx)+(si).

mov ax,[bx+si] means (ax) = ((ds)*16+(bx)+(si)). Because the destination is ax, the memory unit here is one word (2 bytes). This instruction can also be written as mov ax,[bx][si]

[bx+si+idata] and [bx+di+idata]

[bx+si+idata] indicates a memory unit whose offset address is (bx)+(si)+idata. The instruction mov ax,[bx+si+idata] means (ax) = ((ds)*16+(bx)+(si)+idata). With idata equal to 200, this instruction can also be written as:

mov ax,[bx+200+si]
mov ax,[200+bx+si]
mov ax,200[bx][si]
mov ax,[bx].200[si]
mov ax,[bx][si].200
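
All of these spellings denote the same sum. The Python sketch below models how the offset and physical address are formed for [bx+si+idata]. The function names and register values are hypothetical. It assumes the 8086 behavior that the offset wraps around at 16 bits.

```python
def effective_address(bx=0, si=0, idata=0):
    """Offset for [bx+si+idata]; unused parts default to 0."""
    return (bx + si + idata) & 0xFFFF

def physical_address(ds, offset):
    """Model (ds)*16 + offset as a 20-bit address."""
    return ((ds << 4) + offset) & 0xFFFFF

ea = effective_address(bx=0x1000, si=0x0002, idata=200)
print(hex(ea), hex(physical_address(0x2000, ea)))   # 0x10ca 0x210ca

# The sum is kept to 16 bits, so it wraps past FFFFH
wrapped = effective_address(bx=0xFFFF, si=0x0002)
print(hex(wrapped))                                  # 0x1
```

Leaving out si gives the [bx+idata] form, and leaving out idata gives [bx+si].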

Flexible Application of Different Addressing Methods

Comparing the methods for locating memory (addressing methods) introduced so far, we find:

  • [idata] uses a constant to represent an address, which can be used to directly locate a memory unit.
  • [bx] uses a variable to represent a memory address, which can be used to indirectly locate a memory unit.
  • [bx+idata] uses a variable and a constant to represent an address, which can indirectly locate a memory unit based on a starting address.
  • [bx+si] uses two variables to represent an address.
  • [bx+si+idata] uses two variables and a constant to represent an address.

Programming: Capitalize the first letter of each word in the datasg segment.

assume cs:codesg,ds:datasg
datasg segment
   db '1. file         '
   db '2. edit         '
   db '3. search       '
   db '4. view         '
   db '5. options      '
   db '6. help         '
datasg ends
codesg segment
 start: 
   mov ax,datasg
   mov ds,ax
   mov bx,0
   mov cx,6
 s:  
  mov al,[bx+3]
  and al,11011111b
  mov [bx+3],al
  add bx,16
  loop s 

  mov ax,4c00h
  int 21h
codesg ends
 end start
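
The loop relies on every string occupying exactly 16 bytes, so that add bx,16 moves to the next row and [bx+3] is always the first letter of the word. A Python model of this layout is sketched below. The variable names are hypothetical, and padding each row with spaces to 16 bytes is an assumption made to match add bx,16.

```python
titles = ["1. file", "2. edit", "3. search", "4. view", "5. options", "6. help"]
ROW = 16   # bytes per row, matching add bx,16

# Stand-in for datasg: each title padded with spaces to 16 bytes
data = bytearray("".join(t.ljust(ROW) for t in titles), "ascii")

bx = 0
for _ in range(len(titles)):       # models mov cx,6 / loop s
    data[bx + 3] &= 0b11011111     # models [bx+3]: first letter to uppercase
    bx += ROW                      # models add bx,16

rows = [data[i:i + ROW].decode("ascii").rstrip() for i in range(0, len(data), ROW)]
print(rows)
# ['1. File', '2. Edit', '3. Search', '4. View', '5. Options', '6. Help']
```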

0x02

Previous Notes:

Assembly Language Day 05

Assembly Language Day 04

Assembly Language Day 03

Assembly Language Day 02

Assembly Language Day 01

Basic Knowledge of Assembly Language
