X86汇编笔记-memetao

###notes

在8086处理器上,如果要用寄存器来提供偏移地址,只能使用BX,si,di,bP,不能使用其它寄存器.以下指令是非法的: mov [ax], dl
8086处理器只支持以下几种基地址寄存器和变址寄存器的组合:[bx+si] [bx+di] [bp+si] [bp+di]

条件转移指令

js 当SF == 1则跳转 , jns与之相反
jz 当zF == 1则跳转, jnz与之相反
jo 当OF == 1则跳转, jno与之相反
jc 当CF == 1则跳转, jnc与之相反
jp 当PF == 1则跳转, jnp与之相反

常用指令

movsb | movsw

DS:SI --> ES:DI
DF = 0 low->high
DF = 1 high->low

instruction 'cld' can clear df

div A / B. A是被除数, B是为除数.

被除数A默认存在AX中, 或者AX和DX中(DX存高x位)

如果除数是8位，那么除法的结果AL保存商，AH保存余数.

如果除数16位，那么除法的结果 AX保存商，DX保存余数。

cli && sti 清除和设置IF(中断标志位)
repe/repne 相等(ZF==0)/不相等的时候重复
scasb(w|d) 两个对比数分别为 EAX / AX / AL 和 DS:EDI, 并且将edi-+1（2|4)

ret

ret 弹栈到IP寄存器
retf 弹栈到IP,再弹栈到CS
iret 弹栈到IP,再弹栈到CS,再弹栈到FLAGS

call 指令

call 指令有三种形式:

第一种是16位相对近调用, 近调用的意思是被调用的目标过程位于当前代码段内,而非另一个不同的代码段,所以只需要得到偏移地址即可.

16位相对近调用是三字节指令, 操作码为0xE8. 在指令执行阶段, 处理器看到操作码0xE8, 就知道它应当调用一个过程. 于是,它用指令指针寄存器IP的当前内容加上指令中的操作数, 再加上3, 得到一个新的偏移地址. 接着将IP的原有内容压入栈. 最后,用刚才的偏移地址取代IP原油的内容. 这直接导致处理器的执行流程转移到目标位置处.

第二种是16位间接绝对近调用, 这种调用也是近调用, 只能调用当前代码段内的过程, 指令中的操作数不是偏移量, 而是被调用过程的真实偏移地址, 故称绝对地址. 这个地址不是直接出现在指令中,而是由16位的通用寄存器或者16位的内存单元简介给出.

第三种是16位直接绝对远调用, 调用另一个代码段内的过程. 比如:

call 0x20000:0x0030

如果被调用过程处于当前代码段, 也没关系.

第四种是16位间接绝对远调用, 这也属于段间调用,被调用过程属于另一个代码段. 比如:

call far [0x2000]
call far [bx]

间接远调用必须使用far.

中断

实模式下,处理器要求中断向量表需要存放在物理地址0x00000->0x003ff,同1k的空间内,共256个中断,每个中断向量4个字节.

0x00000 ... -> ....0x003ff
---------------------------------------------
偏移地址(2字节) | 段地址(2字节)    | 偏移地址..
---------------------------------------------

内联汇编

常用约束

r : Register(s)
a: %eax, %ax, %al
b: %ebx, %bx, %bl
c: %ecx, %cx, %cl
d: %edx, %dx, %dl
S: %esi, %si
D: %edi, %di
i: 直接操作数
=: 只写

函数调用寄存器约定

rax - temporary register; when we call a syscal, rax must contain syscall number
rdi - used to pass 1st argument to functions
rsi - pointer used to pass 2nd argument to functions
rdx - used to pass 3rd argument to functions
rcx - fourth argument
r8 - fifth argument
r9 - sixth
stack - over sixth

待整理

栈

RBP is the base pointer register. It points to the base of the current stack frame.
RSP is the stack pointer, which points to the top of current stack frame.

two operators:

push argument - increments stack pointer (RSP) and stores argument in location pointed by stack pointer
pop argument - copied data to argument from location pointed by stack pointer

段

data - section is used for declaring initialized data or constants
bss - section is used for declaring non initialized variables
text - section is used for code

以sys_write为例:

size_t sys_write(unsigned int fd, const char * buf, size_t count);

section .data
    msg db      "hello, world!" ; db代表声明单字节类型的

section .text
    global _start
_start:
    mov     rax, 1 ; sys_write调用号
    mov     rdi, 1 ; stdcout
    mov     rsi, msg
    mov     rdx, 13
    syscall
    mov    rax, 60
    mov    rdi, 0
    syscall

简单程序示例

section .data
    ; Define constants
    num1:   equ 100
    num2:   equ 50
    ; initialize message
    msg:    db "correct"

section .text

    global _start

;; entry point
_start:
    ; set num1's value to rax
    mov rax, num1
    ; set num2's value to rbx
    mov rbx, num2
    ; get sum of rax and rbx, and store it's value in rax
    add rax, rbx
    ; compare rax and 150
    cmp rax, 150
    ; go to .exit label if rax and 150 are not equal
    jne .exit
    ; go to .rightSum label if rax and 150 are equal
    jmp .rightSum

; Print message that sum is correct
.rightSum:
    ;; write syscall
    mov     rax, 1
    ;; file descritor, standard output
    mov     rdi, 1
    ;; message address
    mov     rsi, msg
    ;; length of message
    mov     rdx, 8
    ;; call write syscall
    syscall
    ; exit from program
    jmp .exit

; exit procedure

    ; exit syscall
    mov    rax, 60
    ; exit code
    mov    rdi, 0
    ; call exit syscall
    syscall

声明

initialized:

equ(equate) similar to C’s define
db define and allocate a byte,
dw,dd,dt,do,dy and dz

non initialized variables:

RESB, RESW, RESD, RESQ, REST, RESO, RESY and RESZ

简单程序示例

%macro PRINT 1
    pusha
    pushf
    jmp %%astr
%%str db %1, 0
%%strln equ $-%%str
%%astr: _syscall_write %%str, %%strln
popf
popa
%endmacro

%macro _syscall_write 2
	mov rax, 1
        mov rdi, 1
        mov rsi, %%str
        mov rdx, %%strln
        syscall
%endmacro

AT&T 语法

.data
    // initialized data definition
.text
    .global _start

_start:
    // main routine

data definition

.section .data
    // 1 byte
    var1: .byte 10
    // 2 byte
    var2: .word 10
    // 4 byte
    var3: .int 10
    // 8 byte
    var4: .quad 10
    // 16 byte
    var5: .octa 10

    // assembles each string (with no automatic trailing zero byte) into consecutive addresses
    str1: .asci "Hello world"
    // just like .ascii, but each string is followed by a zero byte
    str2: .asciz "Hello world"
    // Copy the characters in str to the object file
    str3: .string "Hello world"

oprands order

    ;move source, destination
    movw $10, %ax

GNU assembler has 6 postfixes for operations:

b - 1 byte operands
w - 2 bytes operands
l - 4 bytes operands
q - 8 bytes operands
t - 10 bytes operands
o - 16 bytes operands

This rule is ot only mov instruction, but also for all another like addl, xorb, cmpw and etc…