The AMD64 assembler is a slightly modified version of the 386
assembler, and as such shares most of its syntax. Two new prefixes,
.q and .qa, are provided to select 64-bit operand and address sizes
respectively. 64-bit sizes are the default, so normally you only have
to use the other prefixes. There are also additional register
operands R8-R15.
The registers lack the ’e’ or ’r’ prefix; even in 64-bit mode, rax is
called ax. Additional register operands are available to refer to the
least significant byte of all registers: R8L-R15L, SPL, BPL, SIL,
DIL.
The Linux-AMD64 calling convention is to pass the first 6 integer
parameters in rdi, rsi, rdx, rcx, r8 and r9 and to return the result
in rax and rdx; to pass the first 8 FP parameters in xmm0–xmm7 and to
return FP results in xmm0–xmm1. So abi-code words get the data stack
pointer in di and the address of the FP stack pointer in si, and
return the data stack pointer in ax.
The other caller-saved registers are r10, r11 and xmm8–xmm15. This
calling convention is reportedly also used in other non-Microsoft OSs.
Windows x64 passes the first four integer parameters in rcx, rdx, r8
and r9 and returns the integer result in rax. The other caller-saved
registers are r10 and r11.
On Linux, according to page 21 of
https://uclibc.org/docs/psABI-x86_64.pdf, the registers AX, CX, DX,
SI, DI, R8, R9, R10 and R11 are available as scratch registers.
The addressing modes for the AMD64 are:
\ Running word A produces a memory error, as the registers are not initialised ;-)
ABI-CODE A ( -- )
    500 #               AX MOV   \ immediate
    DX                  AX MOV   \ register
    200                 AX MOV   \ direct addressing
    DX )                AX MOV   \ indirect addressing
    40 DX D)            AX MOV   \ base with displacement
    DX CX I)            AX MOV   \ base with index
    DX CX *4 I)         AX MOV   \ base with scaled index
    40 DX CX *4 DI)     AX MOV   \ base with scaled index and displacement
    DI                  AX MOV   \ SPout := SPin
    RET
END-CODE
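Since word A above is not meant to be run, here is a small sketch that exercises the indirect addressing mode in a word that can actually be called. The names aFETCH and x are hypothetical (neither is part of Gforth); the sketch assumes the register conventions described above (data stack pointer in di, returned in ax) and uses only the scratch register dx.

```forth
\ Hypothetical example word, not part of Gforth: behaves like @
abi-code aFETCH ( addr -- n )
    di ax mov       \ SPout := SPin (stack depth is unchanged)
    di ) dx mov     \ dx := addr (top of stack)
    dx ) dx mov     \ dx := cell stored at addr (indirect addressing)
    dx di ) mov     \ top of stack := dx
    ret
end-code

variable x          \ hypothetical test variable
42 x !
x aFETCH .          \ should print 42
```

The word overwrites the top stack cell in place, so no stack pointer adjustment is needed.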
Here are a few examples of AMD64 abi-code words:
abi-code my+ ( n1 n2 -- n3 )
\ SP passed in di, returned in ax, address of FP passed in si
    8 di d) ax lea  \ compute new sp in result reg
    di ) dx mov     \ get old tos
    dx ax ) add     \ add to new tos
    ret
end-code
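By analogy with my+, a subtraction word might look like the sketch below. The name my- is hypothetical (it is not a Gforth word); the sketch assumes the same register conventions as my+ and the operand order used throughout this section, with the destination last.

```forth
\ Hypothetical counterpart to my+ (not part of Gforth): n3 = n1 - n2
abi-code my- ( n1 n2 -- n3 )
\ SP passed in di, returned in ax, address of FP passed in si
    8 di d) ax lea  \ new sp in result reg; it now points at n1
    di ) dx mov     \ dx := n2 (old tos)
    dx ax ) sub     \ n1 := n1 - n2, which becomes the new tos
    ret
end-code

10 3 my- .          \ should print 7
```

Because the stack grows towards lower addresses, n1 sits one cell above the old top of stack, which is why the result is written through the new stack pointer.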
\ Do nothing
ABI-CODE aNOP ( -- )
    DI ) AX LEA     \ SPout := SPin
    RET
END-CODE
\ Drop TOS
ABI-CODE aDROP ( n -- )
    8 DI D) AX LEA  \ SPout := SPin + 8 (drop one cell)
    RET
END-CODE
\ Push 5 on the data stack
ABI-CODE aFIVE ( -- 5 )
    -8 DI D) AX LEA \ SPout := SPin - 8 (room for one cell)
    5 # AX ) MOV    \ TOS := 5
    RET
END-CODE
\ Push 10 and 20 on the data stack
ABI-CODE aTOS2 ( -- n n )
    -16 DI D) AX LEA    \ SPout := SPin - 16 (room for two cells)
    10 # 8 AX D) MOV    \ second item := 10
    20 # AX ) MOV       \ TOS := 20
    RET
END-CODE
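The stack examples above only write to the stack. As a sketch of reading and writing in the same word, the following exchanges the two topmost cells. The name aSWAP is hypothetical (not part of Gforth); the sketch assumes the conventions described above and uses only the scratch registers cx and dx.

```forth
\ Hypothetical example word, not part of Gforth: behaves like SWAP
abi-code aSWAP ( n1 n2 -- n2 n1 )
    di ax mov       \ SPout := SPin (stack depth is unchanged)
    di ) dx mov     \ dx := n2 (TOS)
    8 di d) cx mov  \ cx := n1 (second item)
    cx di ) mov     \ TOS := n1
    dx 8 di d) mov  \ second item := n2
    ret
end-code

1 2 aSWAP . .       \ should print 1 2
```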
\ Get Time Stamp Counter as two 32-bit integers
\ The TSC is incremented every CPU clock pulse
ABI-CODE aRDTSC ( -- TSCl TSCh )
    RDTSC               \ DX:AX := TSC
    $FFFFFFFF # AX AND  \ Clear upper 32 bits of AX
    $FFFFFFFF # DX AND  \ Clear upper 32 bits of DX
    AX R8 MOV           \ Temporarily save AX
    -16 DI D) AX LEA    \ SPout := SPin - 16 (room for two cells)
    R8 8 AX D) MOV      \ second item := saved AX = TSC low
    DX AX ) MOV         \ TOS := DX = TSC high
    RET
END-CODE
\ Get Time Stamp Counter as 64-bit integer
ABI-CODE RDTSC ( -- TSC )
    RDTSC               \ DX:AX := TSC
    $FFFFFFFF # AX AND  \ Clear upper 32 bits of AX
    32 # DX SHL         \ Move lower 32 bits of DX to upper 32 bits
    AX DX OR            \ Combine AX with DX in DX
    -8 DI D) AX LEA     \ SPout := SPin - 8 (room for one cell)
    DX AX ) MOV         \ TOS := DX
    RET
END-CODE
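The two time-stamp words above return the same counter in different shapes. As a sketch, the two 32-bit halves can also be recombined in plain Forth, which mirrors the SHL and OR steps of the second word. The name join-halves is hypothetical (not part of Gforth), and the code assumes a 64-bit system where one cell can hold the shifted high half.

```forth
\ Hypothetical helper, not part of Gforth:
\ join two 32-bit halves into one 64-bit cell
: join-halves ( low high -- u )
    32 lshift or ;

\ Check with fixed values; no assembler needed
hex
89ABCDEF 01234567 join-halves u.   \ should print 123456789ABCDEF
decimal
```

With aRDTSC loaded, the phrase `aRDTSC join-halves` would then leave a single 64-bit count, much like the 64-bit RDTSC word.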
VARIABLE V

\ Assign 4 to variable V
ABI-CODE V=4 ( -- )
    BX PUSH         \ Save BX, used by gforth
    V # BX MOV      \ BX := address of V
    4 # BX ) MOV    \ Write 4 to V
    BX POP          \ Restore BX
    DI ) AX LEA     \ SPout := SPin
    RET
END-CODE
VARIABLE V

\ Assign 5 to variable V
ABI-CODE V=5 ( -- )
    V # CX MOV      \ CX := address of V
    5 # CX ) MOV    \ Write 5 to V
    DI ) AX LEA     \ SPout := SPin
    RET
END-CODE
\ Conditional execution; <-- marks the branches that are taken
ABI-CODE TEST2 ( -- n n )
    -16 DI D) AX LEA        \ SPout := SPin - 16 (room for two cells)
    5 # CX MOV              \ CX := 5
    5 # CX CMP
    0= IF
        1 # 8 AX D) MOV     \ If CX = 5 then second item := 1   <--
    ELSE
        2 # 8 AX D) MOV     \ else second item := 2
    THEN
    6 # CX CMP
    0= IF
        3 # AX ) MOV        \ If CX = 6 then TOS := 3
    ELSE
        4 # AX ) MOV        \ else TOS := 4                     <--
    THEN
    RET
END-CODE
\ Loop four times. Expected stack result: ( -- 4 3 2 1 )
ABI-CODE LOOP4 ( -- n n n n )
    DI AX MOV           \ SPout := SPin
    4 # DX MOV          \ DX := 4, the loop counter
    BEGIN
        8 # AX SUB      \ SPout := SPout - 8 (room for one cell)
        DX AX ) MOV     \ TOS := DX
        1 # DX SUB      \ DX := DX - 1
    0= UNTIL
    RET
END-CODE
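The same BEGIN ... 0= UNTIL structure can drive a computation on an argument taken from the stack. The sketch below sums the integers from n down to 1. The name aSUMTO is hypothetical (not part of Gforth); the sketch assumes n is at least 1, because the loop body runs once before the counter is tested.

```forth
\ Hypothetical example word, not part of Gforth: sum = n + (n-1) + ... + 1
\ Assumes n >= 1; with n = 0 the counter wraps around and the loop runs on.
abi-code aSUMTO ( n -- sum )
    di ax mov           \ SPout := SPin (stack depth is unchanged)
    di ) cx mov         \ cx := n, the loop counter
    0 # dx mov          \ dx := 0, the running sum
    begin
        cx dx add       \ dx := dx + cx
        1 # cx sub      \ cx := cx - 1, sets the zero flag at 0
    0= until
    dx di ) mov         \ TOS := sum
    ret
end-code

4 aSUMTO .              \ should print 10
```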
Here’s an AMD64 example that deals with FP values:
abi-code my-f+ ( r1 r2 -- r )
\ SP passed in di, returned in ax, address of FP passed in si
    si ) dx mov         \ load fp
    8 dx d) xmm0 movsd  \ r1 (second item on the FP stack)
    dx ) xmm0 addsd     \ r1+r2
    xmm0 8 dx d) movsd  \ store r
    8 # si ) add        \ update fp
    di ax mov           \ sp into return reg
    ret
end-code