In the previous part of this three-part series of tutorials, we wrote a small function in arm64 assembly language and called it from a C program. In this tutorial we will write a program entirely in arm64 assembly and make it output "Hello, world!" using standard C library functions.
First steps
To start with, follow the steps from part 1 to set up a correctly configured project in Visual Studio.
To understand what happens next, we need to know a little about how Windows runs executables. Windows has several subsystems, and each runs executables in a different way. The one we will target is the console subsystem, which is used by applications that run on the command line.
The Windows linker needs to be told what type of application is contained in the object files it is given. For the project we've created, the linker is told to create a console application with /SUBSYSTEM:CONSOLE. The Microsoft documentation for the /SUBSYSTEM option explains some of the other subsystems.
The documentation for the entry point (the linker's /ENTRY option) tells us which function runs first in our program. For a console application, the default is a function that does some setup for the C runtime, so that all the C standard functions work. This function is called mainCRTStartup. Once its setup is finished, it calls the traditional C entry point, main. In the programs below we define a function called mainCRTStartup ourselves, so our function becomes the entry point.
Let's start by creating the smallest possible program that runs and stops.
Create an assembly file called Hello.asm and insert the following code into it.
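; Hello World in ARM64 Assembly for Windows
        AREA    Hello, CODE, READONLY

        EXPORT  mainCRTStartup [FUNC]   ; make our entry point visible to the linker
        IMPORT  ExitProcess             ; defined elsewhere (kernel32)

mainCRTStartup PROC                     ; labels start in column 1, instructions are indented
        mov     x0, #0                  ; first argument: exit code 0
        bl      ExitProcess             ; ExitProcess(0) does not return
        ENDP

        END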
Here we make our first use of the Windows API, with the ExitProcess function. In general, to make use of a library function we must:
- Tell the linker that this file needs a label defined elsewhere (most often in a library), using the IMPORT directive.
- Look up the arguments the function needs in its documentation, and adhere to them.
- Call the function in code by following the Arm 64-bit Procedure Call Standard (AAPCS64): place the arguments into registers in order, starting with x0, then call the function with bl.

ExitProcess takes a single argument, the exit code, so we place 0 in x0 before the bl.
Declaring some data
Assembly language doesn't have any special ideas about how a programmer stores information beyond the concept of memory and registers. How we use memory is subject to a set of conventions, some of which are so core that the CPU provides additional support for them.
We can store data in two different ways in a program:
- Registers: in arm64, we can use x0-x30 as general-purpose registers. Some have special roles: x29 is the frame pointer, x30 is the link register (bl overwrites it), and Windows reserves x18. We must also be mindful of the ABI conventions. These allow functions we call to overwrite x0-x17, so we must save any values held there that we still need.
- Memory: here we use the much slower RAM and store data in different parts of it. As long as we don't read or write memory that isn't ours, we can store data any way we like. That said, the three principal ways are:
  - Globals: we reserve some memory in a separate section of our program, near the machine code, and mark it as read/write. We can have the assembler initialise the values at assembly time. This is very useful for constants or global state.
  - Stack: a convention honoured by many processors (arm64 and x86 amongst them). A dynamic portion of memory grows and shrinks as the program executes. On arm64, the stack starts at a high address, grows downwards (to lower addresses) and shrinks back upwards. The stack pointer (on arm64, sp) indicates the current top of the stack.
  - Heap: here we ask the operating system for a portion of memory of a certain size. It gives us a pointer to that memory, which we can store on the stack or in a register. In the C standard library we would use malloc for this.
In this tutorial (and the next part) we will demonstrate both Globals and Stack usage.
Globally
To start with, we will demonstrate use of a global to store some data:
; Hello World in ARM64 Assembly for Windows
        AREA    HelloData, DATA
helloText DCB   "Hello, world!",0

        AREA    Hello, CODE, READONLY

        EXPORT  mainCRTStartup [FUNC]
        IMPORT  ExitProcess

mainCRTStartup PROC
        mov     x0, #0
        bl      ExitProcess
        ENDP

        END
We create a new AREA for data and store a set of bytes in it (the B in DCB stands for bytes). These bytes hold the null-terminated ASCII string "Hello, world!" (hence the 0).
Note how easy the assembler makes it for us to define a string constant.
Stack
We'll now see why string constants are much easier to define globally in armasm. The work is done at assembly time rather than at runtime, and the syntax for doing so is much simpler!
To store the string on the stack instead, the same program looks like this. It is commented for clarity, and in the walkthrough that follows, _ marks an unknown hex digit:
; Hello World in ARM64 Assembly for Windows
        AREA    Hello, CODE, READONLY

        EXPORT  mainCRTStartup [FUNC]
        IMPORT  ExitProcess

mainCRTStartup PROC
        sub     sp, sp, #16             ; stack must be 16-byte aligned and we need 14 bytes for "Hello, world!\0"
        ; arm64 is little endian, so the least significant byte is stored into memory first
        ; (hence each register holds its part of the string in reverse byte order)
        movk    x0, #0x6548             ; "He"
        movk    x0, #0x6C6C, LSL #16    ; "ll"
        movk    x0, #0x2C6F, LSL #32    ; "o,"
        movk    x0, #0x7720, LSL #48    ; " w"
        movk    x1, #0x726F             ; "or"
        movk    x1, #0x646C, LSL #16    ; "ld"
        movk    x1, #0x0021, LSL #32    ; "!\0"
        stp     x0, x1, [sp, #0]        ; store both registers (16 bytes) at sp
        add     sp, sp, #16             ; release the stack space
        mov     x0, #0
        bl      ExitProcess
        ENDP

        END
One of the arm64 requirements is that the stack pointer must be 16-byte aligned whenever it is used to access memory, so we grow and shrink the stack in 16-byte increments. The code works as follows:
- Increase the size of the stack by 16 bytes (sub sp, sp, #16).
- Set the registers x0 and x1 to hold "Hello, w" and "orld!\0" respectively. Each movk instruction sets 2 bytes of a register and leaves the other bytes unchanged.
- Store the pair of registers to memory at the address held in the stack pointer (stp x0, x1, [sp, #0]).
- Once done, decrease the size of the stack by 16 bytes (add sp, sp, #16).
To go into even more detail, we can step through the code line by line. The table below shows the state after each instruction has run. You'll notice that we store the string into x0 and x1 in reverse order. This is because arm64 is little endian: when a register is stored to memory, its least significant byte goes to the lowest address.
| Instruction | x0 | x1 | Memory state at sp |
|---|---|---|---|
| sub sp, sp, #16 | 0x________________ | 0x________________ | __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ |
| movk x0, #0x6548 | 0x____________6548 ("eH") | 0x________________ | __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ |
| movk x0, #0x6C6C, LSL #16 | 0x________6C6C6548 ("lleH") | 0x________________ | __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ |
| movk x0, #0x2C6F, LSL #32 | 0x____2C6F6C6C6548 (",olleH") | 0x________________ | __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ |
| movk x0, #0x7720, LSL #48 | 0x77202C6F6C6C6548 ("w ,olleH") | 0x________________ | __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ |
| ... | ... | ... | ... |
| movk x1, #0x0021, LSL #32 | 0x77202C6F6C6C6548 ("w ,olleH") | 0x____0021646C726F ("\0!dlro") | __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ |
| stp x0, x1, [sp, #0] | 0x77202C6F6C6C6548 ("w ,olleH") | 0x____0021646C726F ("\0!dlro") | 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 00 __ __ ("Hello, world!\0") |