Overview: Intel Assembly Using gcc

William T Krieger... Sep 2005

This page is a supplemental potpourri. It describes portions of the process of writing Intel assembly code using gcc, in the context of our current class and textbook. Through examples and brief explanations, I attempt to guide you around the (many) potholes of writing Intel assembly language using gcc.

1. Using gcc

Here are some notes on getting and installing gcc:

Using gcc for Intel Assembler

2. Hello world example

Let's look at an example, good old "Hello world" with each line commented:

.text	# begin text segment
greet: .ascii "Hello world\0"	# define label "greet" for a NUL-terminated string
.globl _main	# make _main global/visible

_main:	# define _main function start
pushl $greet	# push address of greet onto the stack
call _printf	# call printf function
addl $4,%esp	# add 4 to esp, removing the argument from the stack

movl $0,%eax	# return value 0 (no frame was set up, so no leave)
ret	# return

Hello world example

3. The basics

The syntactic style supported by gcc is generally known as the "AT&T style." It grew up with the Unix assemblers and differs from the Intel style used in our textbook and by Microsoft in its assembler, the Microsoft Macro Assembler (MASM).

Comments start with a pound sign (#). C and C++ style comments (/* ... */ and //) are also supported, but not after every statement type (like some of the pseudo-instructions). I would just stick with #.

As in C/C++, your program starts in the function called main. The leading underscore (_main) is a naming convention: on our platform the C compiler prefixes every C symbol name with an underscore, so C's main and printf appear in assembly as _main and _printf.

3.1. Instruction format

Assembly programs contain assembly instructions and assembler directives called pseudo-instructions. Pseudo-instructions generally start with a dot (.) and have many functions. They do not translate directly into machine instructions; rather, they guide the assembler in constructing correct machine code from your assembly code.

Assembly instructions have the format:

label:    opcode    operand-list    # comment

Each of these fields is optional (blank lines are OK). Anything on a line after the pound sign (#) is ignored. Here are some examples:

greet: .ascii "Hello world" # define ascii label

addl $4,%esp # add 4 to SP register

3.2. Program segments

Programs have the following segments in memory with syntax in parens:

Text - the main section containing your code and constant values ( .text )
Data - contains initialized global variables ( .data )
Bss - contains un-initialized global variables ( .bss )

3.3. The stack

A section of memory is reserved for your use as a stack. The stack has two primary uses:

Storing and passing the values of function parameters
Storing the value of the local variables of a function

There are important conventions you need to know and follow regarding the stack as well:

The address of the top of the stack is maintained in register %esp.
The stack starts at a high memory address. As items are added to the stack, it grows downward. Because of this %esp decreases as the number of items on the stack grows and increases as the stack shrinks.
The pushl src instruction decrements the stack pointer (%esp) by 4 bytes and then stores the value in src at the new top of the stack.
The popl dest instruction copies the top value on the stack to dest and then increments the stack pointer (%esp) by 4 bytes.
The stack pointer (%esp) can be moved an arbitrary number of bytes using the addl and subl instructions.

Examples using the stack follow in the "Calling functions" and "Local variables" sections.
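
The conventions above can be tried out in a small C model. This is only a sketch: the SimStack type and the sim_ function names are made up for illustration, esp is an offset into a byte array rather than a real address, and there is no overflow checking.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the stack: a small byte array plus a simulated %esp.
   esp is an offset into memory[], starting at the high end. */
#define STACK_BYTES 64

typedef struct {
    uint8_t  memory[STACK_BYTES];
    uint32_t esp;               /* offset of the current top of the stack */
} SimStack;

void sim_init(SimStack *s) {
    memset(s->memory, 0, sizeof s->memory);
    s->esp = STACK_BYTES;       /* empty stack: esp sits at the high end */
}

/* pushl src: esp drops by 4 and src ends up at the new top of the stack. */
void sim_pushl(SimStack *s, uint32_t src) {
    s->esp -= 4;
    memcpy(&s->memory[s->esp], &src, 4);
}

/* popl dest: the top value is copied out and esp rises by 4. */
uint32_t sim_popl(SimStack *s) {
    uint32_t dest;
    memcpy(&dest, &s->memory[s->esp], 4);
    s->esp += 4;
    return dest;
}
```

Pushing two values and popping them again returns them in reverse order, with esp moving down by 4 on each push and back up by 4 on each pop.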

4. Arithmetic

Coming soon...

5. Calling functions

Calling a function is a three-step process:

Place parameters to the function on the stack
Call the function using the call instruction
Adjust the stack pointer (%esp) to effectively remove the parameters from the stack

Here's an example of a call to a simple function called plus10():

    .text
    .globl _main
_main:
    pushl $100        # first arg is 100

    call plus10
    addl $4, %esp     # remove arg from stack
    ret

# Function: int plus10( int p1)
# Adds 10 to the parameter and then returns the value
plus10:
    movl 4(%esp),%eax   # eax = first parameter
    addl $10,%eax       # eax = eax + 10
    ret

Function call example

Please note the first line of plus10. The first 4-byte argument resides 4 bytes above the top of the stack, at 4(%esp). This is because the call instruction pushes the return address onto the stack before ceding control of the program to plus10, so the return address sits at 0(%esp). This return address is popped from the stack and used by the ret instruction. (If plus10 first saved the frame pointer with pushl %ebp and movl %esp, %ebp, as described under "Local variables," the argument would be at 8(%ebp) instead.)

The return value of a function is usually placed in the %eax register.

For functions with multiple arguments, each argument is placed on the stack in last-to-first order. For example, a call to foo( 7, 14) would look like this:

    pushl $14        # push arg 14
    pushl $7         # push arg 7
    call foo
    addl $8, %esp    # remove args from stack
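
To see why last-to-first order is used, here is a rough C model of the stack just before the call instruction executes. The names (stack_mem, esp_slot, at_esp, and so on) are hypothetical, and the stack is modeled as an array of 4-byte slots rather than real memory.

```c
#include <stdint.h>

/* Hypothetical model: the stack as an array of 4-byte slots.
   esp_slot stands in for %esp / 4; a lower index is a lower address. */
enum { SLOTS = 16 };
uint32_t stack_mem[SLOTS];
int esp_slot = SLOTS;                   /* empty stack */

/* pushl v: move the top down one slot and store v there. */
void pushl(uint32_t v) { stack_mem[--esp_slot] = v; }

/* Read the 4-byte value at byte offset off from the top, i.e. off(%esp). */
uint32_t at_esp(int off) { return stack_mem[esp_slot + off / 4]; }

/* Push the arguments of foo(7, 14) last-to-first, as the caller does
   before issuing call. */
void push_args_for_foo(void) {
    pushl(14);   /* second argument goes on first */
    pushl(7);    /* first argument goes on last, so it ends up on top */
}

/* addl $8, %esp: drop the two 4-byte arguments once the call returns. */
void remove_args(void) { esp_slot += 2; }
```

In this model the first argument ends up at 0(%esp) and the second at 4(%esp), so the arguments appear in their declared order at increasing addresses. The call instruction then pushes the return address on top of them.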

6. Local variables

Local variables for a function (including main) are also allocated on the stack.

The use of local variables also introduces the frame pointer, which is stored in the %ebp register. The frame pointer gives local variables a fixed reference point while the stack pointer (%esp) moves, and saving and restoring it preserves the registers (namely the stack pointer and the frame pointer) that calling or called procedures rely on. Here are the steps involved in using local variables:

Push the current frame pointer value onto the stack... pushl %ebp
Save (well, move) the stack pointer value in the frame pointer... movl %esp, %ebp
Then, create and use local variables on the stack... see example below
Restore the old value of the stack pointer that we saved in the frame pointer... movl %ebp, %esp
Pop the old frame pointer value off the stack... popl %ebp

Yes, this will take a while to sink in, but it is a rote process that you will use for each procedure.

.text

# _main() start
    .globl _main       # define _main externally
_main:
    call foo
    ret

foo:
    # preserve old & set new frame pointer (ebp)
    pushl %ebp         # save old ebp
    movl %esp, %ebp    # set new ebp

    # create a local var on stack, set it to 10
    subl $4, %esp      # sub 4 from esp to create local var
    movl $10, -4(%ebp) # set local var -4(%ebp)
    addl $4, %esp      # remove local var space

    # restore old stack and frame pointers, then return
    movl %ebp, %esp
    popl %ebp
    ret

Local variables example
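
The bookkeeping in foo can be followed step by step in C. This is a sketch only: esp, ebp, store, load, and foo_model are made-up names, register values are byte offsets into a small simulated stack, and 0xDEADBEEF merely stands in for whatever frame pointer the caller had.

```c
#include <stdint.h>

/* Hypothetical model: simulated registers and a stack of 4-byte slots.
   Register values are byte offsets into the simulated stack. */
enum { STACK_BYTES = 64 };
static uint32_t slots[STACK_BYTES / 4];
uint32_t esp = STACK_BYTES;     /* simulated %esp (empty stack) */
uint32_t ebp = 0xDEADBEEF;      /* simulated %ebp: the caller's frame pointer */

static void     store(uint32_t addr, uint32_t v) { slots[addr / 4] = v; }
static uint32_t load(uint32_t addr)              { return slots[addr / 4]; }

/* Mirrors foo, one C statement per assembly instruction.
   Returns the value read back from the local variable. */
uint32_t foo_model(void) {
    esp -= 4; store(esp, ebp);          /* pushl %ebp          */
    ebp = esp;                          /* movl %esp, %ebp     */
    esp -= 4;                           /* subl $4, %esp       */
    store(ebp - 4, 10);                 /* movl $10, -4(%ebp)  */
    uint32_t local = load(ebp - 4);     /* read the local back */
    esp += 4;                           /* addl $4, %esp       */
    esp = ebp;                          /* movl %ebp, %esp     */
    ebp = load(esp); esp += 4;          /* popl %ebp           */
    return local;
}
```

After foo_model returns, both simulated registers hold the values they had on entry, which is the whole point of the save and restore steps.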

7. Global variables

Global variables are (usually) placed in the data section. The name of the global variable is specified using a label. The size of the variable is indicated by the pseudo-opcode specified:

.long - a 4 byte integer
.word - a 2 byte integer
.byte - a 1 byte quantity, char or integer
.space - allocates a number of bytes, such as storage for an array

Here are some examples:

    .data
x: .long 17   # long x = 17
y: .word 3    # short y = 3
ch: .byte 'W' # char ch = 'W'
z: .byte 1    # unsigned char z = 1
array: .space 20   # 20 bytes for array, for example 5 longs

These are all global variables with an initial value (the bytes reserved by .space are filled with zeros). Un-initialized global variables should be defined in the bss section.
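
For comparison, here is roughly how the same globals might be declared in C. The fixed-width types from stdint.h are my choice so that the sizes match on any platform (on our 32-bit platform, long and short have these sizes anyway), and counter is a made-up name for an un-initialized variable.

```c
#include <stdint.h>

/* C counterparts of the .data definitions above. Fixed-width types are an
   assumption made here so the sizes match regardless of platform. */
int32_t x  = 17;        /* x: .long 17      (4 bytes) */
int16_t y  = 3;         /* y: .word 3       (2 bytes) */
char    ch = 'W';       /* ch: .byte 'W'    (1 byte)  */
uint8_t z  = 1;         /* z: .byte 1       (1 byte)  */
int32_t array[5] = {0}; /* array: .space 20 (5 x 4 bytes, zero-filled) */

/* No initializer: a C compiler typically places this in the bss section,
   where it starts out as zero. */
int32_t counter;
```

The sizeof operator confirms the byte counts: 4, 2, 1, 1, and 20 for the five initialized variables.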

8. Input/output

We will output data to the screen using a C function called printf(). The "Hello World" example in the beginning shows a call of printf().

We will read values from the keyboard using the C function scanf(). Its format is similar to printf(). Here's an example:

# data section: global variables
.data
x: .long 0   # x = 0
y: .long 0   # y = 0

# text section: program constants
.text
LC0: .ascii "Enter 2 integer values:\0"
LC1: .ascii "%d%d\0"
LC2: .ascii "Two values are: %d, %d\12\0"

# _main() function
    .globl _main
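_main:
    # set up a stack frame so that leave can undo it
    pushl %ebp
    movl %esp,%ebp

    # printf( "Enter 2 integer values:");
    pushl $LC0
    call _printf
    addl $4,%esp       # remove 1 argument (4 bytes)

    # scanf( "%d%d", &x, &y);
    pushl $y
    pushl $x
    pushl $LC1
    call _scanf
    addl $12,%esp      # remove 3 arguments (12 bytes)

    # printf( "Two values are: %d, %d", x, y)
    movl y,%eax
    pushl %eax
    movl x,%eax
    pushl %eax
    pushl $LC2
    call _printf
    addl $12,%esp      # remove 3 arguments (12 bytes)

    movl $0,%eax       # return value 0
    leave              # restore esp and ebp
    ret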

Calling scanf() example

You can query Microsoft Visual-C++ for a manual page on either printf or scanf. Here's another nice source:

printf documentation... http://cppreference.com/stdio/printf.html
scanf documentation... http://cppreference.com/stdio/scanf.html

9. C "calling conventions"

"The C library is nothing if not consistent, and that is its greatest virtue"

- Jeff Duntemann, "Assembly Language Step-by-step"

C library functions must be called using the following conventions:

Preserve registers - A called function may change the values in eax, ecx, and edx. Therefore, if you wish to keep the values in those registers, you must push them onto the stack prior to the call and restore them after the call. C library functions preserve ebx, esi, edi, ebp, and esp, so if your own function uses any of those, it must save them on entry and restore them before returning.
Return value - The return value of a function call is placed in register eax. If the return value is larger than 32 bits, up to 64 bits, then the low 32 bits are in eax and the high 32 bits are in edx.
Parameter order - Parameters to a function are pushed onto the stack in reverse order. For example, a call to f( x, y, z) is done by first pushing z, then pushing y, and finally pushing x. Then you issue the call instruction.
After the call - Functions do not remove their parameters from the stack, so it is the caller's duty to clean-up upon a function return. As we have seen, the stack pointer (esp register) is fixed by adding the number of bytes to erase to its current value.

I recommend that you use these conventions with your own functions as well.
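
The 64-bit return value rule is easy to get backwards, so here is a small C sketch of the split. RegPair, split_return, and join_return are hypothetical names used only for illustration; nothing here touches the real registers.

```c
#include <stdint.h>

/* Hypothetical helper type: it only illustrates the edx:eax split. */
typedef struct {
    uint32_t eax;   /* low 32 bits  */
    uint32_t edx;   /* high 32 bits */
} RegPair;

/* How a 64-bit return value is divided between the two registers. */
RegPair split_return(uint64_t value) {
    RegPair r;
    r.eax = (uint32_t)(value & 0xFFFFFFFFu);
    r.edx = (uint32_t)(value >> 32);
    return r;
}

/* How the caller reassembles the 64-bit value from edx:eax. */
uint64_t join_return(RegPair r) {
    return ((uint64_t)r.edx << 32) | r.eax;
}
```

For example, the value 0x1122334455667788 splits into eax = 0x55667788 and edx = 0x11223344.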

10. Differences: gcc and our text

There are some very fundamental differences between the assembly code format used in gcc and that used in our textbook.

gcc is case-sensitive. In general, use lower-case (thankfully) rather than the upper case in the text.
The order of arguments in dyadic instructions (those with two operands) is switched. In our text the order of operands to, for example, add is add dest, src. The order in gcc is add src, dest.
In gcc, many instructions have a suffix defining the size of the operands. For example, rather than a single add instruction, gcc supports addl, addw, and addb for long (4 bytes), word (2 bytes) and byte (1 byte) quantities.
Immediate values (constants) are prefixed with a $.
Registers are prefixed with a %.
Memory is referenced using parens.
The format of pseudo-instructions controlling assembly is different. For example, gcc uses .globl rather than PUBLIC to designate a public symbol.

Here are some examples for these differences:

gcc style	textbook style	Notes
addl $4, %esp	ADD ESP, 4	gcc is lower case; gcc constant is $4; gcc register is %esp; gcc uses addl to add a 4-byte quantity; gcc order of operands is switched
.globl _main	PUBLIC _main	gcc uses different pseudo-instruction
subl 12(%ebp),%eax	SUB EAX, [EBP+12]	gcc memory reference uses parens with the displacement outside (and no $ on the displacement)

11. Instructions

Selected (and hopefully most important) Intel assembly instructions on a separate page:

instructions.htm

12. Examples

Also, on a separate page, some tasty Intel assembly code examples:

examples.htm

13. More help

Prof. Muganda has a tutorial guide on Intel Assembly similar to this one, that you may enjoy even more:

muganda_GasManual.pdf