Pentium Assembler Using gcc

This page is a potpourri. It attempts to describe portions of the process of writing Pentium assembly code using gcc. This explanation is done in the context of our current class, specifically using our textbook. Through examples and brief explanations, I attempt to guide you through the potholes of writing Pentium assembly language using gcc.

1. Hello world example

Let's look at an example, good old "Hello world" with each line commented:

        .text # begin text segment
greet:  .ascii "Hello world\0" # define const label "greet"
        .globl _main # make _main global/visible
 
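_main:   # define _main function start
        pushl %ebp # save caller's frame pointer
        movl %esp,%ebp # set up our frame pointer (leave needs it)
        pushl $greet # push address of greet onto the stack
        call _printf # call printf function
        addl $4,%esp # add 4 to esp, removing the argument
 
        leave # restore esp and ebp before returning
        ret # return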
Hello world example

2. The basics

The syntactic style supported by gcc is generally known as the "AT&T style." It grew out of the assemblers developed with Unix and differs from the Intel style used in our textbook and by Microsoft in its assembler, the Microsoft Macro Assembler (MASM).

Comments start with a pound sign (#). C and C++ style comments (/* ... */ and //) are also supported, but not after every statement type (for example, not after some of the pseudo-instructions). I would just stick with #.

As in C/C++, execution starts in a function called main. The leading underscore (_main) is how the C compiler names the symbol on this platform; it is added to avoid name conflicts.

2.1. Instruction format

Assembly programs contain assembly instructions and assembler directives called pseudo-instructions. Pseudo-instructions generally start with a dot (.) and have many functions. They do not translate directly into Pentium machine instructions; rather, they guide the assembler in constructing correct machine code for your assembly code.

Assembly instructions have the format:

label: opcode operand-list comment

Each of these fields is optional (blank lines are OK). Anything on a line after the pound sign (#) is ignored. Here are some examples:

greet:   .ascii "Hello world"    # define ascii label
         addl $4,%esp            # add 4 to the SP register

2.2. Program segments

Programs have the following segments in memory with syntax in parens:

  1. Text - the main section containing your code and constant values ( .text )
  2. Data - contains initialized global variables ( .data )
  3. Bss - contains un-initialized global variables ( .bss )
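
As a minimal sketch, the three segments might appear together in one source file like this. The labels count, buffer and msg are hypothetical names chosen for illustration, and the underscore-prefixed _main follows the convention used elsewhere on this page.

```asm
        .data                   # Data: initialized global variables
count:  .long 5                 # int count = 5

        .bss                    # Bss: un-initialized global variables
buffer: .space 16               # 16 bytes reserved, no initial value given

        .text                   # Text: code and constant values
msg:    .ascii "hi\0"           # a constant string (not used below)
        .globl _main
_main:
        movl count,%eax         # eax = count
        movl %eax,buffer        # copy count into the first 4 bytes of buffer
        movl $0,%eax            # return 0 from main
        ret
```

Each directive stays in effect until the next one, so everything between .data and .bss lands in the data segment, and so on.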

2.3. The stack

A section of memory is reserved for your use as a stack. The stack has two primary uses:

  1. Passing parameters to functions
  2. Holding the local variables of a function

Each of these cases is described, with examples, in the "Calling functions" and "Local variables" sections.
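
Here is a minimal sketch of the stack's last-in, first-out behavior. It assumes the standard 32-bit behavior of pushl and popl: each pushl subtracts 4 from %esp and stores a value there, and each popl loads the value at %esp and then adds 4 back.

```asm
        .text
        .globl _main
_main:
        pushl $1            # stack (top first): 1
        pushl $2            # stack (top first): 2, 1
        popl %eax           # eax = 2, the last value pushed comes off first
        popl %ecx           # ecx = 1, stack is back where it started
        subl %ecx,%eax      # eax = 2 - 1 = 1
        ret                 # main returns the value in eax
```

Because the two pops balance the two pushes, %esp again points at main's return address when ret executes.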

3. Arithmetic

Coming soon...

4. Calling functions

Calling a function is a three step process:

  1. Place parameters to the function on the stack
  2. Call the function using the call instruction
  3. Adjust the stack pointer (%esp) to effectively remove the parameters from the stack

Here's an example of a call to a simple function called plus10():

    .text
    .globl _main
_main:
    pushl $100        # first arg is 100
    call plus10 
    addl $4, %esp     # remove arg from stack
    ret

# Function: int plus10( int p1)
# Adds 10 to the parameter and then returns the value
plus10:
    movl 4(%esp),%eax   # eax = first parameter 4(%esp)
    addl $10,%eax       # eax = eax + 10
    ret
Function call example

Please note the first line of plus10. The first 4-byte argument resides 4 bytes from the top of the stack. This is because the call instruction pushes the 4-byte return address onto the stack before ceding control of the program to plus10. (A function that first saves %ebp and copies %esp into it, as in the "Local variables" section, finds the same argument at 8(%ebp).)

The return value of a function is usually placed in the %eax register.

For functions with multiple arguments, each argument is placed on the stack in last-to-first order.  For example, a call to foo( 7, 14) would look like this:

    pushl $14        # push arg 14
    pushl $7         # push arg 7
    call foo
    addl $8, %esp    # remove args from stack
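
For illustration, here is a sketch of what the called function might look like. The body of foo is an assumption (this page does not define it); this version simply returns the sum of its two arguments. It saves %ebp and sets up a frame pointer first, so the first argument is found at 8(%ebp) and the second at 12(%ebp).

```asm
        .text
        .globl _main
_main:
        pushl $14               # push arg 14 (second arg goes first)
        pushl $7                # push arg 7
        call foo
        addl $8,%esp            # remove args from stack
        ret                     # eax still holds foo's result, 21

# Hypothetical function: int foo( int a, int b), returns a + b
foo:
        pushl %ebp              # save caller's frame pointer
        movl %esp,%ebp          # ebp = our frame pointer
        movl 8(%ebp),%eax       # eax = a (first arg, pushed last)
        addl 12(%ebp),%eax      # eax = a + b
        popl %ebp               # restore caller's frame pointer
        ret                     # result is returned in eax
```

Pushing the arguments in last-to-first order is what places the first argument closest to the return address.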

5. Local variables

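Local variables for a function (including main) are allocated on the stack. There are 3 things to know here:

  1. On entry, save the old frame pointer (%ebp) and set the new one from %esp
  2. Subtract from %esp to make room for each local variable, then address it at a negative offset from %ebp
  3. Before returning, release the local space and restore the old frame pointer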

.text

# _main() start
    .globl _main       # define _main externally
_main:
    call foo
    ret

foo:
    # preserve old & set new frame pointer (ebp)
    pushl %ebp         # save old ebp
    movl %esp, %ebp    # set new ebp

    # create a local var on stack, set it to 10
    subl $4, %esp      # sub 4 from esp to create local var
    movl $10, -4(%ebp) # set local var -4(%ebp)
    addl $4, %esp      # remove local var space

    # restore old frame pointer and return
    movl %ebp, %esp
    popl %ebp
    ret
Local variables example
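
As a further sketch, the same pattern extends to two local variables, assuming the same frame layout: the first local lives at -4(%ebp) and the second at -8(%ebp). The function name sum2 and its values are hypothetical.

```asm
        .text
        .globl _main
_main:
        call sum2
        ret                     # main returns the value left in eax

# Hypothetical function:
#   int sum2( void) { int a = 10; int b = 20; return a + b; }
sum2:
        pushl %ebp              # save old ebp
        movl %esp,%ebp          # set new ebp
        subl $8,%esp            # room for two 4-byte local vars
        movl $10,-4(%ebp)       # a = 10
        movl $20,-8(%ebp)       # b = 20
        movl -4(%ebp),%eax      # eax = a
        addl -8(%ebp),%eax      # eax = a + b = 30
        movl %ebp,%esp          # release both local vars at once
        popl %ebp               # restore old ebp
        ret
```

Copying %ebp back into %esp releases all the local space in one step, however many variables were allocated.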

6. Global variables

Global variables are (usually) placed in the data section. The name of the global variable is specified using a label. The size of the variable is indicated by the pseudo-instruction used: .long for 4 bytes, .word for 2 bytes and .byte for 1 byte.

Here are some examples:

    .data
x:  .long 17   # long x = 17
y:  .word 3    # short y = 3
ch: .byte 'W'  # char ch = 'W'
z:  .byte 1    # unsigned char z = 1
array: .space 20   # 20 bytes for array, for example 5 longs

These are all global variables with an initial value (.space fills its bytes with zeros by default). Un-initialized global variables should be defined in the "bss" section.
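
A short sketch of reading and writing such variables from code. The point it tries to show is that the instruction suffix (l, w) should match the size of the variable, and that an element of array is reached by adding a byte offset to the label. The particular values are only for illustration.

```asm
        .data
x:      .long 17                # long x = 17
y:      .word 3                 # short y = 3
array:  .space 20               # 20 bytes, room for 5 longs

        .text
        .globl _main
_main:
        movl x,%eax             # eax = x (4 bytes, so movl)
        movl %eax,array+8       # array[2] = x; each long is 4 bytes
        movw $5,y               # y = 5 (2 bytes, so movw)
        movl array+8,%eax       # eax = array[2], which is now 17
        ret                     # main returns the value in eax
```

Note that a bare label such as x refers to the contents of the variable, while $x (as in the scanf() example) is its address.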

7. Input/output

We will output data to the screen using a C function called printf(). You can query Visual-C++ for a manual page on this function. Here's an example usage:

    .text
greet:  .ascii "Hello world\0"
    .globl _main
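_main:
    pushl %ebp        # save caller's frame pointer
    movl %esp,%ebp    # set up our frame pointer (leave needs it)
    pushl $greet
    call _printf
    addl $4,%esp
    leave
    ret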
Calling printf() example
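
printf() also accepts a format string followed by values. The sketch below assumes the same underscore-prefixed _printf symbol as above; the label fmt and the value 42 are made up for illustration. The arguments are pushed in last-to-first order, so the format string goes on last.

```asm
        .text
fmt:    .ascii "The answer is %d\12\0"   # \12 is octal for newline
        .globl _main
_main:
        pushl $42               # second arg: value for %d
        pushl $fmt              # first arg: address of format string
        call _printf
        addl $8,%esp            # remove both args (2 x 4 bytes)
        movl $0,%eax            # return 0 from main
        ret
```

The amount added to %esp after the call is always 4 bytes per argument pushed.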

We will read values from the keyboard using the C function scanf(). Its format is similar to printf(). Here's an example:

// data section: global variables
.data
x: .long 0   # x = 0
y: .long 0   # y = 0
 
// text section: program constants
.text
LC0: .ascii "Enter 2 integer values:\0"
LC1: .ascii "%d%d\0"
LC2: .ascii "Two values are: %d, %d\12\0"
 
// _main() function
    .globl _main
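_main:
    pushl %ebp        # save caller's frame pointer
    movl %esp,%ebp    # set up our frame pointer (leave needs it)

    // printf( "Enter 2 integer values:");
    pushl $LC0
    call _printf
    addl $4,%esp      # remove 1 arg (4 bytes)
 
    // scanf( "%d%d", &x, &y);
    pushl $y
    pushl $x
    pushl $LC1
    call _scanf
    addl $12,%esp     # remove 3 args (12 bytes)

    // printf( "Two values are: %d, %d\n", x, y);
    movl y,%eax
    pushl %eax
    movl x,%eax
    pushl %eax
    pushl $LC2
    call _printf
    addl $12,%esp     # remove 3 args (12 bytes)

    leave
    ret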
Calling scanf() example

8. C "calling conventions"

"The C library is nothing if not consistent, and that is its greatest virtue"

- Jeff Duntemann, "Assembly Language Step-by-step"

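C library functions must be called using the following conventions:

  1. Parameters are pushed on the stack in last-to-first order
  2. The caller removes the parameters from the stack after the call returns
  3. The return value is placed in the %eax register
  4. The called function preserves %ebx, %esi, %edi, %ebp and %esp; it may change %eax, %ecx and %edx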

I recommend that you use these conventions with your own functions as well.
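
As one illustration, here is a sketch of a function of your own that follows the usual 32-bit C convention. It assumes that a called function must preserve %ebx, %esi, %edi and %ebp, while %eax, %ecx and %edx may be overwritten. The function triple is hypothetical.

```asm
        .text
        .globl _main
_main:
        pushl $5                # arg: n = 5
        call triple
        addl $4,%esp            # caller removes the arg
        ret                     # eax still holds the result, 15

# Hypothetical function: int triple( int n), returns 3 * n
triple:
        pushl %ebp              # save caller's frame pointer
        movl %esp,%ebp          # set up our frame pointer
        pushl %ebx              # ebx must survive the call, so save it
        movl 8(%ebp),%ebx       # ebx = n
        movl %ebx,%eax          # eax = n
        addl %ebx,%eax          # eax = 2n
        addl %ebx,%eax          # eax = 3n
        popl %ebx               # restore caller's ebx
        popl %ebp               # restore caller's frame pointer
        ret                     # result is returned in eax
```

A function that only uses %eax, %ecx and %edx as scratch registers would not need the extra push and pop.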

9. Differences: gcc and our text

There are some very fundamental differences between the assembly code format used in gcc and that used in our textbook.

Here are some examples of these differences:

gcc style               textbook style       Comments
addl $4, %esp           ADD ESP, 4           gcc is lower case
                                             gcc constant is $4
                                             gcc register is %esp
                                             gcc addl to add 4 bytes
                                             gcc order of operands is switched
.globl _main            PUBLIC _main         gcc uses different pseudo-instruction
subl 12(%ebp),%eax      SUB EAX, [EBP+12]    gcc memory reference uses parens with offset outside

10. Instructions

Selected (and hopefully most important) instructions on a separate page:

instructions.htm

11. Examples

Also, on a separate page, some tasty assembly code examples:

examples.htm