Pentium Assembler Using gcc
This page is a potpourri. It describes portions of the process of writing
Pentium assembly code using gcc, in the context of our current class and
textbook. Through examples and brief explanations, I attempt to guide you
around the potholes.
1. Hello world example
Let's look at an example, good old "Hello world" with each line
commented:
    .text                      # begin text segment
greet:                         # define const label "greet"
    .ascii "Hello world\0"     # the string, NUL-terminated for printf
    .globl _main               # make _main global/visible

_main:                         # define _main function start
    pushl %ebp                 # save caller's frame pointer
    movl  %esp, %ebp           # set frame pointer so that leave is valid
    pushl $greet               # push address of greet onto the stack
    call  _printf              # call printf function
    addl  $4, %esp             # add 4 to esp, removing the arg from the stack
    leave                      # prepare to return: restore esp and ebp
    ret                        # return

Hello world example
2. The basics
The syntactic style supported by gcc is generally known as the "AT&T
style." It was developed in conjunction with Unix and differs from the Intel
style used in our textbook and by Microsoft in its assembler, the
Microsoft Macro Assembler (MASM).
Comments start with a pound sign (#). C++ style comments (// and /* ...
*/) are also supported, but not after every statement type (like some of the
pseudo-instructions). I would just stick with #.
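As in C/C++, the first function is called main. On our platform the C compiler
prefixes every C symbol name with an underscore, so main appears in assembly
as _main and printf as _printf. (Some other platforms, such as Linux, do not
add the underscore.)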
2.1. Instruction format
Assembly programs contain assembly instructions and assembler directives
called pseudo-instructions. Pseudo-instructions generally start with a
dot (.) and have many functions. They do not translate directly into Pentium
machine instructions; rather, they guide the assembler in constructing correct
machine code for your assembly code.
Assembly instructions have the format:
    label:    opcode    operand-list    # comment
Each of these fields is optional (blank lines are OK). Anything on a line
after the pound sign (#) is ignored. Here are some
examples:
greet:  .ascii "Hello world"    # define ascii label
        addl $4,%esp            # add 4 to the SP register
2.2. Program segments
Programs have the following segments in memory with syntax in parens:
- Text - the main section containing your code and constant values ( .text )
- Data - contains initialized global variables ( .data )
- Bss - contains un-initialized global variables ( .bss )
2.3. The stack
A section of memory is reserved for your use as a stack. The stack has two
primary uses:
- Storing and passing the values of function parameters
- Storing the value of the local variables of a function
Each of these cases is described below, but first, here is how the stack functions:
- The address of the top of the stack is maintained in register %esp.
- The stack starts at a high memory address. As items are added to the
stack, it grows downward. Because of this %esp decreases as the
number of items on the stack grows and increases as the stack shrinks.
- The pushl src instruction decrements the stack pointer (%esp) by 4
bytes and then stores the value in src at the new top of the stack.
- The popl dest instruction copies the top value on the stack to dest
and then increments the stack pointer (%esp) by 4 bytes.
- The stack pointer (%esp) can be moved an arbitrary number of bytes
using the addl instruction (to shrink the stack) or the subl instruction
(to grow it).
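The stack rules above can be tried out in a small C model. This is only a sketch: the 64-byte array, the variable named esp, and the functions pushl() and popl() are stand-ins invented for illustration, not anything gcc provides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64                /* size of the pretend stack (an assumption) */

uint8_t  stack_mem[STACK_BYTES];      /* pretend stack memory */
uint32_t esp = STACK_BYTES;           /* starts at the high end: stack is empty */

/* pushl src: decrement esp by 4, then store src at the new top */
void pushl(uint32_t src)
{
    assert(esp >= 4);                 /* the model has no room left otherwise */
    esp -= 4;
    memcpy(&stack_mem[esp], &src, 4);
}

/* popl dest: copy the top value out, then increment esp by 4 */
uint32_t popl(void)
{
    uint32_t dest;
    assert(esp + 4 <= STACK_BYTES);   /* nothing to pop otherwise */
    memcpy(&dest, &stack_mem[esp], 4);
    esp += 4;
    return dest;
}
```

Pushing 7 and then 14 lowers esp by 8 in total, and the two values pop back in the opposite order (14 first, then 7). Adding 8 to esp instead of popping would discard both values without reading them, which is what addl $8, %esp does.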
Examples using the stack follow in the "Calling functions" and
"Local variables" sections.
3. Arithmetic
Coming soon...
4. Calling functions
Calling a function is a three step process:
- Place parameters to the function on the stack
- Call the function using the call instruction
- Adjust the stack pointer (%esp) to effectively remove the
parameters from the stack
Here's an example of a call to a simple function called plus10():
    .text
    .globl _main
_main:
    pushl $100          # first arg is 100
    call  plus10
    addl  $4, %esp      # remove arg from stack
    ret

# Function: int plus10( int p1)
# Adds 10 to the parameter and then returns the value
plus10:
    movl 4(%esp),%eax   # eax = first parameter (return address is at 0(%esp))
    addl $10,%eax       # eax = eax + 10
    ret

Function call example

Please note the first line of plus10. The first 4-byte argument
resides 4 bytes from the top of the stack. This is because the call
instruction pushes the return address (the address of the instruction
following the call) on the stack before ceding control of the program to
plus10. If a function first saves %ebp and sets up a frame pointer, as in the
"Local variables" section, the same argument is found at 8(%ebp).
The return value of a function is usually placed in the %eax register.
For functions with multiple arguments, each argument is placed on the stack
in last-to-first order. For example, a call to foo( 7, 14) would
look like this:
    pushl $14           # push arg 14
    pushl $7            # push arg 7
    call  foo
    addl  $8, %esp      # remove args from stack
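To see where a callee finds its arguments, here is a small C model of the call to foo( 7, 14). On entry to the callee the return address sits at 0(%esp), so the first argument is at 4(%esp) and the second at 8(%esp). Everything in the sketch is invented for illustration: the word array, the helper names, the made-up return address, and the choice to have foo return a - b so that the argument order is visible.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 16                      /* pretend stack size in 4-byte words */
#define FAKE_RETURN_ADDR 0x1000u      /* made-up return address for the model */

uint32_t mem[WORDS];                  /* pretend stack, one entry per 4 bytes */
uint32_t sp = WORDS;                  /* word index of the top; WORDS = empty */

void push(uint32_t v)
{
    assert(sp > 0);
    mem[--sp] = v;                    /* move the top down, then store */
}

/* Read the word at byte offset off from the top, like off(%esp). */
uint32_t at_esp(uint32_t off)
{
    assert(off % 4 == 0 && sp + off / 4 < WORDS);
    return mem[sp + off / 4];
}

/* Callee side of foo(a, b): returns a - b. */
int32_t foo_body(void)
{
    uint32_t a = at_esp(4);           /* first argument  */
    uint32_t b = at_esp(8);           /* second argument */
    return (int32_t)a - (int32_t)b;
}

/* Caller side of foo( 7, 14). */
int32_t call_foo_7_14(void)
{
    int32_t eax;
    push(14);                         /* pushl $14 : last argument first   */
    push(7);                          /* pushl $7                          */
    push(FAKE_RETURN_ADDR);           /* call foo  : pushes return address */
    eax = foo_body();                 /* result comes back in eax          */
    sp += 1;                          /* ret       : pops return address   */
    sp += 2;                          /* addl $8, %esp : caller removes args */
    return eax;
}
```

Calling call_foo_7_14() yields -7, which confirms that the value pushed last (7) is the one the callee sees as its first argument, and sp ends where it started.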
5. Local variables
Local variables for a function (including main) are allocated on the stack.
There are 3 things to know here:
- Adjust the stack pointer (held in %esp) to create space on the
stack for the local variable
- Use that space
- Restore the stack pointer, eliminating the local variable space
    .text
# _main() start
    .globl _main            # define _main externally
_main:
    call  foo
    ret

foo:
    # preserve old & set new frame pointer (ebp)
    pushl %ebp              # save old ebp
    movl  %esp, %ebp        # set new ebp

    # create a local var on stack, set it to 10
    subl  $4, %esp          # sub 4 from esp to create local var
    movl  $10, -4(%ebp)     # set local var -4(%ebp)
    addl  $4, %esp          # remove local var space

    # return old frame pointer and return
    movl  %ebp, %esp
    popl  %ebp
    ret

Local variables example
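The bookkeeping in foo can be followed step by step in a C model. This is a sketch only: the 64-byte array and the variables esp and ebp are stand-ins for the real stack and registers, and run_foo() is a name invented here. Each line is labeled with the instruction it imitates.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64                /* size of the pretend stack (an assumption) */

uint32_t words[STACK_BYTES / 4];      /* pretend stack memory */
uint32_t esp = STACK_BYTES;           /* byte offset of the top; empty at start */
uint32_t ebp = 0;                     /* stands in for the caller's frame pointer */

void store(uint32_t addr, uint32_t v)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    words[addr / 4] = v;
}

uint32_t load(uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return words[addr / 4];
}

/* Imitates foo; returns the local's value as read while it was live. */
uint32_t run_foo(void)
{
    uint32_t local;
    esp -= 4; store(esp, ebp);        /* pushl %ebp          */
    ebp = esp;                        /* movl  %esp, %ebp    */
    esp -= 4;                         /* subl  $4, %esp      */
    store(ebp - 4, 10);               /* movl  $10, -4(%ebp) */
    local = load(ebp - 4);            /* read the local back */
    esp += 4;                         /* addl  $4, %esp      */
    esp = ebp;                        /* movl  %ebp, %esp    */
    ebp = load(esp); esp += 4;        /* popl  %ebp          */
    return local;
}
```

After run_foo() returns, both esp and ebp hold the values they had before the call, which is the whole point of the save and restore steps.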
6. Global variables
Global variables are (usually) placed in the data section. The name of the
global variable is specified using a label. The size of the variable is
indicated by the pseudo-opcode specified:
- .long - a 4 byte integer
- .word - a 2 byte integer
- .byte - a 1 byte quantity, char or integer
- .space - allocates a number of bytes, such as storage for an array
Here are some examples:
.data
x: .long 17 # long x = 17
y: .word 3 # short y = 3
ch: .byte 'W' # char ch = 'W'
z: .byte 1 # unsigned char z = 1
array: .space 20 # 20 bytes for array, for example 5 longs
These are all global variables with an initial value. Un-initialized global
variables should be defined in the "bss" section.
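For comparison, here is roughly how the same globals would be written in C. This is a sketch that assumes the fixed-width types from stdint.h so the sizes match the directives exactly; data_bytes() is a helper invented here to add the sizes up. A C compiler is free to place the zero-filled array in bss rather than data.

```c
#include <assert.h>
#include <stdint.h>

int32_t       x = 17;        /* x:     .long 17  (4 bytes) */
int16_t       y = 3;         /* y:     .word 3   (2 bytes) */
char          ch = 'W';      /* ch:    .byte 'W' (1 byte)  */
unsigned char z = 1;         /* z:     .byte 1   (1 byte)  */
int32_t       array[5];      /* array: .space 20 (5 longs, zero-filled) */

/* Total bytes the five definitions occupy, ignoring alignment padding. */
unsigned data_bytes(void)
{
    return (unsigned)(sizeof x + sizeof y + sizeof ch + sizeof z + sizeof array);
}
```

The five items add up to 4 + 2 + 1 + 1 + 20 = 28 bytes.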
7. Input/output
We will output data to the screen using a C function called printf(). You can
query Visual-C++ for a manual page on this function. Here's an example usage:
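    .text
greet:  .ascii "Hello world\0"
    .globl _main
_main:
    pushl %ebp              # save caller's frame pointer
    movl  %esp, %ebp        # set frame pointer so that leave is valid
    pushl $greet            # push address of the string
    call  _printf
    addl  $4, %esp          # remove the arg from the stack
    leave
    ret

Calling printf() example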
We will read values from the keyboard using the C function scanf(). Its format
is similar to printf(). Here's an example:
# data section: global variables
    .data
x:  .long 0                 # x = 0
y:  .long 0                 # y = 0

# text section: program constants
    .text
LC0: .ascii "Enter 2 integer values:\0"
LC1: .ascii "%d%d\0"
LC2: .ascii "Two values are: %d, %d\12\0"

# _main() function
    .globl _main
_main:
    pushl %ebp              # save caller's frame pointer
    movl  %esp, %ebp        # set frame pointer so that leave is valid

    # printf( "Enter 2 integer values:");
    pushl $LC0
    call  _printf
    addl  $4, %esp          # one 4-byte arg was pushed

    # scanf( "%d%d", &x, &y);
    pushl $y
    pushl $x
    pushl $LC1
    call  _scanf
    addl  $12, %esp         # three 4-byte args were pushed

    # printf( "Two values are: %d, %d", x, y)
    movl  y, %eax
    pushl %eax
    movl  x, %eax
    pushl %eax
    pushl $LC2
    call  _printf
    addl  $12, %esp         # three 4-byte args were pushed

    leave
    ret

Calling scanf() example
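The same logic in C may make the argument order easier to follow. This sketch swaps scanf() for sscanf() and printf() for snprintf() so that it can run on a fixed string without keyboard input; read_two() and format_two() are names invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

int x = 0;   /* x: .long 0 */
int y = 0;   /* y: .long 0 */

/* Parse two integers from input into the globals x and y.
   Returns the number of values converted, as scanf itself does. */
int read_two(const char *input)
{
    /* In the assembly, &y is pushed first and the format string last. */
    return sscanf(input, "%d%d", &x, &y);
}

/* Build the result line in buf, as the final printf would print it.
   Returns the number of characters the full line needs. */
int format_two(char *buf, size_t n)
{
    return snprintf(buf, n, "Two values are: %d, %d\n", x, y);
}
```

Note that scanf receives the addresses of x and y (pushl $x), while printf receives their values (movl x,%eax then pushl %eax).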
8. C "calling conventions"
"The C library is nothing if not consistent, and that is its greatest virtue"
- Jeff Duntemann, "Assembly Language Step-by-step"
C library functions must be called using the following conventions:
- Preserve registers - When you call a function, it may change the
values in eax, ecx, and edx. If you need those values after the call, you
must push them onto the stack prior to the call and restore them after the
call. The called function leaves ebx, esp, ebp, esi, and edi as it found
them, so a function you write that uses any of these must save and restore
it.
- Return value - The return value of a function call is placed in
register eax. If the return value is greater than 32 bits, up to 64 bits,
then the lower 32 bits are in eax and the high 32 bits are in edx.
- Parameter order - Parameters to a function are pushed onto the
stack in reverse order. For example, a call to f( x, y, z) is done by
first pushing z, then pushing y, and finally pushing x. Then you issue the
call opcode.
- After the call - Functions do not remove their parameters from the
stack, so it is the caller's duty to clean up when the function returns. As
we have seen, the stack pointer (esp register) is fixed by adding the number
of bytes to remove to its current value.
I recommend that you use these conventions with your own functions as well.
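The return-value rule for 64-bit results can be illustrated in C. This is a sketch of the arithmetic only: low_half(), high_half(), and join() are names invented here to show which 32 bits would travel in which register.

```c
#include <assert.h>
#include <stdint.h>

/* The low 32 bits of a 64-bit return value travel in eax. */
uint32_t low_half(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/* The high 32 bits travel in edx. */
uint32_t high_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* The caller reassembles the value from the two registers. */
uint64_t join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```

For the value 0x1122334455667788, eax would hold 0x55667788 and edx would hold 0x11223344.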
9. Differences: gcc and our text
There are some very fundamental differences between the assembly code format
used in gcc and that used in our textbook.
- gcc is case-sensitive. In general, use lower-case (thankfully) rather than
the upper case in the text.
- The order of arguments in dyadic instructions (those with two operands) is
switched. In our text the order of operands to, for example, add is add
dest, src. The order in gcc is add
src, dest.
- In gcc, many instructions have a suffix defining the size of the operands.
For example, rather than a single add
instruction, gcc supports addl, addw,
and addb for long (4 bytes), word
(2 bytes) and byte (1 byte) quantities.
- Immediate values (constants) are prefixed with a $.
- Registers are prefixed with a %.
- Memory is referenced using parens.
- The format of the pseudo-instructions controlling assembly is different. For
example, gcc uses .globl rather than PUBLIC to designate a public symbol.
Here are some examples for these differences:
gcc style | textbook style | Comments
addl $4, %esp | ADD ESP, 4 | gcc is lower case; gcc constant is $4; gcc register is %esp; gcc addl to add 4 bytes; gcc order of operands is switched
.globl _main | PUBLIC _main | gcc uses different pseudo-instruction
subl 12(%ebp),%eax | SUB EAX, [EBP+12] | gcc memory reference uses parens with the displacement outside (no $, since 12 is an offset and not a constant operand)
10. Instructions
Selected (and hopefully most important) instructions on a separate page:
instructions.htm
11. Examples
Also, on a separate page, some tasty assembly code examples:
examples.htm