Overview: Intel Assembly Using gcc
William T Krieger... Sep 2005
This page is a supplemental potpourri. It describes portions of the process of
writing Intel assembly code using gcc, in the context of our current class and
our textbook. Through examples and brief explanations, I attempt to guide you
through the (many) potholes of writing Intel assembly language using gcc.
1. Using gcc
Here are some notes on getting gcc, installing it, and so on:
Using gcc for Intel Assembler
2. Hello world example
Let's look at an example, good old "Hello world" with each line
commented:
        .text                    # begin text segment
greet:                           # define const label "greet"
        .ascii "Hello world\0"
        .globl _main             # make _main global/visible

_main:                           # define _main function start
        pushl $greet             # push address of greet onto the stack
        call _printf             # call printf function
        addl $4,%esp             # add 4 to esp, removing the arg from the stack

        ret                      # return
Hello world example
3. The basics
The syntactic style supported by gcc is generally known as the "AT&T style."
It grew out of the assemblers developed in conjunction with Unix and differs
from the Intel style used in our textbook and by Microsoft in its assembler,
the Microsoft Macro Assembler (MASM).
Comments start with a pound sign (#). C++ style comments (// and /* ...
*/) are also supported, but not after every statement type (like some of the
pseudo-instructions). I would just stick with #.
As in C/C++, the first function is called main. The leading underscore (_main)
is a convention adopted to avoid name conflicts and is used on many "system"
function names.
3.1. Instruction format
Assembly programs contain assembly instructions and assembler directives
called pseudo-instructions. Pseudo-instructions generally start with a
dot (.) and have many functions. They do not translate directly into Intel
assembler instructions, rather they guide the assembler in constructing correct
machine code for your assembly code.
Assembly instructions have the format:
label:  opcode  operand-list     # comment
Each of these fields is optional (blank lines are OK). Anything on a line
after the pound sign (#) is ignored. Here are some examples:
greet:  .ascii "Hello world"     # define ascii label
        addl $4,%esp             # add 4 to SP register
3.2. Program segments
Programs have the following segments in memory with syntax in parens:
- Text - the main section containing your code and constant values ( .text )
- Data - contains initialized global variables ( .data )
- Bss - contains un-initialized global variables ( .bss )
3.3. The stack
A section of memory is reserved for your use as a stack. The stack has two
primary uses:
- Storing and passing the values of function parameters
- Storing the value of the local variables of a function
There are important conventions you need to know and follow regarding the
stack as well:
- The address of the top of the stack is maintained in register %esp.
- The stack starts at a high memory address. As items are added to the
stack, it grows downward. Because of this, %esp decreases as the
number of items on the stack grows and increases as the stack shrinks.
- The pushl src instruction decrements the stack pointer (%esp) by 4 bytes
and then stores the value of src at the new top of the stack.
- The popl dest instruction copies the top value on the stack to dest
and then increments the stack pointer (%esp) by 4 bytes.
- The stack pointer (%esp) can be moved an arbitrary number of bytes
using the addl and subl instructions.
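To make these conventions concrete, here is a minimal sketch that traces
%esp across two pushes and two pops and then prints the popped values. It
assumes the same 32-bit setup and underscore-prefixed C names (_main, _printf)
used elsewhere on this page; the label fmt and the values 1 and 2 are chosen
only for illustration.

```asm
        .text
fmt:    .ascii "%d %d\12\0"      # hypothetical format string for printf

        .globl _main
_main:
        pushl $1                 # esp = esp - 4, then 1 is stored at (%esp)
        pushl $2                 # esp = esp - 4, then 2 is stored at (%esp)
        popl  %eax               # eax = 2 (last in, first out), esp = esp + 4
        popl  %edx               # edx = 1, esp is back where it started

        pushl %edx               # third printf argument: 1
        pushl %eax               # second printf argument: 2
        pushl $fmt               # first printf argument: the format string
        call  _printf
        addl  $12, %esp          # drop three 4-byte arguments in one step

        movl  $0, %eax           # return 0 from main
        ret
```

Under those assumptions the program should print "2 1": the value pushed last
is the first one popped.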
Examples using the stack follow in the "Calling functions" and
"Local variables" sections.
4. Arithmetic
Coming soon...
5. Calling functions
Calling a function is a three step process:
- Place parameters to the function on the stack
- Call the function using the call instruction
- Adjust the stack pointer (%esp) to effectively remove the parameters from
the stack
Here's an example of a call to a simple function called plus10():
        .text
        .globl _main
_main:
        pushl $100               # first arg is 100
        call plus10
        addl $4, %esp            # remove arg from stack
        ret

# Function: int plus10( int p1)
# Adds 10 to the parameter and then returns the value
plus10:
        movl 4(%esp),%eax        # eax = first parameter
        addl $10,%eax            # eax = eax + 10
        ret
Function call example
Please note the first line of plus10. The first 4-byte argument resides
4 bytes from the top of the stack, at 4(%esp), not at (%esp). This is because
the call instruction pushes the return address (the address of the instruction
following the call) on the stack before ceding control of the program to
plus10. This return address is popped from the stack and used by the ret
instruction. Once a function has also pushed %ebp, as in the "Local variables"
section, the same argument is found at 8(%ebp).
The return value of a function is usually placed in the %eax register.
For functions with multiple arguments, each argument is placed on the stack
in last-to-first order. For example, a call to foo( 7, 14) would
look like this:
        pushl $14                # push arg 14
        pushl $7                 # push arg 7
        call foo
        addl $8, %esp            # remove args from stack
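As a sketch of the callee's side of a two-argument call, the program below
defines a hypothetical function sum2( a, b) and prints its result. The names
sum2 and fmt are made up for this illustration, and the same 32-bit,
underscore-prefixed setup as the rest of this page is assumed. Because sum2
does not push anything itself, only the return address sits between %esp and
the arguments.

```asm
        .text
fmt:    .ascii "%d\12\0"         # hypothetical format string

        .globl _main
_main:
        pushl $14                # second argument is pushed first
        pushl $7                 # first argument is pushed last
        call  sum2               # eax = sum2( 7, 14)
        addl  $8, %esp           # remove both arguments

        pushl %eax               # print the returned value
        pushl $fmt
        call  _printf
        addl  $8, %esp

        movl  $0, %eax           # return 0 from main
        ret

# Hypothetical function: int sum2( int a, int b)
sum2:
        movl  4(%esp), %eax      # a: just above the return address
        addl  8(%esp), %eax      # b: 4 bytes above a
        ret
```

With these values the program should print 21.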
6. Local variables
Local variables for a function (including main) are also allocated on the
stack.
The use of local variables also introduces something called the frame pointer;
it is stored in the %ebp register. The frame pointer makes using the stack
(whose top is in the %esp register) for local variables easier and also
maintains the integrity of registers (namely the stack pointer and the frame
pointer) that may be used by calling or called procedures. Here are the steps
involved in using local variables:
- Push the current frame pointer value onto the stack... pushl %ebp
- Save (well, move) the stack pointer value in the frame pointer...
movl %esp, %ebp
- Then, create and use local variables on the stack... see example below
- Restore the old value of the stack pointer that we saved in the frame
pointer... movl %ebp, %esp
- Pop the old frame pointer value off the stack... popl %ebp
Yes, this will take a while to sink in, but it is a rote process that you
will use for each procedure.
        .text
        # _main() start
        .globl _main             # define _main externally
_main:
        call foo
        ret

foo:
        # preserve old & set new frame pointer (ebp)
        pushl %ebp               # save old ebp
        movl %esp, %ebp          # set new ebp

        # create a local var on stack, set it to 10
        subl $4, %esp            # sub 4 from esp to create local var
        movl $10, -4(%ebp)       # set local var -4(%ebp)
        addl $4, %esp            # remove local var space

        # restore old frame pointer and return
        movl %ebp, %esp
        popl %ebp
        ret
Local variables example
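The same rote steps also give a function a stable way to reach its parameters.
The sketch below is a hypothetical function scale( n) with two locals; the
names scale and fmt and the constants are invented for this illustration, and
the 32-bit, underscore-prefixed setup of this page is assumed. After the
prologue, the saved %ebp is at 0(%ebp) and the return address at 4(%ebp), so
the first parameter is at 8(%ebp) and locals live at negative offsets.

```asm
        .text
fmt:    .ascii "%d\12\0"         # hypothetical format string

        .globl _main
_main:
        pushl $5
        call  scale              # eax = scale( 5)
        addl  $4, %esp

        pushl %eax               # print the returned value
        pushl $fmt
        call  _printf
        addl  $8, %esp

        movl  $0, %eax           # return 0 from main
        ret

# Hypothetical function:
#   int scale( int n) { int a = n; int b = 3; return a * b; }
scale:
        pushl %ebp               # save caller's frame pointer
        movl  %esp, %ebp         # ebp now marks this function's frame
        subl  $8, %esp           # room for two 4-byte locals

        movl  8(%ebp), %eax      # n: above saved ebp and return address
        movl  %eax, -4(%ebp)     # a = n
        movl  $3, -8(%ebp)       # b = 3

        movl  -4(%ebp), %eax     # eax = a
        imull -8(%ebp), %eax     # eax = a * b

        movl  %ebp, %esp         # release the locals
        popl  %ebp               # restore caller's frame pointer
        ret
```

With the argument 5 the program should print 15.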
7. Global variables
Global variables are (usually) placed in the data section. The name of the
global variable is specified using a label. The size of the variable is
indicated by the pseudo-opcode specified:
- .long - a 4 byte integer
- .word - a 2 byte integer
- .byte - a 1 byte quantity, char or integer
- .space - allocates a number of bytes, such as storage for an array
Here are some examples:
        .data
x:      .long 17                 # long x = 17
y:      .word 3                  # short y = 3
ch:     .byte 'W'                # char ch = 'W'
z:      .byte 1                  # unsigned char z = 1
array:  .space 20                # 20 bytes for array, for example 5 longs
These are also global variables with an initial value. Un-initialized global
variables should be defined in the bss section.
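A minimal sketch of the difference, assuming the 32-bit, underscore-prefixed
setup used on this page: count is an initialized global in the data section,
total is an un-initialized global in the bss section, and both are read and
written by name. The names count, total, and fmt are hypothetical.

```asm
        .data
count:  .long 41                 # initialized: int count = 41

        .bss
total:  .space 4                 # un-initialized: 4 bytes for int total

        .text
fmt:    .ascii "%d\12\0"         # hypothetical format string

        .globl _main
_main:
        movl  count, %eax        # load the value stored at count
        addl  $1, %eax
        movl  %eax, total        # store the result in the bss variable

        pushl total              # push the value stored at total
        pushl $fmt
        call  _printf
        addl  $8, %esp

        movl  $0, %eax           # return 0 from main
        ret
```

Note the difference between total (the value stored there) and $total (the
address itself). Under the stated assumptions the program should print 42.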
8. Input/output
We will output data to the screen using a C function called printf().
The "Hello World" example in the beginning shows a call of printf().
We will read values from the keyboard using the C function scanf(). Its format
is similar to printf(). Here's an example:
        # data section: global variables
        .data
x:      .long 0                  # x = 0
y:      .long 0                  # y = 0

        # text section: program constants
        .text
LC0:    .ascii "Enter 2 integer values:\0"
LC1:    .ascii "%d%d\0"
LC2:    .ascii "Two values are: %d, %d\12\0"

        # _main() function
        .globl _main
_main:
        # printf( "Enter 2 integer values:");
        pushl $LC0
        call _printf
        addl $4,%esp             # remove 1 argument (4 bytes)

        # scanf( "%d%d", &x, &y);
        pushl $y
        pushl $x
        pushl $LC1
        call _scanf
        addl $12,%esp            # remove 3 arguments (12 bytes)

        # printf( "Two values are: %d, %d", x, y)
        movl y,%eax
        pushl %eax
        movl x,%eax
        pushl %eax
        pushl $LC2
        call _printf
        addl $12,%esp            # remove 3 arguments (12 bytes)

        ret
Calling scanf() example
You can query Microsoft Visual-C++ for a manual page on either printf or
scanf.
Here's another nice source:
9. C "calling conventions"
"The C library is nothing if not consistent, and that
is its greatest virtue"
- Jeff Duntemann, "Assembly Language
Step-by-step"
C library functions must be called using the following conventions:
- Preserve registers - When you call a function, it may change register
values. A C function is free to overwrite eax, ecx, and edx, so if you need
the values in those registers after the call, push them onto the stack prior
to the call and restore them after the call. A C function preserves ebx, esp,
ebp, esi, and edi; functions you write must likewise leave those registers as
they found them, saving and restoring any that they use.
- Return value - The return value of a function call is placed in
register eax. If the return value is greater than 32 bits, up to 64 bits,
then the lower 32 bits are in eax and the high 32 bits are in edx.
- Parameter order - Parameters to a function are pushed onto the
stack in reverse order. For example, a call to f( x, y, z) is done by
first pushing z, then pushing y, and finally pushing x. Then you issue the
call opcode.
- After the call - Functions do not remove their parameters from the
stack, so it is the caller's duty to clean up upon a function return. As we
have seen, the stack pointer (esp register) is fixed by adding the number of
bytes to erase to its current value.
I recommend that you use these conventions with your own functions as well.
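Here is a sketch of the register rule in practice, assuming the standard
32-bit C convention in which a called function may overwrite eax, ecx, and edx
but hands back ebx, esi, edi, and ebp unchanged. The loop counter is kept in
ebx so that it survives each call to printf, and main saves and restores ebx
for its own caller. The labels fmt and again are hypothetical.

```asm
        .text
fmt:    .ascii "i = %d\12\0"     # hypothetical format string

        .globl _main
_main:
        pushl %ebx               # ebx belongs to our caller: save it first
        movl  $0, %ebx           # i = 0
again:
        pushl %ebx               # second printf argument: i
        pushl $fmt               # first printf argument: format string
        call  _printf            # may overwrite eax, ecx and edx
        addl  $8, %esp           # caller removes the arguments

        addl  $1, %ebx           # i = i + 1
        cmpl  $3, %ebx           # compare i with 3
        jl    again              # repeat while i < 3

        popl  %ebx               # restore caller's ebx
        movl  $0, %eax           # return 0 from main
        ret
```

Had the counter been kept in ecx instead, it would have to be pushed before
each call and popped afterwards.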
10. Differences: gcc and our text
There are some very fundamental differences between the assembly code format
used in gcc and that used in our textbook.
- gcc is case-sensitive. In general, use lower-case (thankfully) rather than
the upper case in the text.
- The order of arguments in dyadic instructions (those with two operands) is
switched. In our text the order of operands to, for example, add is
add dest, src. The order in gcc is add src, dest.
- In gcc, many instructions have a suffix defining the size of the operands.
For example, rather than a single add instruction, gcc supports addl, addw,
and addb for long (4 bytes), word (2 bytes) and byte (1 byte) quantities.
- Immediate values (constants) are prefixed with a $.
- Registers are prefixed with a %.
- Memory is referenced using parens.
- The format of pseudo-instructions controlling assembly is different. For
example, gcc uses .globl rather than PUBLIC to designate a public symbol.
Here are some examples for these differences:
gcc style | textbook style | Notes
addl $4, %esp | ADD ESP, 4 | gcc is lower case; gcc constant is $4; gcc register is %esp; gcc addl adds 4-byte quantities; gcc order of operands is switched
.globl _main | PUBLIC _main | gcc uses a different pseudo-instruction
subl 12(%ebp),%eax | SUB EAX, [EBP+12] | gcc memory reference uses parens with the displacement outside
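The short program below is a sketch that puts several of these differences
side by side, with an approximate textbook-style equivalent in each comment.
It assumes the 32-bit, underscore-prefixed setup used on this page; the names
array and fmt are hypothetical, and the textbook-style spellings in the
comments are illustrative rather than taken from the text.

```asm
        .data
array:  .long 10, 20, 30, 40          # four 4-byte integers

        .text
fmt:    .ascii "%d\12\0"              # hypothetical format string

        .globl _main
_main:
        movl  $2, %ecx                # MOV ECX, 2
        movl  array(,%ecx,4), %eax    # MOV EAX, [array + ECX*4]
        movl  $array, %edx            # MOV EDX, OFFSET array
        addl  4(%edx), %eax           # ADD EAX, [EDX+4]

        pushl %eax                    # PUSH EAX
        pushl $fmt                    # PUSH OFFSET fmt
        call  _printf
        addl  $8, %esp                # ADD ESP, 8

        movl  $0, %eax                # MOV EAX, 0
        ret
```

The first load picks up array[2] (30) and the add brings in array[1] (20), so
the program should print 50.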
11. Instructions
Selected (and hopefully most important) Intel assembly instructions on a
separate page:
instructions.htm
12. Examples
Also, on a separate page, some tasty Intel assembly code examples:
examples.htm
13. More help
Prof. Muganda has a tutorial guide on Intel Assembly similar to this one, that
you may enjoy even more:
muganda_GasManual.pdf