3.1. Layout of a NASM Source Line
As in other assemblers, each NASM source line is made up of four basic fields:
label: instruction operands ; comment
As usual, most of these fields are optional; the presence or absence
of any combination of a label, an instruction and a comment is allowed.
Of course, the operand field is either required or forbidden by the
presence and nature of the instruction field.
NASM uses backslash (\) as the line continuation character; if a
line ends with backslash, the next line is considered to be a part of
the backslash-ended line.
NASM places no restrictions on white space within a line: labels may
have white space before them, or instructions may have no space before
them, or anything. The colon after a label is also optional. (Note that
this means that if you intend to code `lodsb' alone on a line, and type
`lodab' by accident, then that's still a valid source line which does
nothing but define a label. Running NASM with the command-line option
`-w+orphan-labels' will cause it to warn you if you define a label
alone on a line without a trailing colon.)
Valid characters in labels are letters, numbers, `_', `$', `#', `@',
`~', `.', and `?'. The only characters which may be used as the _first_
character of an identifier are letters, `.' (with special meaning: see
*Note Section 3.9::), `_' and `?'. An identifier may also be prefixed
with a `$' to indicate that it is intended to be read as an identifier
and not a reserved word; thus, if some other module you are linking
with defines a symbol called `eax', you can refer to `$eax' in NASM
code to distinguish the symbol from the register.
The instruction field may contain any machine instruction: Pentium
and P6 instructions, FPU instructions, MMX instructions and even
undocumented instructions are all supported. The instruction may be
prefixed by `LOCK', `REP', `REPE'/`REPZ' or `REPNE'/`REPNZ', in the
usual way. Explicit address-size and operand-size prefixes `A16',
`A32', `O16' and `O32' are provided - one example of their use is given
in *Note Chapter 9::. You can also use the name of a segment register
as an instruction prefix: coding `es mov [bx],ax' is equivalent to
coding `mov [es:bx],ax'. We recommend the latter syntax, since it is
consistent with other syntactic features of the language, but for
instructions such as `LODSB', which has no operands and yet can require
a segment override, there is no clean syntactic way to proceed apart
from `es lodsb'.
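The following sketch shows several of these prefixes in use. It assumes 16-bit code (the `bits 16' line is added only to make the fragment self-contained) and is meant as an illustration rather than a complete program:

```nasm
        bits 16

        rep movsb               ; REP prefix: repeat MOVSB, counting down CX
        lock inc word [bx]      ; LOCK prefix on a read-modify-write instruction
        mov ax,[es:bx]          ; segment override written inside the brackets
        es lodsb                ; LODSB has no operands, so ES goes in front
```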
An instruction is not required to use a prefix: prefixes such as
`CS', `A32', `LOCK' or `REPE' can appear on a line by themselves, and
NASM will just generate the prefix bytes.
In addition to actual machine instructions, NASM also supports a
number of pseudo-instructions, described in *Note Section 3.2::.
Instruction operands may take a number of forms: they can be
registers, described simply by the register name (e.g. `ax', `bp',
`ebx', `cr0': NASM does not use the `gas'-style syntax in which
register names must be prefixed by a `%' sign), or they can be
effective addresses (see *Note Section 3.3::), constants (*Note Section
3.4::) or expressions (*Note Section 3.5::).
For floating-point instructions, NASM accepts a wide range of
syntaxes: you can use two-operand forms like MASM supports, or you can
use NASM's native single-operand forms in most cases. Details of all
forms of each supported instruction are given in *Note Appendix B::.
For example, you can code:
fadd st1 ; this sets st0 := st0 + st1
fadd st0,st1 ; so does this
fadd st1,st0 ; this sets st1 := st1 + st0
fadd to st1 ; so does this
Almost any floating-point instruction that references memory must
use one of the prefixes `DWORD', `QWORD' or `TWORD' to indicate what
size of memory operand it refers to.
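As a small illustration of these size prefixes, the fragment below loads and adds values of each width. The variable names `f32var', `f64var' and `f80var' are invented for this sketch:

```nasm
        section .data
f32var  dd 1.5                  ; 32-bit single-precision value
f64var  dq 2.5                  ; 64-bit double-precision value
f80var  dt 3.5                  ; 80-bit extended-precision value

        section .text
        fld  dword [f32var]     ; push the 32-bit value onto the FPU stack
        fadd qword [f64var]     ; st0 := st0 + the 64-bit value
        fld  tword [f80var]     ; push the 80-bit value
```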
3.2. Pseudo-Instructions
Pseudo-instructions are things which, though not real x86 machine
instructions, are used in the instruction field anyway because that's
the most convenient place to put them. The current pseudo-instructions
are `DB', `DW', `DD', `DQ' and `DT', their uninitialised counterparts
`RESB', `RESW', `RESD', `RESQ' and `REST', the `INCBIN' command, the
`EQU' command, and the `TIMES' prefix.
3.2.1. `DB' and friends: Declaring Initialised Data
`DB', `DW', `DD', `DQ' and `DT' are used, much as in MASM, to
declare initialised data in the output file. They can be invoked in a
wide range of ways:
db 0x55 ; just the byte 0x55
db 0x55,0x56,0x57 ; three bytes in succession
db 'a',0x55 ; character constants are OK
db 'hello',13,10,'$' ; so are string constants
dw 0x1234 ; 0x34 0x12
dw 'a' ; 0x61 0x00 (it's just a number)
dw 'ab' ; 0x61 0x62 (character constant)
dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
dd 0x12345678 ; 0x78 0x56 0x34 0x12
dd 1.234567e20 ; floating-point constant
dq 1.234567e20 ; double-precision float
dt 1.234567e20 ; extended-precision float
`DQ' and `DT' do not accept numeric constants or string constants as
operands.
3.2.2. `RESB' and friends: Declaring Uninitialised Data
`RESB', `RESW', `RESD', `RESQ' and `REST' are designed to be used in
the BSS section of a module: they declare _uninitialised_ storage
space. Each takes a single operand, which is the number of bytes,
words, doublewords or whatever to reserve. As stated in *Note Section
2.2.7::, NASM does not support the MASM/TASM syntax of reserving
uninitialised space by writing `DW ?' or similar things: this is what
it does instead. The operand to a `RESB'-type pseudo-instruction is a
_critical expression_: see *Note Section 3.8::.
buffer: resb 64 ; reserve 64 bytes
wordvar: resw 1 ; reserve a word
realarray resq 10 ; array of ten reals
3.2.3. `INCBIN': Including External Binary Files
`INCBIN' is borrowed from the old Amiga assembler DevPac: it includes
a binary file verbatim into the output file. This can be handy, for
example, for including graphics and sound data directly into a game
executable file. It can be called in one of these three ways:
incbin "file.dat" ; include the whole file
incbin "file.dat",1024 ; skip the first 1024 bytes
incbin "file.dat",1024,512 ; skip the first 1024, and
; actually include at most 512
3.2.4. `EQU': Defining Constants
`EQU' defines a symbol to a given constant value: when `EQU' is
used, the source line must contain a label. The action of `EQU' is to
define the given label name to the value of its (only) operand. This
definition is absolute, and cannot change later. So, for example,
message db 'hello, world'
msglen equ $-message
defines `msglen' to be the constant 12. `msglen' may not then be
redefined later. This is not a preprocessor definition either: the
value of `msglen' is evaluated _once_, using the value of `$' (see
*Note Section 3.5:: for an explanation of `$') at the point of
definition, rather than being evaluated wherever it is referenced and
using the value of `$' at the point of reference. Note that the operand
to an `EQU' is also a critical expression (*Note Section 3.8::).
3.2.5. `TIMES': Repeating Instructions or Data
The `TIMES' prefix causes the instruction to be assembled multiple
times. This is partly present as NASM's equivalent of the `DUP' syntax
supported by MASM-compatible assemblers, in that you can code
zerobuf: times 64 db 0
or similar things; but `TIMES' is more versatile than that. The
argument to `TIMES' is not just a numeric constant, but a numeric
_expression_, so you can do things like
buffer: db 'hello, world'
times 64-$+buffer db ' '
which will store exactly enough spaces to make the total length of
`buffer' up to 64. Finally, `TIMES' can be applied to ordinary
instructions, so you can code trivial unrolled loops in it:
times 100 movsb
Note that there is no effective difference between `times 100 resb 1'
and `resb 100', except that the latter will be assembled about 100
times faster due to the internal structure of the assembler.
The operand to `TIMES', like that of `EQU' and those of `RESB' and
friends, is a critical expression (*Note Section 3.8::).
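One widely used application of an expression as the `TIMES' count is padding a block out to a fixed size. The sketch below assumes flat binary output and a 512-byte PC boot sector, neither of which is discussed above; the `start' label and its placeholder code are invented for the example:

```nasm
        org 0x7C00              ; assumed load address of a PC boot sector
start:  jmp $                   ; placeholder code: loop forever
        times 510-($-$$) db 0   ; pad with zero bytes up to offset 510
        dw 0xAA55               ; two-byte signature fills offsets 510 and 511
```

The count `510-($-$$)' depends only on positions that are already known when the line is reached, so it is acceptable as a critical expression.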
3.3. Effective Addresses
An effective address is any operand to an instruction which
references memory. Effective addresses, in NASM, have a very simple
syntax: they consist of an expression evaluating to the desired
address, enclosed in square brackets. For example:
wordvar dw 123
mov ax,[wordvar]
Anything not conforming to this simple system is not a valid memory
reference in NASM, for example `es:wordvar[bx]'.
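To make the contrast concrete, here is a sketch (assuming 16-bit code) of a few valid references. The last line shows how the MASM-style operand `es:wordvar[bx]' would be spelled in NASM syntax:

```nasm
        bits 16

wordvar dw 123
        mov ax,[wordvar+1]      ; the word starting one byte after wordvar
        mov ax,[es:wordvar]     ; segment override goes inside the brackets
        mov ax,[es:wordvar+bx]  ; NASM spelling of es:wordvar[bx]
```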
More complicated effective addresses, such as those involving more
than one register, work in exactly the same way:
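As a sketch of what such operands can look like (the constant `disp' is a made-up name used only for illustration):

```nasm
disp    equ 16                      ; hypothetical displacement value

        mov eax,[ebx*2+ecx+disp]    ; scaled index + base + displacement
        mov ax,[bp+di+8]            ; 16-bit base + index + displacement
```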
NASM is capable of doing algebra on these effective addresses, so
that things which don't necessarily _look_ legal are perfectly all
right:
mov eax,[ebx*5] ; assembles as [ebx*4+ebx]
mov eax,[label1*2-label2] ; ie [label1+(label1-label2)]
Some forms of effective address have more than one assembled form;
in most such cases NASM will generate the smallest form it can. For
example, there are distinct assembled forms for the 32-bit effective
addresses `[eax*2+0]' and `[eax+eax]', and NASM will generally generate
the latter on the grounds that the former requires four bytes to store
a zero offset.
NASM has a hinting mechanism which will cause `[eax+ebx]' and
`[ebx+eax]' to generate different opcodes; this is occasionally useful
because `[esi+ebp]' and `[ebp+esi]' have different default segment
registers.
However, you can force NASM to generate an effective address in a
particular form by the use of the keywords `BYTE', `WORD', `DWORD' and
`NOSPLIT'. If you need `[eax+3]' to be assembled using a double-word
offset field instead of the one byte NASM will normally generate, you
can code `[dword eax+3]'. Similarly, you can force NASM to use a byte
offset for a small value which it hasn't seen on the first pass (see
*Note Section 3.8:: for an example of such a code fragment) by using
`[byte eax+offset]'. As special cases, `[byte eax]' will code `[eax+0]'
with a byte offset of zero, and `[dword eax]' will code it with a
double-word offset of zero. The normal form, `[eax]', will be coded
with no offset field.
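The forms described above, collected into one sketch (32-bit code is assumed; the comments restate the behaviour described in the text rather than verified encodings):

```nasm
        bits 32

        mov ecx,[eax]           ; normal form: no offset field
        mov ecx,[byte eax]      ; [eax+0] with a one-byte zero offset
        mov ecx,[dword eax]     ; [eax+0] with a double-word zero offset
        mov ecx,[eax+3]         ; NASM normally picks a one-byte offset
        mov ecx,[dword eax+3]   ; offset forced to a double word
```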
The form described in the previous paragraph is also useful if you
are trying to access data in a 32-bit segment from within 16-bit code.
For more information on this see the section on mixed-size addressing
(*Note Section 9.2::). In particular, if you need to access data at a
known offset that is too large to fit in a 16-bit value, you must
specify that it is a dword offset; otherwise NASM will discard the high
word of the offset.
Similarly, NASM will split `[eax*2]' into `[eax+eax]' because that
allows the offset field to be absent and space to be saved; in fact, it
will also split `[eax*2+offset]' into `[eax+eax+offset]'. You can
combat this behaviour by the use of the `NOSPLIT' keyword: `[nosplit
eax*2]' will force `[eax*2+0]' to be generated literally.
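A minimal sketch of the two spellings side by side, assuming 32-bit code. Which encoding the first line receives may vary between NASM versions, so the comment only repeats what the text says:

```nasm
        bits 32

        mov ebx,[eax*2]         ; may be split and encoded as [eax+eax]
        mov ebx,[nosplit eax*2] ; kept as [eax*2+0]
```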
3.4. Constants
NASM understands four different types of constant: numeric,
character, string and floating-point.
3.4.1. Numeric Constants
A numeric constant is simply a number. NASM allows you to specify
numbers in a variety of number bases, in a variety of ways: you can
suffix `H', `Q' or `O', and `B' for hex, octal and binary, or you can
prefix `0x' for hex in the style of C, or you can prefix `$' for hex in
the style of Borland Pascal. Note, though, that the `$' prefix does
double duty as a prefix on identifiers (see *Note Section 3.1::), so a
hex number prefixed with a `$' sign must have a digit after the `$'
rather than a letter.
mov ax,100 ; decimal
mov ax,0a2h ; hex
mov ax,$0a2 ; hex again: the 0 is required
mov ax,0xa2 ; hex yet again
mov ax,777q ; octal
mov ax,777o ; octal again
mov ax,10010011b ; binary
3.4.2. Character Constants
A character constant consists of up to four characters enclosed in
either single or double quotes. The type of quote makes no difference
to NASM, except of course that surrounding the constant with single
quotes allows double quotes to appear within it and vice versa.
A character constant with more than one character will be arranged
with little-endian order in mind: if you code
mov eax,'abcd'
then the constant generated is not `0x61626364', but `0x64636261',
so that if you were then to store the value into memory, it would read
`abcd' rather than `dcba'. This is also the sense of character
constants understood by the Pentium's `CPUID' instruction (see *Note
3.4.3. String Constants
String constants are only acceptable to some pseudo-instructions,
namely the `DB' family and `INCBIN'.
A string constant looks like a character constant, only longer. It is
treated as a concatenation of maximum-size character constants for the
pseudo-instruction in use. So the following are equivalent:
db 'hello' ; string constant
db 'h','e','l','l','o' ; equivalent character constants
And the following are also equivalent:
dd 'ninechars' ; doubleword string constant
dd 'nine','char','s' ; becomes three doublewords
db 'ninechars',0,0,0 ; and really looks like this
Note that when used as an operand to `db', a constant like `'ab'' is
treated as a string constant despite being short enough to be a
character constant, because otherwise `db 'ab'' would have the same
effect as `db 'a'', which would be silly. Similarly, three-character or
four-character constants are treated as strings when they are operands
to `dw'.
3.4.4. Floating-Point Constants
Floating-point constants are acceptable only as arguments to `DD',
`DQ' and `DT'. They are expressed in the traditional form: digits, then
a period, then optionally more digits, then optionally an `E' followed
by an exponent. The period is mandatory, so that NASM can distinguish
between `dd 1', which declares an integer constant, and `dd 1.0' which
declares a floating-point constant.
dd 1.2 ; an easy one
dq 1.e10 ; 10,000,000,000
dq 1.e+10 ; synonymous with 1.e10
dq 1.e-10 ; 0.000 000 000 1
dt 3.141592653589793238462 ; pi
NASM cannot do compile-time arithmetic on floating-point constants.
This is because NASM is designed to be portable - although it always
generates code to run on x86 processors, the assembler itself can run
on any system with an ANSI C compiler. Therefore, the assembler cannot
guarantee the presence of a floating-point unit capable of handling the
Intel number formats, and so for NASM to be able to do floating
arithmetic it would have to include its own complete set of
floating-point routines, which would significantly increase the size of
the assembler for very little benefit.
3.5. Expressions
Expressions in NASM are similar in syntax to those in C.
NASM does not guarantee the size of the integers used to evaluate
expressions at compile time: since NASM can compile and run on 64-bit
systems quite happily, don't assume that expressions are evaluated in
32-bit registers and so try to make deliberate use of integer
overflow. It might not always work. The only thing NASM will guarantee
is what's guaranteed by ANSI C: you always have _at least_ 32 bits to
work in.
NASM supports two special tokens in expressions, allowing
calculations to involve the current assembly position: the `$' and `$$'
tokens. `$' evaluates to the assembly position at the beginning of the
line containing the expression; so you can code an infinite loop using
`JMP $'. `$$' evaluates to the beginning of the current section; so you
can tell how far into the section you are by using `($-$$)'.
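Both tokens in a short sketch; the labels `hang' and `sofar' are invented for the example:

```nasm
hang:   jmp $                   ; jumps to the start of this same line
        db 'abc'
sofar   equ $-$$                ; bytes assembled in this section so far
```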
The arithmetic operators provided by NASM are listed here, in
increasing order of precedence.
3.5.1. `|': Bitwise OR Operator
The `|' operator gives a bitwise OR, exactly as performed by the
`OR' machine instruction. Bitwise OR is the lowest-priority arithmetic
operator supported by NASM.
3.5.2. `^': Bitwise XOR Operator
`^' provides the bitwise XOR operation.
3.5.3. `&': Bitwise AND Operator
`&' provides the bitwise AND operation.
3.5.4. `<<' and `>>': Bit Shift Operators
`<<' gives a bit-shift to the left, just as it does in C. So `5<<3'
evaluates to 5 times 8, or 40. `>>' gives a bit-shift to the right; in
NASM, such a shift is _always_ unsigned, so that the bits shifted in
from the left-hand end are filled with zero rather than a
sign-extension of the previous highest bit.
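A few shifts written out as data, as a quick illustration of the rules above:

```nasm
        db 5<<3                 ; 40
        dw 0x1234>>8            ; 0x12
        dd 0x80000000>>31       ; 1: zeros are shifted in from the left
```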
3.5.6. `*', `/', `//', `%' and `%%': Multiplication and Division
`*' is the multiplication operator. `/' and `//' are both division
operators: `/' is unsigned division and `//' is signed division.
Similarly, `%' and `%%' provide unsigned and signed modulo operators
respectively.
NASM, like ANSI C, provides no guarantees about the sensible
operation of the signed modulo operator.
Since the `%' character is used extensively by the macro
preprocessor, you should ensure that both the signed and unsigned
modulo operators are followed by white space wherever they appear.
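The sketch below uses each operator with positive operands, so that the signed and unsigned forms agree, and keeps white space after every `%' and `%%':

```nasm
        mov ax,100 / 7          ; unsigned division: 14
        mov ax,100 // 7         ; signed division: 14
        mov ax,100 % 7          ; unsigned modulo: 2
        mov ax,100 %% 7         ; signed modulo: 2
```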
3.6. `SEG' and `WRT'
When writing large 16-bit programs, which must be split into multiple
segments, it is often necessary to be able to refer to the segment part
of the address of a symbol. NASM supports the `SEG' operator to perform
this function.
The `SEG' operator returns the _preferred_ segment base of a symbol,
defined as the segment base relative to which the offset of the symbol
makes sense. So the code
mov ax,seg symbol
mov es,ax
mov bx,symbol
will load `ES:BX' with a valid pointer to the symbol `symbol'.
Things can be more complex than this: since 16-bit segments and
groups may overlap, you might occasionally want to refer to some symbol
using a different segment base from the preferred one. NASM lets you do
this, by the use of the `WRT' (With Reference To) keyword. So you can
do things like
mov ax,weird_seg ; weird_seg is a segment base
mov es,ax
mov bx,symbol wrt weird_seg
to load `ES:BX' with a different, but functionally equivalent,
pointer to the symbol `symbol'.
NASM supports far (inter-segment) calls and jumps by means of the
syntax `call segment:offset', where `segment' and `offset' both
represent immediate values. So to call a far procedure, you could code
call (seg procedure):procedure
call weird_seg:(procedure wrt weird_seg)
(The parentheses are included for clarity, to show the intended
parsing of the above instructions. They are not necessary in practice.)
NASM supports the syntax `call far procedure' as a synonym for the
first of the above usages. `JMP' works identically to `CALL' in these
examples.
To declare a far pointer to a data item in a data segment, you must
code
dw symbol, seg symbol
NASM supports no convenient synonym for this, though you can always
invent one using the macro processor.
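One possible shape for such a synonym is sketched below. The macro name `farptr' and the labels are invented for this example, and the fragment assumes an output format that supports segment references (such as `obj'):

```nasm
%macro  farptr 1
        dw %1, seg %1           ; offset first, then segment
%endmacro

        segment data
symbol: db 0
symptr:
        farptr symbol           ; expands to: dw symbol, seg symbol
```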
3.7. `STRICT': Inhibiting Optimization
When assembling with the optimizer set to level 2 or higher (see
*Note Section 2.1.16::), NASM will use size specifiers (`BYTE', `WORD',
`DWORD', `QWORD', or `TWORD'), but will give them the smallest possible
size. The keyword `STRICT' can be used to inhibit optimization and
force a particular operand to be emitted in the specified size. For
example, with the optimizer on, and in `BITS 16' mode,
push dword 33
is encoded in three bytes `66 6A 21', whereas
push strict dword 33
is encoded in six bytes, with a full dword immediate operand `66 68
21 00 00 00'.
With the optimizer off, the same code (six bytes) is generated
whether the `STRICT' keyword was used or not.
3.8. Critical Expressions
A limitation of NASM is that it is a two-pass assembler; unlike TASM
and others, it will always do exactly two assembly passes. Therefore it
is unable to cope with source files that are complex enough to require
three or more passes.
The first pass is used to determine the size of all the assembled
code and data, so that the second pass, when generating all the code,
knows all the symbol addresses the code refers to. So one thing NASM
can't handle is code whose size depends on the value of a symbol
declared after the code in question. For example,
times (label-$) db 0
label: db 'Where am I?'
The argument to `TIMES' in this case could equally legally evaluate
to anything at all; NASM will reject this example because it cannot
tell the size of the `TIMES' line when it first sees it. It will just
as firmly reject the slightly paradoxical code
times (label-$+1) db 0
label: db 'NOW where am I?'
in which _any_ value for the `TIMES' argument is by definition wrong!
NASM rejects these examples by means of a concept called a _critical
expression_, which is defined to be an expression whose value is
required to be computable in the first pass, and which must therefore
depend only on symbols defined before it. The argument to the `TIMES'
prefix is a critical expression; for the same reason, the arguments to
the `RESB' family of pseudo-instructions are also critical expressions.
Critical expressions can crop up in other contexts as well: consider
the following code.
mov ax,symbol1
symbol1 equ symbol2
symbol2:
On the first pass, NASM cannot determine the value of `symbol1',
because `symbol1' is defined to be equal to `symbol2' which NASM hasn't
seen yet. On the second pass, therefore, when it encounters the line
`mov ax,symbol1', it is unable to generate the code for it because it
still doesn't know the value of `symbol1'. On the next line, it would
see the `EQU' again and be able to determine the value of `symbol1',
but by then it would be too late.
NASM avoids this problem by defining the right-hand side of an `EQU'
statement to be a critical expression, so the definition of `symbol1'
would be rejected in the first pass.
There is a related issue involving forward references: consider this
code fragment:
mov eax,[ebx+offset]
offset equ 10
NASM, on pass one, must calculate the size of the instruction `mov
eax,[ebx+offset]' without knowing the value of `offset'. It has no way
of knowing that `offset' is small enough to fit into a one-byte offset
field and that it could therefore get away with generating a shorter
form of the effective-address encoding; for all it knows, in pass one,
`offset' could be a symbol in the code segment, and it might need the
full four-byte form. So it is forced to compute the size of the
instruction to accommodate a four-byte address part. In pass two, having
made this decision, it is now forced to honour it and keep the
instruction large, so the code generated in this case is not as small
as it could have been. This problem can be solved by defining `offset'
before using it, or by forcing byte size in the effective address by
coding `[byte ebx+offset]'.
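Both remedies, sketched side by side. The names `offset1' and `offset2' are invented so that the two variants can share one fragment:

```nasm
        bits 32

; Remedy 1: define the constant before it is used.
offset1 equ 10
        mov eax,[ebx+offset1]

; Remedy 2: keep the forward reference, but force the byte form.
        mov eax,[byte ebx+offset2]
offset2 equ 10
```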
3.9. Local Labels
NASM gives special treatment to symbols beginning with a period. A
label beginning with a single period is treated as a _local_ label,
which means that it is associated with the previous non-local label.
So, for example:
label1 ; some code
.loop
; some more code
jne .loop
label2 ; some code
.loop
; some more code
jne .loop
In the above code fragment, each `JNE' instruction jumps to the line
immediately before it, because the two definitions of `.loop' are kept
separate by virtue of each being associated with the previous non-local
label.
This form of local label handling is borrowed from the old Amiga
assembler DevPac; however, NASM goes one step further, in allowing
access to local labels from other parts of the code. This is achieved
by means of _defining_ a local label in terms of the previous non-local
label: the first definition of `.loop' above is really defining a
symbol called `label1.loop', and the second defines a symbol called
`label2.loop'. So, if you really needed to, you could write
label3 ; some more code
; and some more
jmp label1.loop
Sometimes it is useful - in a macro, for instance - to be able to
define a label which can be referenced from anywhere but which doesn't
interfere with the normal local-label mechanism. Such a label can't be
non-local because it would interfere with subsequent definitions of,
and references to, local labels; and it can't be local because the
macro that defined it wouldn't know the label's full name. NASM
therefore introduces a third type of label, which is probably only
useful in macro definitions: if a label begins with the special prefix
`..@', then it does nothing to the local label mechanism. So you could
code
label1: ; a non-local label
.local: ; this is really label1.local
..@foo: ; this is a special symbol
label2: ; another non-local label
.local: ; this is really label2.local
jmp ..@foo ; this will jump three lines up
Chapter 4: The NASM Preprocessor
NASM contains a powerful macro processor, which supports conditional
assembly, multi-level file inclusion, two forms of macro (single-line
and multi-line), and a `context stack' mechanism for extra macro power.
Preprocessor directives all begin with a `%' sign.
The preprocessor collapses all lines which end with a backslash (\)
character into a single line. Thus:
%define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \
THIS_VALUE
will work like a single-line macro without the backslash-newline
sequence.
4.1.1. The Normal Way: `%define'
Single-line macros are defined using the `%define' preprocessor
directive. The definitions work in a similar way to C; so you can do
things like
%define ctrl 0x1F &
%define param(a,b) ((a)+(a)*(b))
mov byte [param(2,ebx)], ctrl 'D'
which will expand to
mov byte [(2)+(2)*(ebx)], 0x1F & 'D'
When the expansion of a single-line macro contains tokens which
invoke another macro, the expansion is performed at invocation time,
not at definition time. Thus the code
%define a(x) 1+b(x)
%define b(x) 2*x
mov ax,a(8)
will evaluate in the expected way to `mov ax,1+2*8', even though the
macro `b' wasn't defined at the time of definition of `a'.
Macros defined with `%define' are case sensitive: after `%define foo
bar', only `foo' will expand to `bar': `Foo' or `FOO' will not. By
using `%idefine' instead of `%define' (the `i' stands for
`insensitive') you can define all the case variants of a macro at once,
so that `%idefine foo bar' would cause `foo', `Foo', `FOO', `fOO' and
so on all to expand to `bar'.
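A short sketch of the difference in practice; the macro name `wordsize' is invented for the example:

```nasm
%idefine wordsize 2             ; defines wordsize, WordSize, WORDSIZE, ...

        mov ax,WORDSIZE         ; expands to: mov ax,2
        mov bx,WordSize         ; expands to: mov bx,2
```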
There is a mechanism which detects when a macro call has occurred as
a result of a previous expansion of the same macro, to guard against
circular references and infinite loops. If this happens, the
preprocessor will only expand the first occurrence of the macro. Hence,
if you code
%define a(x) 1+a(x)
mov ax,a(3)
the macro `a(3)' will expand once, becoming `1+a(3)', and will then
expand no further. This behaviour can be useful: see *Note Section 8.1::
for an example of its use.
You can overload single-line macros: if you write
%define foo(x) 1+x
%define foo(x,y) 1+x*y
the preprocessor will be able to handle both types of macro call, by
counting the parameters you pass; so `foo(3)' will become `1+3' whereas
`foo(ebx,2)' will become `1+ebx*2'. However, if you define
%define foo bar
then no other definition of `foo' will be accepted: a macro with no
parameters prohibits the definition of the same name as a macro _with_
parameters, and vice versa.
This doesn't prevent single-line macros being _redefined_: you can
perfectly well define a macro with
%define foo bar
and then re-define it later in the same source file with
%define foo baz
Then everywhere the macro `foo' is invoked, it will be expanded
according to the most recent definition. This is particularly useful
when defining single-line macros with `%assign' (see *Note Section
4.4. Conditional Assembly
Similarly to the C preprocessor, NASM allows sections of a source
file to be assembled only if certain conditions are met. The general
syntax of this feature looks like this:
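%if<condition>
; some code which only appears if <condition> is met
%elif<condition2>
; only appears if <condition> is not met but <condition2> is
%else
; this appears if neither <condition> nor <condition2> was met
%endif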
The `%else' clause is optional, as is the `%elif' clause. You can
have more than one `%elif' clause as well.
4.4.1. `%ifdef': Testing Single-Line Macro Existence
Beginning a conditional-assembly block with the line `%ifdef MACRO'
will assemble the subsequent code if, and only if, a single-line macro
called `MACRO' is defined. If not, then the `%elif' and `%else' blocks
(if any) will be processed instead.
For example, when debugging a program, you might want to write code
such as
; perform some function
%ifdef DEBUG
writefile 2,"Function performed successfully",13,10
%endif
; go and do something else
Then you could use the command-line option `-dDEBUG' to create a
version of the program which produced debugging messages, and remove the
option to generate the final release version of the program.
You can test for a macro _not_ being defined by using `%ifndef'
instead of `%ifdef'. You can also test for macro definitions in `%elif'
blocks by using `%elifdef' and `%elifndef'.
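The sketch below combines these directives. The macro names `BUFSIZE', `VERBOSE' and `LOGLEVEL' are invented for the example; `DEBUG' and `VERBOSE' would be supplied, if at all, with the `-d' command-line option:

```nasm
%ifndef BUFSIZE
%define BUFSIZE 64              ; default used when BUFSIZE is not predefined
%endif

%ifdef DEBUG
%define LOGLEVEL 2
%elifdef VERBOSE
%define LOGLEVEL 1
%else
%define LOGLEVEL 0
%endif

loglevel: db LOGLEVEL
buffer:   times BUFSIZE db 0
```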
4.4.2. `%ifmacro': Testing Multi-Line Macro Existence
The `%ifmacro' directive operates in the same way as the `%ifdef'
directive, except that it checks for the existence of a multi-line
macro.
For example, you may be working with a large project and not have
control over the macros in a library. You may want to create a macro
with one name if it doesn't already exist, and another name if one with
that name does exist.
`%ifmacro' is considered true if defining a macro with the given
name and number of arguments would cause a definition conflict. For
example:
%ifmacro MyMacro 1-3
%error "MyMacro 1-3" causes a conflict with an existing macro.
%else
%macro MyMacro 1-3
; insert code to define the macro
%endmacro
%endif
4.4.4. `%if': Testing Arbitrary Numeric Expressions
The conditional-assembly construct `%if expr' will cause the
subsequent code to be assembled if and only if the value of the numeric
expression `expr' is non-zero. An example of the use of this feature is
in deciding when to break out of a `%rep' preprocessor loop: see *Note
Section 4.5:: for a detailed example.
The expression given to `%if', and its counterpart `%elif', is a
critical expression (see *Note Section 3.8::).
`%if' extends the normal NASM expression syntax, by providing a set
of relational operators which are not normally available in
expressions. The operators `=', `<', `>', `<=', `>=' and `<>' test
equality, less-than, greater-than, less-or-equal, greater-or-equal and
not-equal respectively. The C-like forms `==' and `!=' are supported as
alternative forms of `=' and `<>'. In addition, low-priority logical
operators `&&', `^^' and `||' are provided, supplying logical AND,
logical XOR and logical OR. These work like the C logical operators
(although C has no logical XOR), in that they always return either 0 or
1, and treat any non-zero input as 1 (so that `^^', for example,
returns 1 if exactly one of its inputs is zero, and 0 otherwise). The
relational operators also return 1 for true and 0 for false.
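As an illustration of relational and logical operators inside `%if', consider the sketch below. `ROWS', `COLS' and `NONSTANDARD' are hypothetical names chosen for the example:

```nasm
%define ROWS 25                 ; hypothetical screen dimensions
%define COLS 80

%if ROWS*COLS*2 > 65536
%error screen buffer does not fit in a 64K segment
%elif ROWS <> 25 || COLS <> 80
%define NONSTANDARD 1           ; taken only when either dimension differs
%endif

screen: times ROWS*COLS dw 0
```

With the values shown, neither condition holds, so only the `screen' line reaches the assembler.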
4.5. Preprocessor Loops: `%rep'
NASM's `TIMES' prefix, though useful, cannot be used to invoke a
multi-line macro multiple times, because it is processed by NASM after
macros have already been expanded. Therefore NASM provides another form
of loop, this time at the preprocessor level: `%rep'.
The directives `%rep' and `%endrep' (`%rep' takes a numeric
argument, which can be an expression; `%endrep' takes no arguments) can
be used to enclose a chunk of code, which is then replicated as many
times as specified by the preprocessor:
%assign i 0
%rep 64
inc word [table+2*i]
%assign i i+1
%endrep
This will generate a sequence of 64 `INC' instructions, incrementing
every word of memory from `[table]' to `[table+126]'.
For more complex termination conditions, or to break out of a repeat
loop part way along, you can use the `%exitrep' directive to terminate
the loop, like this:
fibonacci:
%assign i 0
%assign j 1
%rep 100
%if j > 65535
%exitrep
%endif
dw j
%assign k j+i
%assign i j
%assign j k
%endrep
fib_number equ ($-fibonacci)/2
This produces a list of all the Fibonacci numbers that will fit in
16 bits. Note that a maximum repeat count must still be given to
`%rep'. This is to prevent the possibility of NASM getting into an
infinite loop in the preprocessor, which (on multitasking or multi-user
systems) would typically cause all the system memory to be gradually
used up and other applications to start crashing.
4.6. Including Other Files
Using, once again, a very similar syntax to the C preprocessor,
NASM's preprocessor lets you include other source files into your code.
This is done by the use of the `%include' directive:
%include "macros.mac"
will include the contents of the file `macros.mac' into the source
file containing the `%include' directive.
Include files are searched for in the current directory (the
directory you're in when you run NASM, as opposed to the location of
the NASM executable or the location of the source file), plus any
directories specified on the NASM command line using the `-i' option.
The standard C idiom for preventing a file being included more than
once is just as applicable in NASM: if the file `macros.mac' has the
form
%ifndef MACROS_MAC
%define MACROS_MAC
; now define some macros
%endif
then including the file more than once will not cause errors: the
second time the file is included, nothing will happen because the macro
`MACROS_MAC' will already be defined.
4.7. The Context Stack
Having labels that are local to a macro definition is sometimes not
quite powerful enough: sometimes you want to be able to share labels
between several macro calls. An example might be a `REPEAT' ... `UNTIL'
loop, in which the expansion of the `REPEAT' macro would need to be
able to refer to a label which the `UNTIL' macro had defined. However,
for such a macro you would also want to be able to nest these loops.
NASM provides this level of power by means of a _context stack_. The
preprocessor maintains a stack of _contexts_, each of which is
characterised by a name. You add a new context to the stack using the
`%push' directive, and remove one using `%pop'. You can define labels
that are local to a particular context on the stack.
4.7.1. `%push' and `%pop': Creating and Removing Contexts
The `%push' directive is used to create a new context and place it on
the top of the context stack. `%push' requires one argument, which is
the name of the context. For example:
     %push foobar
This pushes a new context called `foobar' on the stack. You can have
several contexts on the stack with the same name: they can still be
distinguished.
The directive `%pop', requiring no arguments, removes the top context
from the context stack and destroys it, along with any labels associated
with it.
4.7.2. Context-Local Labels
Just as the usage `%%foo' defines a label which is local to the
particular macro call in which it is used, the usage `%$foo' is used to
define a label which is local to the context on the top of the context
stack. So the `REPEAT' and `UNTIL' example given above could be
implemented by means of:
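     %macro repeat 0
     %push repeat
     %$begin:
     %endmacro
     %macro until 1
             j%-1 %$begin
     %pop
     %endmacro
and invoked by means of, for example,
             mov di,string
             repeat
             add di,3
             scasb
             until e
which would scan every fourth byte of a string in search of the byte
in AL.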
If you need to define, or access, labels local to the context _below_
the top one on the stack, you can use `%$$foo', or `%$$$foo' for the
context below that, and so on.
4.8. Standard Macros
NASM defines a set of standard macros, which are already defined
when it starts to process any source file. If you really need a program
to be assembled with no pre-defined macros, you can use the `%clear'
directive to empty the preprocessor of everything.
Most user-level assembler directives (see *Note Chapter 5::) are
implemented as macros which invoke primitive directives; these are
described in *Note Chapter 5::. The rest of the standard macro set is
described here.
4.8.1. `__NASM_MAJOR__', `__NASM_MINOR__', `__NASM_SUBMINOR__'
The single-line macros `__NASM_MAJOR__', `__NASM_MINOR__',
`__NASM_SUBMINOR__' and `__NASM_PATCHLEVEL__' expand to the major,
minor, subminor and patch level parts of the version number of NASM
being used. So, under NASM 0.98.32p1 for example, `__NASM_MAJOR__'
would be defined as 0, `__NASM_MINOR__' would be defined as 98,
`__NASM_SUBMINOR__' would be defined as 32, and `__NASM_PATCHLEVEL__'
would be defined as 1.
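As an illustration, a source file might test these macros to refuse to build under an assembler that is too old. This is only a sketch: the 0.98 threshold, the wording of the message and the label `built_with' are assumptions chosen for the example, not anything NASM itself requires.

```nasm
; Stop with an error under anything older than NASM 0.98
; (the threshold is an arbitrary choice for this sketch).
%if (__NASM_MAJOR__ == 0) && (__NASM_MINOR__ < 98)
%error this source needs NASM 0.98 or later
%endif

; Record the assembler version in the output as three bytes.
built_with:     db __NASM_MAJOR__, __NASM_MINOR__, __NASM_SUBMINOR__
```

Because the macros expand to plain numbers, they can be used anywhere a numeric constant is accepted, both in `%if' conditions and in ordinary data definitions.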
4.8.7. `ALIGN' and `ALIGNB': Data Alignment
The `ALIGN' and `ALIGNB' macros provide a convenient way to align
code or data on a word, longword, paragraph or other boundary. (Some
assemblers call this directive `EVEN'.) The syntax of the `ALIGN' and
`ALIGNB' macros is
align 4 ; align on 4-byte boundary
align 16 ; align on 16-byte boundary
align 8,db 0 ; pad with 0s rather than NOPs
align 4,resb 1 ; align to 4 in the BSS
alignb 4 ; equivalent to previous line
Both macros require their first argument to be a power of two; they
both compute the number of additional bytes required to bring the
length of the current section up to a multiple of that power of two,
and then apply the `TIMES' prefix to their second argument to perform
the alignment.
If the second argument is not specified, the default for `ALIGN' is
`NOP', and the default for `ALIGNB' is `RESB 1'. So if the second
argument is specified, the two macros are equivalent. Normally, you can
just use `ALIGN' in code and data sections and `ALIGNB' in BSS
sections, and never need the second argument except for special
purposes.
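To make the calculation concrete, the following sketch writes out by hand what `align 4' amounts to at one particular point. It assumes the macro expands in the way just described (a `TIMES' count derived from the current offset within the section); the labels `start' and `aligned_label' are illustrative only.

```nasm
        section .text

start:  ret                     ; one byte of code, so $-$$ is now 1

        ; Hand-written equivalent of `align 4' with the default `NOP':
        ; ($$-$) is minus the current offset in the section, and
        ; masking it with 4-1 leaves the distance to the next multiple
        ; of 4, which is 3 at this point.
        times ($$-$) & (4-1) nop

aligned_label:                  ; offset 4 within the section
        ret
```

The mask trick only works when the first argument is a power of two, which is why the macros require it.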
`ALIGN' and `ALIGNB', being simple macros, perform no error
checking: they cannot warn you if their first argument fails to be a
power of two, or if their second argument generates more than one byte
of code. In each of these cases they will silently do the wrong thing.
`ALIGNB' (or `ALIGN' with a second argument of `RESB 1') can be used
within structure definitions:
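A minimal sketch of such a definition, using the standard `STRUC' and `ENDSTRUC' macros; the structure name `mytype' and the member names are illustrative assumptions:

```nasm
struc mytype
  mt_byte:      resb 1          ; offset 0
                alignb 2        ; pad to a 2-byte boundary
  mt_word:      resw 1          ; offset 2
                alignb 4        ; already at offset 4, so no padding
  mt_long:      resd 1          ; offset 4
  mt_str:       resb 32         ; offset 8
endstruc
```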
This will ensure that the structure members are sensibly aligned
relative to the base of the structure.
A final caveat: `ALIGN' and `ALIGNB' work relative to the beginning
of the _section_, not the beginning of the address space in the final
executable. Aligning to a 16-byte boundary when the section you're in
is only guaranteed to be aligned to a 4-byte boundary, for example, is
a waste of effort. Again, NASM does not check that the section's
alignment characteristics are sensible for the use of `ALIGN' or
`ALIGNB'.
4.10. Other Preprocessor Directives
NASM also has preprocessor directives which allow access to
information from external sources. Currently they include:
* `%line' enables NASM to correctly handle the output of the cpp C
language preprocessor (see *Note Section 4.10.1::).
* `%!' enables NASM to read in the value of an environment variable,
which can then be used in your program (see *Note Section
4.10.2::).
Chapter 5: Assembler Directives
NASM, though it attempts to avoid the bureaucracy of assemblers like
MASM and TASM, is nevertheless forced to support a _few_ directives.
These are described in this chapter.
NASM's directives come in two types: _user-level_ directives and
_primitive_ directives. Typically, each directive has a user-level form
and a primitive form. In almost all cases, we recommend that users use
the user-level forms of the directives, which are implemented as macros
which call the primitive forms.
Primitive directives are enclosed in square brackets; user-level
directives are not.
In addition to the universal directives described in this chapter,
each object file format can optionally supply extra directives in order
to control particular features of that file format. These
_format-specific_ directives are documented along with the formats that
implement them, in *Note Chapter 6::.
5.1. `BITS': Specifying Target Processor Mode
The `BITS' directive specifies whether NASM should generate code
designed to run on a processor operating in 16-bit mode, or code
designed to run on a processor operating in 32-bit mode. The syntax is
`BITS 16' or `BITS 32'.
In most cases, you should not need to use `BITS' explicitly. The
`aout', `coff', `elf' and `win32' object formats, which are designed
for use in 32-bit operating systems, all cause NASM to select 32-bit
mode by default. The `obj' object format allows you to specify each
segment you define as either `USE16' or `USE32', and NASM will set its
operating mode accordingly, so the use of the `BITS' directive is once
again unnecessary.
The most likely reason for using the `BITS' directive is to write
32-bit code in a flat binary file; this is because the `bin' output
format defaults to 16-bit mode in anticipation of it being used most
frequently to write DOS `.COM' programs, DOS `.SYS' device drivers and
boot loader software.
You do _not_ need to specify `BITS 32' merely in order to use 32-bit
instructions in a 16-bit DOS program; if you do, the assembler will
generate incorrect code because it will be writing code targeted at a
32-bit platform, to be run on a 16-bit one.
When NASM is in `BITS 16' state, instructions which use 32-bit data
are prefixed with an 0x66 byte, and those referring to 32-bit addresses
have an 0x67 prefix. In `BITS 32' state, the reverse is true: 32-bit
instructions require no prefixes, whereas instructions using 16-bit data
need an 0x66 and those working on 16-bit addresses need an 0x67.
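The effect of the two states can be seen by assembling the same instructions under each. The following is a sketch for the `bin' format; the byte values in the comments are what the prefix rules above imply for these particular register forms.

```nasm
        bits 16
        mov     eax,ebx         ; 32-bit data in 16-bit mode:    66 89 D8
        mov     ax,bx           ; native operand size:           89 D8
        mov     ax,[ebx]        ; 32-bit address in 16-bit mode: 67 8B 03

        bits 32
        mov     eax,ebx         ; native operand size:           89 D8
        mov     ax,bx           ; 16-bit data in 32-bit mode:    66 89 D8
```

Note that `mov eax,ebx' under `BITS 16' and `mov ax,bx' under `BITS 32' produce identical bytes, which is exactly why code assembled for the wrong mode misbehaves when run.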
The `BITS' directive has an exactly equivalent primitive form,
`[BITS 16]' and `[BITS 32]'. The user-level form is a macro which has
no function other than to call the primitive form.
Note that the space is necessary: `BITS32' will _not_ work!
5.2. `SECTION' or `SEGMENT': Changing and Defining Sections
The `SECTION' directive (`SEGMENT' is an exactly equivalent synonym)
changes which section of the output file the code you write will be
assembled into. In some object file formats, the number and names of
sections are fixed; in others, the user may make up as many as they
wish. Hence `SECTION' may sometimes give an error message, or may
define a new section, if you try to switch to a section that does not
(yet) exist.
The Unix object formats, and the `bin' object format (but see *Note
Section 6.1.3::), all support the standardised section names `.text',
`.data' and `.bss' for the code, data and uninitialised-data sections.
The `obj' format, by contrast, does not recognise these section names
as being special, and indeed will strip off the leading period of any
section name that has one.
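As a brief sketch of switching between the three standard sections in one source file (the labels `counter', `buffer' and `start' are illustrative names, not anything NASM defines):

```nasm
        section .data
counter: dd     0               ; initialised data

        section .bss
buffer: resb    64              ; reserved, uninitialised space

        section .text
start:  inc     dword [counter] ; code is assembled into .text
```

Each `SECTION' line affects only what follows it, so a file may switch back and forth between sections as often as is convenient.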
5.4. `EXTERN': Importing Symbols from Other Modules
`EXTERN' is similar to the MASM directive `EXTRN' and the C keyword
`extern': it is used to declare a symbol which is not defined anywhere
in the module being assembled, but is assumed to be defined in some
other module and needs to be referred to by this one. Not every
object-file format can support external variables: the `bin' format
cannot.
The `EXTERN' directive takes as many arguments as you like. Each
argument is the name of a symbol:
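For instance, with C library routines as illustrative symbol names (assumed to be defined by whatever the module is eventually linked against):

```nasm
extern  _printf                 ; a single symbol
extern  _sscanf,_fscanf         ; several symbols in one directive
```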
Some object-file formats provide extra features to the `EXTERN'
directive. In all cases, the extra features are used by suffixing a
colon to the symbol name followed by object-format specific text. For
example, the `obj' format allows you to declare that the default
segment base of an external should be the group `dgroup' by means of
extern _variable:wrt dgroup
The primitive form of `EXTERN' differs from the user-level form only
in that it can take only one argument at a time: the support for
multiple arguments is implemented at the preprocessor level.
5.5. `GLOBAL': Exporting Symbols to Other Modules
`GLOBAL' is the other end of `EXTERN': if one module declares a
symbol as `EXTERN' and refers to it, then in order to prevent linker
errors, some other module must actually _define_ the symbol and declare
it as `GLOBAL'. Some assemblers use the name `PUBLIC' for this purpose.
The `GLOBAL' directive applying to a symbol must appear _before_ the
definition of the symbol.
`GLOBAL' uses the same syntax as `EXTERN', except that it must refer
to symbols which _are_ defined in the same module as the `GLOBAL'
directive. For example:
     global _main
     _main:
             ; some code
`GLOBAL', like `EXTERN', allows object formats to define private
extensions by means of a colon. The `elf' object format, for example,
lets you specify whether global data items are functions or data:
global hashlookup:function, hashtable:data
Like `EXTERN', the primitive form of `GLOBAL' differs from the
user-level form only in that it can take only one argument at a time.
5.6. `COMMON': Defining Common Data Areas
The `COMMON' directive is used to declare _common variables_. A
common variable is much like a global variable declared in the
uninitialised data section, so that
common intvar 4
is similar in function to
intvar resd 1
The difference is that if more than one module defines the same
common variable, then at link time those variables will be _merged_, and
references to `intvar' in all modules will point at the same piece of
memory.
Like `GLOBAL' and `EXTERN', `COMMON' supports object-format specific
extensions. For example, the `obj' format allows common variables to be
NEAR or FAR, and the `elf' format allows you to specify the alignment
requirements of a common variable:
common commvar 4:near ; works in OBJ
common intarray 100:4 ; works in ELF: 4 byte aligned
Chapter 6: Output Formats
NASM is a portable assembler, designed to be able to compile on any
ANSI C-supporting platform and produce output to run on a variety of
Intel x86 operating systems. For this reason, it has a large number of
available output formats, selected using the `-f' option on the NASM
command line. Each of these formats, along with its extensions to the
base NASM syntax, is detailed in this chapter.
As stated in *Note Section 2.1.1::, NASM chooses a default name for
your output file based on the input file name and the chosen output
format. This will be generated by removing the extension (`.asm', `.s',
or whatever you like to use) from the input file name, and substituting
an extension defined by the output format. The extensions are given
with each format below.
6.1. `bin': Flat-Form Binary Output
The `bin' format does not produce object files: it generates nothing
in the output file except the code you wrote. Such `pure binary' files
are used by MS-DOS: `.COM' executables and `.SYS' device drivers are
pure binary files. Pure binary output is also useful for operating
system and boot loader development.
The `bin' format supports multiple section names. For details of how
NASM handles sections in the `bin' format, see *Note Section 6.1.3::.
Using the `bin' format puts NASM by default into 16-bit mode (see
*Note Section 5.1::). In order to use `bin' to write 32-bit code such as
an OS kernel, you need to explicitly issue the `BITS 32' directive.
`bin' has no default output file name extension: instead, it leaves
your file name as it is once the original extension has been removed.
Thus, the default is for NASM to assemble `binprog.asm' into a binary
file called `binprog'.
6.1.1. `ORG': Binary File Program Origin
The `bin' format provides an additional directive to the list given
in *Note Chapter 5::: `ORG'. The function of the `ORG' directive is to
specify the origin address which NASM will assume the program begins at
when it is loaded into memory.
For example, the following code will generate the longword
`0x00000104':
     org 0x100
     dd label
     label:
Unlike the `ORG' directive provided by MASM-compatible assemblers,
which allows you to jump around in the object file and overwrite code
you have already generated, NASM's `ORG' does exactly what the directive
says: _origin_. Its sole function is to specify one offset which is
added to all internal address references within the section; it does not
permit any of the trickery that MASM's version does. See *Note Section
10.1.3:: for further comments.
6.5. `elf': Executable and Linkable Format Object Files
The `elf' output format generates `ELF32' (Executable and Linkable
Format) object files, as used by Linux as well as Unix System V,
including Solaris x86, UnixWare and SCO Unix. `elf' provides a default
output file-name extension of `.o'.
6.5.3. `elf' Extensions to the `GLOBAL' Directive
`ELF' object files can contain more information about a global symbol
than just its address: they can contain the size of the symbol and its
type as well. These are not merely debugger conveniences, but are
actually necessary when the program being written is a shared library.
NASM therefore supports some extensions to the `GLOBAL' directive,
allowing you to specify these features.
You can specify whether a global variable is a function or a data
object by suffixing the name with a colon and the word `function' or
`data'. (`object' is a synonym for `data'.) For example:
global hashlookup:function, hashtable:data
exports the global symbol `hashlookup' as a function and `hashtable'
as a data object.
You can also specify the size of the data associated with the
symbol, as a numeric expression (which may involve labels, and even
forward references) after the type specifier. Like this:
     global hashtable:data (hashtable.end - hashtable)
     hashtable:
             db this,that,theother ; some data here
     .end:
6.8. `as86': Minix/Linux `as86' Object Files
The Minix/Linux 16-bit assembler `as86' has its own non-standard
object file format. Although its companion linker `ld86' produces
something close to ordinary `a.out' binaries as output, the object file
format used to communicate between `as86' and `ld86' is not itself
`a.out'.
NASM supports this format, just in case it is useful, as `as86'.
`as86' provides a default output file-name extension of `.o'.
`as86' is a very simple object format (from the NASM user's point of
view). It supports no special directives, no special symbols, no use of
`SEG' or `WRT', and no extensions to any standard directives. It
supports only the three standard section names `.text', `.data' and
`.bss'.
8.2. Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries
`ELF' replaced the older `a.out' object file format under Linux
because it contains support for position-independent code (PIC), which
makes writing shared libraries much easier. NASM supports the `ELF'
position-independent code features, so you can write Linux `ELF' shared
libraries in NASM.
NetBSD, and its close cousins FreeBSD and OpenBSD, take a different
approach by hacking PIC support into the `a.out' format. NASM supports
this as the `aoutb' output format, so you can write BSD shared
libraries in NASM too.
The operating system loads a PIC shared library by memory-mapping the
library file at an arbitrarily chosen point in the address space of the
running process. The contents of the library's code section must
therefore not depend on where it is loaded in memory.
Therefore, you cannot get at your variables by writing code like this:
mov eax,[myvar] ; WRONG
Instead, the linker provides an area of memory called the _global
offset table_, or GOT; the GOT is situated at a constant distance from
your library's code, so if you can find out where your library is
loaded (which is typically done using a `CALL' and `POP' combination),
you can obtain the address of the GOT, and you can then load the
addresses of your variables out of linker-generated entries in the GOT.
The _data_ section of a PIC shared library does not have these
restrictions: since the data section is writable, it has to be copied
into memory anyway rather than just paged in from the library file, so
as long as it's being copied it can be relocated too. So you can put
ordinary types of relocation in the data section without too much worry
(but see *Note Section 8.2.4:: for a caveat).
8.2.1. Obtaining the Address of the GOT
Each code module in your shared library should define the GOT as an
external symbol:
extern _GLOBAL_OFFSET_TABLE_ ; in ELF
extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out
At the beginning of any function in your shared library which plans
to access your data or BSS sections, you must first calculate the
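address of the GOT. This is typically done by writing the function in
this form:
     func:   push ebp
             mov ebp,esp
             push ebx
             call .get_GOT
     .get_GOT:
             pop ebx
             add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
             ; the function body comes here
             mov ebx,[ebp-4]
             mov esp,ebp
             pop ebp
             ret
(For BSD, again, the symbol `_GLOBAL_OFFSET_TABLE_' requires a second
leading underscore.)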
The first two lines of this function are simply the standard C
prologue to set up a stack frame, and the last three lines are standard
C function epilogue. The third line, and the fourth to last line, save
and restore the `EBX' register, because PIC shared libraries use this
register to store the address of the GOT.
The interesting bit is the `CALL' instruction and the following two
lines. The `CALL' and `POP' combination obtains the address of the
label `.get_GOT', without having to know in advance where the program
was loaded (since the `CALL' instruction is encoded relative to the
current position). The `ADD' instruction makes use of one of the
special PIC relocation types: GOTPC relocation. With the `WRT ..gotpc'
qualifier specified, the symbol referenced (here
`_GLOBAL_OFFSET_TABLE_', the special symbol assigned to the GOT) is
given as an offset from the beginning of the section. (Actually, `ELF'
encodes it as the offset from the operand field of the `ADD'
instruction, but NASM simplifies this deliberately, so you do things the
same way for both `ELF' and `BSD'.) So the instruction then _adds_ the
beginning of the section, to get the real address of the GOT, and
subtracts the value of `.get_GOT' which it knows is in `EBX'.
Therefore, by the time that instruction has finished, `EBX' contains
the address of the GOT.
If you didn't follow that, don't worry: it's never necessary to
obtain the address of the GOT by any other means, so you can put those
three instructions into a macro and safely ignore them:
     %macro get_GOT 0
             call %%getgot
     %%getgot:
             pop ebx
             add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
     %endmacro
8.2.4. Exporting Symbols to the Library User
If you want to export symbols to the user of the library, you have to
declare whether they are functions or data, and if they are data, you
have to give the size of the data item. This is because the dynamic
linker has to build procedure linkage table entries for any exported
functions, and also moves exported data items away from the library's
data section in which they were declared.
So to export a function to users of the library, you must use
global func:function ; declare it as a function
func: push ebp
And to export a data item such as an array, you would have to code
     global array:data array.end-array ; give the size too
     array:  resd 128
     .end:
Be careful: If you export a variable to the library user, by
declaring it as `GLOBAL' and supplying a size, the variable will end up
living in the data section of the main program, rather than in your
library's data section, where you declared it. So you will have to
access your own global variable with the `..got' mechanism rather than
`..gotoff', as if it were external (which, effectively, it has become).
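A sketch of what this looks like in practice, for the `elf' output format. The function name `read_first' is hypothetical, and the GOT address is obtained in `EBX' with the usual `CALL' and `POP' sequence; the exported array is then reached through its GOT entry rather than directly.

```nasm
        bits 32
        extern  _GLOBAL_OFFSET_TABLE_
        global  array:data array.end-array

        section .bss
array:  resd    128
.end:

        section .text
read_first:                             ; hypothetical helper function
        push    ebx                     ; EBX is used for the GOT address
        call    .get_GOT
.get_GOT:
        pop     ebx
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
        mov     eax,[ebx+array wrt ..got] ; EAX = run-time address of array
        mov     eax,[eax]               ; EAX = first element of array
        pop     ebx
        ret
```

The extra load through the GOT is the cost of exporting the variable: the library no longer knows at link time where the data will live.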
Equally, if you need to store the address of an exported global in
one of your data sections, you can't do it by means of the standard
sort of code:
dataptr: dd global_data_item ; WRONG
NASM will interpret this code as an ordinary relocation, in which
`global_data_item' is merely an offset from the beginning of the
`.data' section (or whatever); so this reference will end up pointing
at your data section instead of at the exported global which resides
elsewhere.
Instead of the above code, then, you must write
dataptr: dd global_data_item wrt ..sym
which makes use of the special `WRT' type `..sym' to instruct NASM
to search the symbol table for a particular symbol at that address,
rather than just relocating by section base.
Either method will work for functions: referring to one of your
functions by means of
funcptr: dd my_function
will give the user the address of the code you wrote, whereas
funcptr: dd my_function wrt ..sym
will give the address of the procedure linkage table for the
function, which is where the calling program will _believe_ the
function lives. Either address is a valid way to call the function.
10.1.1. NASM Generates Inefficient Code
We sometimes get `bug' reports about NASM generating inefficient, or
even `wrong', code on instructions such as `ADD ESP,8'. This is a
deliberate design feature, connected to predictability of output: NASM,
on seeing `ADD ESP,8', will generate the form of the instruction which
leaves room for a 32-bit offset. You need to code `ADD ESP,BYTE 8' if
you want the space-efficient form of the instruction. This isn't a bug,
it's user error: if you prefer to have NASM produce the more efficient
code automatically, enable optimization with the `-On' option.
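The two forms differ only in the size of the immediate operand. In the sketch below, the byte values in the comments assume 32-bit mode and the unoptimised behaviour described above; with optimization enabled, the first line may be encoded like the second.

```nasm
        bits 32
        add     esp,8           ; long form:  81 C4 08 00 00 00
        add     esp,byte 8      ; short form: 83 C4 08
```

The short form saves three bytes per instruction, at the cost of the programmer having to know that the constant fits in a signed byte.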
10.1.2. My Jumps are Out of Range
Similarly, people complain that when they issue conditional jumps
(which are `SHORT' by default) that try to jump too far, NASM reports
`short jump out of range' instead of making the jumps longer.
This, again, is partly a predictability issue, but in fact has a more
practical reason as well. NASM has no means of being told what type of
processor the code it is generating will be run on; so it cannot decide
for itself that it should generate `Jcc NEAR' type instructions, because
it doesn't know that it's working for a 386 or above. Alternatively, it
could replace the out-of-range short `JNE' instruction with a very
short `JE' instruction that jumps over a `JMP NEAR'; this is a sensible
solution for processors below a 386, but hardly efficient on processors
which have good branch prediction _and_ could have used `JNE NEAR'
instead. So, once again, it's up to the user, not the assembler, to
decide what instructions should be generated.
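The two alternatives discussed above look like this when written out by hand. This is a sketch only: the labels `check' and `handler' and the 200 bytes of filler are assumptions, there simply to put the target out of short-jump range.

```nasm
        bits 16

check:  cmp     ax,bx
        jne     near handler    ; 386 or later: conditional near jump

        cmp     cx,dx
        je      .skip           ; any processor: inverted short jump...
        jmp     near handler    ; ...over an unconditional near jump
.skip:  nop

        times 200 nop           ; filler, so handler is out of short range

handler:
        ret
```

Writing plain `jne handler' in place of either form is what produces the `short jump out of range' message.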
10.1.3. `ORG' Doesn't Work
People writing boot sector programs in the `bin' format often
complain that `ORG' doesn't work the way they'd like: in order to place
the `0xAA55' signature word at the end of a 512-byte boot sector, people
who are used to MASM tend to code
     ORG 0
     ; some boot sector code
     ORG 510
     DW 0xAA55
This is not the intended use of the `ORG' directive in NASM, and will
not work. The correct way to solve this problem in NASM is to use the
`TIMES' directive, like this:
     ORG 0
     ; some boot sector code
     TIMES 510-($-$$) DB 0
     DW 0xAA55
The `TIMES' directive will insert exactly enough zero bytes into the
output to move the assembly point up to 510. This method also has the
advantage that if you accidentally fill your boot sector too full, NASM
will catch the problem at assembly time and report it, so you won't end
up with a boot sector that you have to disassemble to find out what's
wrong with it.
The complete NASM info archive can be downloaded here (180 KB).