iakovlev.org

NASM info

3.1. Layout of a NASM Source Line

As in other assemblers, each line of NASM source consists of four main parts:
 
      label:    instruction operands        ; comment


As usual, most of these fields are optional; the presence or absence of any combination of a label, an instruction and a comment is allowed. Of course, the operand field is either required or forbidden by the presence and nature of the instruction field.

NASM uses backslash (\) as the line continuation character; if a line ends with backslash, the next line is considered to be a part of the backslash-ended line.

NASM places no restrictions on white space within a line: labels may have white space before them, or instructions may have no space before them, or anything. The colon after a label is also optional. (Note that this means that if you intend to code `lodsb' alone on a line, and type `lodab' by accident, then that's still a valid source line which does nothing but define a label. Running NASM with the command-line option `-w+orphan-labels' will cause it to warn you if you define a label alone on a line without a trailing colon.)

Valid characters in labels are letters, numbers, `_', `$', `#', `@', `~', `.', and `?'. The only characters which may be used as the _first_ character of an identifier are letters, `.' (with special meaning: see *Note Section 3.9::), `_' and `?'. An identifier may also be prefixed with a `$' to indicate that it is intended to be read as an identifier and not a reserved word; thus, if some other module you are linking with defines a symbol called `eax', you can refer to `$eax' in NASM code to distinguish the symbol from the register.

The instruction field may contain any machine instruction: Pentium and P6 instructions, FPU instructions, MMX instructions and even undocumented instructions are all supported. The instruction may be prefixed by `LOCK', `REP', `REPE'/`REPZ' or `REPNE'/`REPNZ', in the usual way. Explicit address-size and operand-size prefixes `A16', `A32', `O16' and `O32' are provided - one example of their use is given in *Note Chapter 9::. You can also use the name of a segment register as an instruction prefix: coding `es mov [bx],ax' is equivalent to coding `mov [es:bx],ax'. We recommend the latter syntax, since it is consistent with other syntactic features of the language, but for instructions such as `LODSB', which has no operands and yet can require a segment override, there is no clean syntactic way to proceed apart from `es lodsb'.

A prefix does not have to be attached to an instruction: prefixes such as `CS', `A32', `LOCK' or `REPE' can appear on a line by themselves, and NASM will just generate the prefix bytes.

In addition to actual machine instructions, NASM also supports a number of pseudo-instructions, described in *Note Section 3.2::. Instruction operands may take a number of forms: they can be registers, described simply by the register name (e.g. `ax', `bp', `ebx', `cr0': NASM does not use the `gas'-style syntax in which register names must be prefixed by a `%' sign), or they can be effective addresses (see *Note Section 3.3::), constants (*Note Section 3.4::) or expressions (*Note Section 3.5::).

For floating-point instructions, NASM accepts a wide range of syntaxes: you can use two-operand forms like MASM supports, or you can use NASM's native single-operand forms in most cases. Details of all forms of each supported instruction are given in *Note Appendix B::. For example, you can code:
                                                                                        
              fadd    st1             ; this sets st0 := st0 + st1
              fadd    st0,st1         ; so does this
                                                                                        
              fadd    st1,st0         ; this sets st1 := st1 + st0
              fadd    to st1          ; so does this
Almost any floating-point instruction that references memory must use one of the prefixes `DWORD', `QWORD' or `TWORD' to indicate what size of memory operand it refers to.

3.2. Pseudo-Instructions

Pseudo-instructions are things which, though not real x86 machine instructions, are used in the instruction field anyway because that's the most convenient place to put them. The current pseudo-instructions are `DB', `DW', `DD', `DQ' and `DT', their uninitialised counterparts `RESB', `RESW', `RESD', `RESQ' and `REST', the `INCBIN' command, the `EQU' command, and the `TIMES' prefix.

3.2.1. `DB' and friends: Declaring Initialised Data

`DB', `DW', `DD', `DQ' and `DT' are used, much as in MASM, to declare initialised data in the output file. They can be invoked in a wide range of ways:
                                                                                        
            db    0x55                ; just the byte 0x55
            db    0x55,0x56,0x57      ; three bytes in succession
            db    'a',0x55            ; character constants are OK
            db    'hello',13,10,'$'   ; so are string constants
            dw    0x1234              ; 0x34 0x12
            dw    'a'                 ; 0x61 0x00 (it's just a number)
            dw    'ab'                ; 0x61 0x62 (character constant)
            dw    'abc'               ; 0x61 0x62 0x63 0x00 (string)
            dd    0x12345678          ; 0x78 0x56 0x34 0x12
            dd    1.234567e20         ; floating-point constant
            dq    1.234567e20         ; double-precision float
            dt    1.234567e20         ; extended-precision float
`DQ' and `DT' do not accept numeric constants or string constants as operands.
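
The byte sequences in the comments above follow from little-endian storage. As a rough cross-check, here is a minimal Python sketch that models what `dw' and `dd' place in the output for integer and floating-point operands. The helper names `dw', `dd' and `dd_float' are hypothetical stand-ins chosen for this illustration; they are not part of NASM, and the sketch assumes IEEE single precision for `dd' with a float operand.

```python
import struct


def dw(value):
    """Bytes for `dw value` given an integer: one little-endian 16-bit word."""
    return struct.pack("<H", value & 0xFFFF)


def dd(value):
    """Bytes for `dd value` given an integer: one little-endian doubleword."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def dd_float(value):
    """Bytes for `dd value` given a float: assumed IEEE single, little-endian."""
    return struct.pack("<f", value)


print(dw(0x1234).hex())        # 3412
print(dw(ord("a")).hex())      # 6100  ('a' is just the number 0x61)
print(dd(0x12345678).hex())    # 78563412
```

The low-order byte always comes first, which is why `dw 'a'' yields 0x61 followed by 0x00.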

3.2.2. `RESB' and friends: Declaring Uninitialised Data

`RESB', `RESW', `RESD', `RESQ' and `REST' are designed to be used in the BSS section of a module: they declare _uninitialised_ storage space. Each takes a single operand, which is the number of bytes, words, doublewords or whatever to reserve. As stated in *Note Section 2.2.7::, NASM does not support the MASM/TASM syntax of reserving uninitialised space by writing `DW ?' or similar things: this is what it does instead. The operand to a `RESB'-type pseudo-instruction is a _critical expression_: see *Note Section 3.8::.

For example:
                                                                                        
      buffer:         resb    64              ; reserve 64 bytes
      wordvar:        resw    1               ; reserve a word
      realarray       resq    10              ; array of ten reals


3.2.3. `INCBIN': Including External Binary Files

`INCBIN' is borrowed from the old Amiga assembler DevPac: it includes a binary file verbatim into the output file. This can be handy, for example, for including graphics and sound data directly into a game executable file. It can be called in one of these three ways:
                                                                                        
          incbin  "file.dat"             ; include the whole file
          incbin  "file.dat",1024        ; skip the first 1024 bytes
          incbin  "file.dat",1024,512    ; skip the first 1024, and
                                         ; actually include at most 512


3.2.4. `EQU': Defining Constants

`EQU' defines a symbol to a given constant value: when `EQU' is used, the source line must contain a label. The action of `EQU' is to define the given label name to the value of its (only) operand. This definition is absolute, and cannot change later. So, for example,
                                                                                        
      message         db      'hello, world'
      msglen          equ     $-message
defines `msglen' to be the constant 12. `msglen' may not then be redefined later. This is not a preprocessor definition either: the value of `msglen' is evaluated _once_, using the value of `$' (see *Note Section 3.5:: for an explanation of `$') at the point of definition, rather than being evaluated wherever it is referenced and using the value of `$' at the point of reference. Note that the operand to an `EQU' is also a critical expression (*Note Section 3.8::).
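
The difference between "evaluated once at the point of definition" and "evaluated wherever it is referenced" can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the `Section' class, its `pos' property (standing in for `$') and the function `msglen_as_macro' are hypothetical and only mimic the behaviour described above.

```python
class Section:
    """Toy output section; `pos` plays the role of `$` (offset from the start)."""

    def __init__(self):
        self.data = bytearray()

    @property
    def pos(self):
        return len(self.data)

    def emit(self, chunk):
        """Append bytes and return the offset a label on that line would get."""
        start = self.pos
        self.data += chunk
        return start


sec = Section()
message = sec.emit(b"hello, world")   # message  db 'hello, world'
msglen = sec.pos - message            # msglen   equ $-message  (fixed right here)


def msglen_as_macro():
    """What a textual substitution of `$-message` gives: re-evaluated per use."""
    return sec.pos - message


sec.emit(b"\x00" * 4)                 # more data assembled afterwards

print(msglen)             # 12
print(msglen_as_macro())  # 16
```

The `EQU'-style value stays at 12 no matter how much is assembled later, while the substitution-style value drifts with the assembly position.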

3.2.5. `TIMES': Repeating Instructions or Data

The `TIMES' prefix causes the instruction to be assembled multiple times. This is partly present as NASM's equivalent of the `DUP' syntax supported by MASM-compatible assemblers, in that you can code

      zerobuf:        times 64 db 0

or similar things; but `TIMES' is more versatile than that. The argument to `TIMES' is not just a numeric constant, but a numeric _expression_, so you can do things like
                                                                                       
      buffer: db      'hello, world'
              times 64-$+buffer db ' '
which will store exactly enough spaces to make the total length of `buffer' up to 64. Finally, `TIMES' can be applied to ordinary instructions, so you can code trivial unrolled loops in it:

              times 100 movsb

Note that there is no effective difference between `times 100 resb 1' and `resb 100', except that the latter will be assembled about 100 times faster due to the internal structure of the assembler.

The operand to `TIMES', like that of `EQU' and those of `RESB' and friends, is a critical expression (*Note Section 3.8::).
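
The padding example in this section (`times 64-$+buffer db ' '') comes down to simple arithmetic: at the `TIMES' line, `$-buffer' equals the number of bytes already emitted for `buffer'. A minimal Python sketch of that calculation follows; the function name `pad_buffer' is hypothetical, and raising an error on a negative count is a choice made for this model.

```python
def pad_buffer(data, total=64, fill=b" "):
    """Model `times total-$+buffer db fill` placed right after `buffer: db data`."""
    count = total - len(data)   # $-buffer equals len(data) at the TIMES line
    if count < 0:
        raise ValueError("TIMES count would be negative")
    return data + fill * count


padded = pad_buffer(b"hello, world")
print(len(padded))   # 64
```

For the 12-byte string above the repeat count is 52, so the buffer ends up exactly 64 bytes long.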

3.3. Effective Addresses

An effective address is any operand to an instruction which references memory. Effective addresses, in NASM, have a very simple syntax: they consist of an expression evaluating to the desired address, enclosed in square brackets. For example:
                                                                                       
      wordvar dw      123
              mov     ax,[wordvar]
              mov     ax,[wordvar+1]
              mov     ax,[es:wordvar+bx]
Anything not conforming to this simple system is not a valid memory reference in NASM, for example `es:wordvar[bx]'.

More complicated effective addresses, such as those involving more than one register, work in exactly the same way:
                                                                                       
              mov     eax,[ebx*2+ecx+offset]
              mov     ax,[bp+di+8]
NASM is capable of doing algebra on these effective addresses, so that things which don't necessarily _look_ legal are perfectly all right:
                                                                                        
          mov     eax,[ebx*5]             ; assembles as [ebx*4+ebx]
          mov     eax,[label1*2-label2]   ; ie [label1+(label1-label2)]
Some forms of effective address have more than one assembled form; in most such cases NASM will generate the smallest form it can. For example, there are distinct assembled forms for the 32-bit effective addresses `[eax*2+0]' and `[eax+eax]', and NASM will generally generate the latter on the grounds that the former requires four bytes to store a zero offset.

NASM has a hinting mechanism which will cause `[eax+ebx]' and `[ebx+eax]' to generate different opcodes; this is occasionally useful because `[esi+ebp]' and `[ebp+esi]' have different default segment registers.

However, you can force NASM to generate an effective address in a particular form by the use of the keywords `BYTE', `WORD', `DWORD' and `NOSPLIT'. If you need `[eax+3]' to be assembled using a double-word offset field instead of the one byte NASM will normally generate, you can code `[dword eax+3]'. Similarly, you can force NASM to use a byte offset for a small value which it hasn't seen on the first pass (see *Note Section 3.8:: for an example of such a code fragment) by using `[byte eax+offset]'. As special cases, `[byte eax]' will code `[eax+0]' with a byte offset of zero, and `[dword eax]' will code it with a double-word offset of zero. The normal form, `[eax]', will be coded with no offset field.

The form described in the previous paragraph is also useful if you are trying to access data in a 32-bit segment from within 16-bit code. For more information on this see the section on mixed-size addressing (*Note Section 9.2::). In particular, if you need to access data at a known offset that is too large to fit in a 16-bit value and you do not specify that it is a dword offset, NASM will cause the high word of the offset to be lost.

Similarly, NASM will split `[eax*2]' into `[eax+eax]' because that allows the offset field to be absent and space to be saved; in fact, it will also split `[eax*2+offset]' into `[eax+eax+offset]'. You can combat this behaviour by the use of the `NOSPLIT' keyword: `[nosplit eax*2]' will force `[eax*2+0]' to be generated literally.
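
To make the effect of the `BYTE' and `DWORD' keywords on the offset field concrete, the following Python sketch builds the 32-bit machine code for `mov eax,[eax+offset]' in its three forms. It is an illustration under stated assumptions, not NASM's own logic: the function name `mov_eax_from_eax_plus' and its `size' parameter are hypothetical, and only the `EAX' base register in 32-bit mode is modelled.

```python
import struct


def mov_eax_from_eax_plus(offset, size=None):
    """Encode `mov eax,[eax+offset]` for 32-bit code.

    size=None picks the shortest form; "byte" and "dword" mimic the
    `[byte eax+offset]` and `[dword eax+offset]` overrides.
    """
    opcode = b"\x8b"                       # MOV r32, r/m32
    if size is None:
        if offset == 0:
            return opcode + b"\x00"        # ModRM mod=00: no offset field
        size = "byte" if -128 <= offset <= 127 else "dword"
    if size == "byte":
        return opcode + b"\x40" + struct.pack("<b", offset)   # mod=01: 1 byte
    if size == "dword":
        return opcode + b"\x80" + struct.pack("<i", offset)   # mod=10: 4 bytes
    raise ValueError("size must be None, 'byte' or 'dword'")


print(mov_eax_from_eax_plus(3).hex())            # 8b4003
print(mov_eax_from_eax_plus(3, "dword").hex())   # 8b8003000000
print(mov_eax_from_eax_plus(0, "byte").hex())    # 8b4000
print(mov_eax_from_eax_plus(0).hex())            # 8b00
```

The four calls correspond to `[eax+3]', `[dword eax+3]', `[byte eax]' and `[eax]'. Asking for a byte offset that does not fit in a signed byte makes `struct.pack' raise an error in this model.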

3.4. Constants

NASM understands four different types of constant: numeric, character, string and floating-point.

3.4.1. Numeric Constants

A numeric constant is simply a number. NASM allows you to specify numbers in a variety of number bases, in a variety of ways: you can suffix `H' for hex, `Q' or `O' for octal, and `B' for binary, or you can prefix `0x' for hex in the style of C, or you can prefix `$' for hex in the style of Borland Pascal. Note, though, that the `$' prefix does double duty as a prefix on identifiers (see *Note Section 3.1::), so a hex number prefixed with a `$' sign must have a digit after the `$' rather than a letter.

Some examples:
                                                                                        
              mov     ax,100          ; decimal
              mov     ax,0a2h         ; hex
              mov     ax,$0a2         ; hex again: the 0 is required
              mov     ax,0xa2         ; hex yet again
              mov     ax,777q         ; octal
              mov     ax,777o         ; octal again
              mov     ax,10010011b    ; binary
 
3.4.2. Character Constants

A character constant consists of up to four characters enclosed in either single or double quotes. The type of quote makes no difference to NASM, except of course that surrounding the constant with single quotes allows double quotes to appear within it and vice versa.

A character constant with more than one character will be arranged with little-endian order in mind: if you code
                                                                                        
                mov eax,'abcd'
then the constant generated is not `0x61626364', but `0x64636261', so that if you were then to store the value into memory, it would read `abcd' rather than `dcba'. This is also the sense of character constants understood by the Pentium's `CPUID' instruction (see *Note Section B.4.34::).
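
The little-endian arrangement can be checked with a few lines of Python. This is only a model of the rule stated above; the function name `char_constant' is hypothetical.

```python
def char_constant(text):
    """Numeric value of a character constant of one to four characters."""
    data = text.encode("ascii")
    if not 1 <= len(data) <= 4:
        raise ValueError("a character constant has one to four characters")
    return int.from_bytes(data, "little")   # first character is the low byte


value = char_constant("abcd")
print(hex(value))                    # 0x64636261
print(value.to_bytes(4, "little"))   # b'abcd'
```

Storing the value back to memory in little-endian order reproduces the characters in their source order, which is the point of the arrangement.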

3.4.3. String Constants

String constants are only acceptable to some pseudo-instructions, namely the `DB' family and `INCBIN'.

A string constant looks like a character constant, only longer. It is treated as a concatenation of character constants of the maximum size that the pseudo-instruction in use allows. So the following are equivalent:
                                                                                        
            db    'hello'               ; string constant
            db    'h','e','l','l','o'   ; equivalent character constants
                                                                                        
And the following are also equivalent:
                                                                                        
            dd    'ninechars'           ; doubleword string constant
            dd    'nine','char','s'     ; becomes three doublewords
            db    'ninechars',0,0,0     ; and really looks like this
Note that when used as an operand to `db', a constant like `'ab'' is treated as a string constant despite being short enough to be a character constant, because otherwise `db 'ab'' would have the same effect as `db 'a'', which would be silly. Similarly, three-character or four-character constants are treated as strings when they are operands to `dw'.
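
The padding implied by the `dd 'ninechars'' example can be sketched in Python as follows. The helper names `string_operand' and `split_units' are hypothetical; the sketch assumes, as the example above shows, that the final unit is filled out with zero bytes.

```python
def string_operand(text, unit):
    """Bytes for a string constant given to DB (unit=1), DW (2) or DD (4)."""
    data = text.encode("ascii")
    return data + b"\x00" * (-len(data) % unit)   # zero-pad the last unit


def split_units(data, unit):
    """Cut the emitted bytes into unit-sized pieces for inspection."""
    return [data[i:i + unit] for i in range(0, len(data), unit)]


emitted = string_operand("ninechars", 4)
print(split_units(emitted, 4))   # [b'nine', b'char', b's\x00\x00\x00']
```

The three pieces match the three doublewords `'nine'', `'char'' and `'s'' from the example.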

3.4.4. Floating-Point Constants

Floating-point constants are acceptable only as arguments to `DD', `DQ' and `DT'. They are expressed in the traditional form: digits, then a period, then optionally more digits, then optionally an `E' followed by an exponent. The period is mandatory, so that NASM can distinguish between `dd 1', which declares an integer constant, and `dd 1.0' which declares a floating-point constant.

Some examples:
                                                                                        
            dd    1.2                     ; an easy one
            dq    1.e10                   ; 10,000,000,000
            dq    1.e+10                  ; synonymous with 1.e10
            dq    1.e-10                  ; 0.000 000 000 1
            dt    3.141592653589793238462 ; pi
NASM cannot do compile-time arithmetic on floating-point constants. This is because NASM is designed to be portable - although it always generates code to run on x86 processors, the assembler itself can run on any system with an ANSI C compiler. Therefore, the assembler cannot guarantee the presence of a floating-point unit capable of handling the Intel number formats, and so for NASM to be able to do floating arithmetic it would have to include its own complete set of floating-point routines, which would significantly increase the size of the assembler for very little benefit.

3.5. Expressions

Expressions in NASM are similar in syntax to those in C.

NASM does not guarantee the size of the integers used to evaluate expressions at compile time: since NASM can compile and run on 64-bit systems quite happily, don't assume that expressions are evaluated in 32-bit registers and so try to make deliberate use of integer overflow. It might not always work. The only thing NASM will guarantee is what's guaranteed by ANSI C: you always have _at least_ 32 bits to work in.

NASM supports two special tokens in expressions, allowing calculations to involve the current assembly position: the `$' and `$$' tokens. `$' evaluates to the assembly position at the beginning of the line containing the expression; so you can code an infinite loop using `JMP $'. `$$' evaluates to the beginning of the current section; so you can tell how far into the section you are by using `($-$$)'.

The arithmetic operators provided by NASM are listed here, in increasing order of precedence.

3.5.1. `|': Bitwise OR Operator

The `|' operator gives a bitwise OR, exactly as performed by the `OR' machine instruction. Bitwise OR is the lowest-priority arithmetic operator supported by NASM.

3.5.2. `^': Bitwise XOR Operator

`^' provides the bitwise XOR operation.

3.5.3. `&': Bitwise AND Operator

`&' provides the bitwise AND operation.

3.5.4. `<<' and `>>': Bit Shift Operators

`<<' gives a bit-shift to the left, just as it does in C. So `5<<3' evaluates to 5 times 8, or 40. `>>' gives a bit-shift to the right; in NASM, such a shift is _always_ unsigned, so that the bits shifted in from the left-hand end are filled with zero rather than a sign-extension of the previous highest bit.
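
An unsigned right shift differs from the arithmetic shift many languages apply to negative numbers. The Python sketch below contrasts the two. The function name `shr_unsigned' is hypothetical, and the 32-bit width is an assumption made for the illustration, since NASM only promises at least 32 bits.

```python
def shr_unsigned(value, count, width=32):
    """Right shift that fills with zeros, on an assumed `width`-bit integer."""
    mask = (1 << width) - 1
    return (value & mask) >> count


print(5 << 3)                     # 40
print(hex(shr_unsigned(-8, 1)))   # 0x7ffffffc  (zero shifted in at the top)
print(-8 >> 1)                    # -4  (Python's own shift keeps the sign)
```

With a wider evaluation type the numeric result of the second line would differ, which is one reason not to rely on overflow behaviour in expressions.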

3.5.6. `*', `/', `//', `%' and `%%': Multiplication and Division

`*' is the multiplication operator. `/' and `//' are both division operators: `/' is unsigned division and `//' is signed division. Similarly, `%' and `%%' provide unsigned and signed modulo operators respectively.

NASM, like ANSI C, provides no guarantees about the sensible operation of the signed modulo operator.

Since the `%' character is used extensively by the macro preprocessor, you should ensure that both the signed and unsigned modulo operators are followed by white space wherever they appear.
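
The gap between the unsigned and signed operators shows up as soon as an operand is negative. The Python sketch below models both on an assumed 32-bit width. The names `udiv', `umod', `sdiv' and `smod' are hypothetical, and truncation toward zero in the signed versions is an assumption for this model, since the text above gives no guarantee for the signed modulo operator.

```python
WIDTH = 32                      # assumed evaluation width
MASK = (1 << WIDTH) - 1


def udiv(a, b):
    """Model of `/`: both operands reinterpreted as unsigned."""
    return (a & MASK) // (b & MASK)


def umod(a, b):
    """Model of `%`: unsigned remainder."""
    return (a & MASK) % (b & MASK)


def sdiv(a, b):
    """Model of `//`: signed division, assumed to truncate toward zero."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def smod(a, b):
    """Model of `%%`: remainder consistent with sdiv (an assumption)."""
    return a - b * sdiv(a, b)


print(hex(udiv(-7, 2)))   # 0x7ffffffc
print(sdiv(-7, 2))        # -3
```

For non-negative operands the signed and unsigned models agree, so the choice of operator only matters when a value may be negative.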

3.6. `SEG' and `WRT'

When writing large 16-bit programs, which must be split into multiple segments, it is often necessary to be able to refer to the segment part of the address of a symbol. NASM supports the `SEG' operator to perform this function.

The `SEG' operator returns the _preferred_ segment base of a symbol, defined as the segment base relative to which the offset of the symbol makes sense. So the code
                                                                                        
              mov     ax,seg symbol
              mov     es,ax
              mov     bx,symbol
will load `ES:BX' with a valid pointer to the symbol `symbol'.

Things can be more complex than this: since 16-bit segments and groups may overlap, you might occasionally want to refer to some symbol using a different segment base from the preferred one. NASM lets you do this, by the use of the `WRT' (With Reference To) keyword. So you can do things like
                                                                                        
              mov     ax,weird_seg        ; weird_seg is a segment base
              mov     es,ax
              mov     bx,symbol wrt weird_seg
to load `ES:BX' with a different, but functionally equivalent, pointer to the symbol `symbol'.

NASM supports far (inter-segment) calls and jumps by means of the syntax `call segment:offset', where `segment' and `offset' both represent immediate values. So to call a far procedure, you could code either of
                                                                                        
              call    (seg procedure):procedure
              call    weird_seg:(procedure wrt weird_seg)
(The parentheses are included for clarity, to show the intended parsing of the above instructions. They are not necessary in practice.)

NASM supports the syntax `call far procedure' as a synonym for the first of the above usages. `JMP' works identically to `CALL' in these examples.

To declare a far pointer to a data item in a data segment, you must code

              dw      symbol, seg symbol

NASM supports no convenient synonym for this, though you can always invent one using the macro processor.

3.7. `STRICT': Inhibiting Optimization

When assembling with the optimizer set to level 2 or higher (see *Note Section 2.1.16::), NASM will use size specifiers (`BYTE', `WORD', `DWORD', `QWORD', or `TWORD'), but will give them the smallest possible size. The keyword `STRICT' can be used to inhibit optimization and force a particular operand to be emitted in the specified size. For example, with the optimizer on, and in `BITS 16' mode,

              push dword 33

is encoded in three bytes `66 6A 21', whereas

              push strict dword 33

is encoded in six bytes, with a full dword immediate operand `66 68 21 00 00 00'.

With the optimizer off, the same code (six bytes) is generated whether the `STRICT' keyword was used or not.
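
The two encodings quoted in this section can be rebuilt byte by byte. The Python sketch below is a model of the `BITS 16' case with the optimizer on; the function name `push_dword_bits16' and its `strict' parameter are hypothetical.

```python
import struct


def push_dword_bits16(value, strict=False):
    """Encode `push dword value` (or `push strict dword value`) in BITS 16."""
    prefix = b"\x66"                    # operand-size prefix: 32-bit in 16-bit code
    if not strict and -128 <= value <= 127:
        # Short form: opcode 6A with a sign-extended 8-bit immediate.
        return prefix + b"\x6a" + struct.pack("<b", value)
    # Full form: opcode 68 with a 32-bit immediate.
    return prefix + b"\x68" + struct.pack("<i", value)


print(push_dword_bits16(33).hex())                # 666a21
print(push_dword_bits16(33, strict=True).hex())   # 666821000000
```

Both forms push the same 32-bit value; `STRICT' only decides whether the immediate is stored in one byte or four.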

3.8. Critical Expressions

A limitation of NASM is that it is a two-pass assembler; unlike TASM and others, it will always do exactly two assembly passes. Therefore it is unable to cope with source files that are complex enough to require three or more passes.

The first pass is used to determine the size of all the assembled code and data, so that the second pass, when generating all the code, knows all the symbol addresses the code refers to. So one thing NASM can't handle is code whose size depends on the value of a symbol declared after the code in question. For example,
                                                                                       
              times (label-$) db 0
      label:  db      'Where am I?'
 
The argument to `TIMES' in this case could equally legally evaluate to anything at all; NASM will reject this example because it cannot tell the size of the `TIMES' line when it first sees it. It will just as firmly reject the slightly paradoxical code
                                                                                       
              times (label-$+1) db 0
      label:  db      'NOW where am I?'
 
in which _any_ value for the `TIMES' argument is by definition wrong! NASM rejects these examples by means of a concept called a _critical expression_, which is defined to be an expression whose value is required to be computable in the first pass, and which must therefore depend only on symbols defined before it. The argument to the `TIMES' prefix is a critical expression; for the same reason, the arguments to the `RESB' family of pseudo-instructions are also critical expressions.

Critical expressions can crop up in other contexts as well: consider the following code.
                                                                                       
                      mov     ax,symbol1
      symbol1         equ     symbol2
      symbol2:
 
On the first pass, NASM cannot determine the value of `symbol1', because `symbol1' is defined to be equal to `symbol2' which NASM hasn't seen yet. On the second pass, therefore, when it encounters the line `mov ax,symbol1', it is unable to generate the code for it because it still doesn't know the value of `symbol1'. On the next line, it would see the `EQU' again and be able to determine the value of `symbol1', but by then it would be too late.

NASM avoids this problem by defining the right-hand side of an `EQU' statement to be a critical expression, so the definition of `symbol1' would be rejected in the first pass.

There is a related issue involving forward references: consider this code fragment.
              mov     eax,[ebx+offset]
      offset  equ     10
 
NASM, on pass one, must calculate the size of the instruction `mov eax,[ebx+offset]' without knowing the value of `offset'. It has no way of knowing that `offset' is small enough to fit into a one-byte offset field and that it could therefore get away with generating a shorter form of the effective-address encoding; for all it knows, in pass one, `offset' could be a symbol in the code segment, and it might need the full four-byte form. So it is forced to compute the size of the instruction to accommodate a four-byte address part. In pass two, having made this decision, it is now forced to honour it and keep the instruction large, so the code generated in this case is not as small as it could have been. This problem can be solved by defining `offset' before using it, or by forcing byte size in the effective address by coding `[byte ebx+offset]'.

3.9. Local Labels

NASM gives special treatment to symbols beginning with a period. A label beginning with a single period is treated as a _local_ label, which means that it is associated with the previous non-local label. So, for example:
                                                                                        
      label1  ; some code
                                                                                        
      .loop
              ; some more code
                                                                                        
              jne     .loop
              ret
                                                                                        
      label2  ; some code
                                                                                        
      .loop
              ; some more code
                                                                                        
              jne     .loop
              ret
 
In the above code fragment, each `JNE' instruction jumps to the line immediately before it, because the two definitions of `.loop' are kept separate by virtue of each being associated with the previous non-local label.

This form of local label handling is borrowed from the old Amiga assembler DevPac; however, NASM goes one step further, in allowing access to local labels from other parts of the code. This is achieved by means of _defining_ a local label in terms of the previous non-local label: the first definition of `.loop' above is really defining a symbol called `label1.loop', and the second defines a symbol called `label2.loop'. So, if you really needed to, you could write
                                                                                       
      label3  ; some more code
              ; and some more
                                                                                        
              jmp label1.loop
 
Sometimes it is useful - in a macro, for instance - to be able to define a label which can be referenced from anywhere but which doesn't interfere with the normal local-label mechanism. Such a label can't be non-local because it would interfere with subsequent definitions of, and references to, local labels; and it can't be local because the macro that defined it wouldn't know the label's full name. NASM therefore introduces a third type of label, which is probably only useful in macro definitions: if a label begins with the special prefix `..@', then it does nothing to the local label mechanism. So you could code
                                                                                        
      label1:                         ; a non-local label
      .local:                         ; this is really label1.local
      ..@foo:                         ; this is a special symbol
      label2:                         ; another non-local label
      .local:                         ; this is really label2.local
                                                                                        
              jmp     ..@foo          ; this will jump three lines up
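
The naming rules for the three kinds of label can be summarised in a short Python sketch. It is a model of the behaviour described in this section, not NASM's implementation: the function name `qualify' is hypothetical, and other symbols that begin with two periods are not handled.

```python
def qualify(labels):
    """Return the full symbol name recorded for each label, in source order."""
    current = None                 # most recent non-local label
    full_names = []
    for name in labels:
        if name.startswith("..@"):
            full_names.append(name)             # special: `current` unchanged
        elif name.startswith("."):
            if current is None:
                raise ValueError("local label before any non-local label")
            full_names.append(current + name)   # e.g. label1 + .local
        else:
            current = name                      # new scope for local labels
            full_names.append(name)
    return full_names


print(qualify(["label1", ".local", "..@foo", "label2", ".local"]))
# ['label1', 'label1.local', '..@foo', 'label2', 'label2.local']
```

Because `..@foo' leaves the current non-local label untouched, a `.local' defined after it would still belong to `label1' until `label2' appears.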
 
Chapter 4: The NASM Preprocessor

                                                                                      
NASM contains a powerful macro processor, which supports conditional assembly, multi-level file inclusion, two forms of macro (single-line and multi-line), and a `context stack' mechanism for extra macro power. Preprocessor directives all begin with a `%' sign.
 

The preprocessor collapses all lines which end with a backslash (\) character into a single line. Thus:
                                                                                       
      %define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \
              THIS_VALUE
 
will work like a single-line macro without the backslash-newline sequence.

4.1.1. The Normal Way: `%define'

Single-line macros are defined using the `%define' preprocessor directive. The definitions work in a similar way to C; so you can do things like %define ctrl 0x1F & %define param(a,b) ((a)+(a)*(b)) mov byte [param(2,ebx)], ctrl 'D' which will expand to mov byte [(2)+(2)*(ebx)], 0x1F & 'D' When the expansion of a single-line macro contains tokens which invoke another macro, the expansion is performed at invocation time, not at definition time. Thus the code %define a(x) 1+b(x) %define b(x) 2*x mov ax,a(8) will evaluate in the expected way to `mov ax,1+2*8', even though the macro `b' wasn't defined at the time of definition of `a'. Macros defined with `%define' are case sensitive: after `%define foo bar', only `foo' will expand to `bar': `Foo' or `FOO' will not. By using `%idefin!"e' instead of `%define' (the `i' stands for `insensitive') you can define all the case variants of a macro at once, so that `%idefine foo bar' would cause `foo', `Foo', `FOO', `fOO' and so on all to expand to `bar'. There is a mechanism which detects when a macro call has occurred as a result of a previous expansion of the same macro, to guard against circular references and infinite loops. If this happens, the preprocessor will only expand the first occurrence of the macro. Hence, if you code %define a(x) 1+a(x) mov ax,a(3) the macro `a(3)' will expand once, becoming `1+a(3)', and will then expand no further. This behaviour can be useful: see *Note Section 8.1:: for an example of its use. You can overload single-line macros: if you write %define foo(x) 1+x %define foo(x,y) 1+x*y the preprocessor will be able to handle both types of macro call, by counting the parameters you pass; so `foo(3)' will become `1+3' whereas `foo(ebx,2)' will become `1+ebx*2'. However, if you define %define foo bar then no other definition of `foo' will be accepted: a macro with no parameters prohibits the definition of the same name as a macro _with_ parameters, and vice versa. 
This doesn't prevent single-line macros being _redefined_: you can perfectly well define a macro with

     %define foo bar

and then re-define it later in the same source file with

     %define foo baz

Then everywhere the macro `foo' is invoked, it will be expanded according to the most recent definition. This is particularly useful when defining single-line macros with `%assign' (see *Note Section 4.1.5::).

4.4. Conditional Assembly
=========================

Similarly to the C preprocessor, NASM allows sections of a source file to be assembled only if certain conditions are met. The general syntax of this feature looks like this:

     %if<condition>
         ; some code which only appears if <condition> is met
     %elif<condition2>
         ; only appears if <condition> is not met but <condition2> is
     %else
         ; this appears if neither <condition> nor <condition2> was met
     %endif

The `%else' clause is optional, as is the `%elif' clause. You can have more than one `%elif' clause as well.

4.4.1. `%ifdef': Testing Single-Line Macro Existence
----------------------------------------------------

Beginning a conditional-assembly block with the line `%ifdef MACRO' will assemble the subsequent code if, and only if, a single-line macro called `MACRO' is defined. If not, then the `%elif' and `%else' blocks (if any) will be processed instead.

For example, when debugging a program, you might want to write code such as

               ; perform some function
     %ifdef DEBUG
               writefile 2,"Function performed successfully",13,10
     %endif
               ; go and do something else

Then you could use the command-line option `-dDEBUG' to create a version of the program which produced debugging messages, and remove the option to generate the final release version of the program.

You can test for a macro _not_ being defined by using `%ifndef' instead of `%ifdef'. You can also test for macro definitions in `%elif' blocks by using `%elifdef' and `%elifndef'.

4.4.2. `%ifmacro': Testing Multi-Line Macro Existence
-----------------------------------------------------

The `%ifmacro' directive operates in the same way as the `%ifdef' directive, except that it checks for the existence of a multi-line macro.

For example, you may be working with a large project and not have control over the macros in a library. You may want to create a macro with one name if it doesn't already exist, and another name if one with that name does exist.

The `%ifmacro' is considered true if defining a macro with the given name and number of arguments would cause a definitions conflict. For example:

     %ifmacro MyMacro 1-3

          %error "MyMacro 1-3" causes a conflict with an existing macro.

     %else

          %macro MyMacro 1-3

                  ; insert code to define the macro

          %endmacro

     %endif

4.4.4. `%if': Testing Arbitrary Numeric Expressions
---------------------------------------------------

The conditional-assembly construct `%if expr' will cause the subsequent code to be assembled if and only if the value of the numeric expression `expr' is non-zero. An example of the use of this feature is in deciding when to break out of a `%rep' preprocessor loop: see *Note Section 4.5:: for a detailed example.

The expression given to `%if', and its counterpart `%elif', is a critical expression (see *Note Section 3.8::).

`%if' extends the normal NASM expression syntax, by providing a set of relational operators which are not normally available in expressions. The operators `=', `<', `>', `<=', `>=' and `<>' test equality, less-than, greater-than, less-or-equal, greater-or-equal and not-equal respectively. The C-like forms `==' and `!=' are supported as alternative forms of `=' and `<>'. In addition, low-priority logical operators `&&', `^^' and `||' are provided, supplying logical AND, logical XOR and logical OR.
These work like the C logical operators (although C has no logical XOR), in that they always return either 0 or 1, and treat any non-zero input as 1 (so that `^^', for example, returns 1 if exactly one of its inputs is zero, and 0 otherwise). The relational operators also return 1 for true and 0 for false.

4.5. Preprocessor Loops: `%rep'
===============================

NASM's `TIMES' prefix, though useful, cannot be used to invoke a multi-line macro multiple times, because it is processed by NASM after macros have already been expanded. Therefore NASM provides another form of loop, this time at the preprocessor level: `%rep'.

The directives `%rep' and `%endrep' (`%rep' takes a numeric argument, which can be an expression; `%endrep' takes no arguments) can be used to enclose a chunk of code, which is then replicated as many times as specified by the preprocessor:

     %assign i 0
     %rep    64
             inc     word [table+2*i]
     %assign i i+1
     %endrep

This will generate a sequence of 64 `INC' instructions, incrementing every word of memory from `[table]' to `[table+126]'.

For more complex termination conditions, or to break out of a repeat loop part way along, you can use the `%exitrep' directive to terminate the loop, like this:

     fibonacci:
     %assign i 0
     %assign j 1
     %rep 100
     %if j > 65535
         %exitrep
     %endif
             dw j
     %assign k j+i
     %assign i j
     %assign j k
     %endrep

     fib_number equ ($-fibonacci)/2

This produces a list of all the Fibonacci numbers that will fit in 16 bits. Note that a maximum repeat count must still be given to `%rep'. This is to prevent the possibility of NASM getting into an infinite loop in the preprocessor, which (on multitasking or multi-user systems) would typically cause all the system memory to be gradually used up and other applications to start crashing.

4.6. Including Other Files
==========================

Using, once again, a very similar syntax to the C preprocessor, NASM's preprocessor lets you include other source files into your code.
This is done by the use of the `%include' directive:

     %include "macros.mac"

will include the contents of the file `macros.mac' into the source file containing the `%include' directive.

Include files are searched for in the current directory (the directory you're in when you run NASM, as opposed to the location of the NASM executable or the location of the source file), plus any directories specified on the NASM command line using the `-i' option.

The standard C idiom for preventing a file being included more than once is just as applicable in NASM: if the file `macros.mac' has the form

     %ifndef MACROS_MAC
         %define MACROS_MAC
         ; now define some macros
     %endif

then including the file more than once will not cause errors, because the second time the file is included nothing will happen because the macro `MACROS_MAC' will already be defined.

4.7. The Context Stack
======================

Having labels that are local to a macro definition is sometimes not quite powerful enough: sometimes you want to be able to share labels between several macro calls. An example might be a `REPEAT' ... `UNTIL' loop, in which the expansion of the `REPEAT' macro would need to be able to refer to a label which the `UNTIL' macro had defined. However, for such a macro you would also want to be able to nest these loops.

NASM provides this level of power by means of a _context stack_. The preprocessor maintains a stack of _contexts_, each of which is characterised by a name. You add a new context to the stack using the `%push' directive, and remove one using `%pop'. You can define labels that are local to a particular context on the stack.

4.7.1. `%push' and `%pop': Creating and Removing Contexts
---------------------------------------------------------

The `%push' directive is used to create a new context and place it on the top of the context stack. `%push' requires one argument, which is the name of the context. For example:

     %push    foobar

This pushes a new context called `foobar' on the stack.
You can have several contexts on the stack with the same name: they can still be distinguished.

The directive `%pop', requiring no arguments, removes the top context from the context stack and destroys it, along with any labels associated with it.

4.7.2. Context-Local Labels
---------------------------

Just as the usage `%%foo' defines a label which is local to the particular macro call in which it is used, the usage `%$foo' is used to define a label which is local to the context on the top of the context stack. So the `REPEAT' and `UNTIL' example given above could be implemented by means of:

     %macro repeat 0

         %push   repeat
         %$begin:

     %endmacro

     %macro until 1

             j%-1    %$begin
         %pop

     %endmacro

and invoked by means of, for example,

             mov     cx,string
             repeat
             add     cx,3
             scasb
             until   e

which would scan every fourth byte of a string in search of the byte in `AL'.

If you need to define, or access, labels local to the context _below_ the top one on the stack, you can use `%$$foo', or `%$$$foo' for the context below that, and so on.

4.8. Standard Macros
====================

NASM defines a set of standard macros, which are already defined when it starts to process any source file. If you really need a program to be assembled with no pre-defined macros, you can use the `%clear' directive to empty the preprocessor of everything.

Most user-level assembler directives (see *Note Chapter 5::) are implemented as macros which invoke primitive directives; these are described in *Note Chapter 5::. The rest of the standard macro set is described here.

4.8.1. `__NASM_MAJOR__', `__NASM_MINOR__', `__NASM_SUBMINOR__' and `__NASM_PATCHLEVEL__'
----------------------------------------------------------------------------------------

The single-line macros `__NASM_MAJOR__', `__NASM_MINOR__', `__NASM_SUBMINOR__' and `__NASM_PATCHLEVEL__' expand to the major, minor, subminor and patch level parts of the version number of NASM being used.
So, under NASM 0.98.32p1 for example, `__NASM_MAJOR__' would be defined to be 0, `__NASM_MINOR__' would be defined as 98, `__NASM_SUBMINOR__' would be defined to 32, and `__NASM_PATCHLEVEL__' would be defined as 1.

4.8.7. `ALIGN' and `ALIGNB': Data Alignment
-------------------------------------------

The `ALIGN' and `ALIGNB' macros provide a convenient way to align code or data on a word, longword, paragraph or other boundary. (Some assemblers call this directive `EVEN'.) The syntax of the `ALIGN' and `ALIGNB' macros is

             align   4               ; align on 4-byte boundary
             align   16              ; align on 16-byte boundary
             align   8,db 0          ; pad with 0s rather than NOPs
             align   4,resb 1        ; align to 4 in the BSS
             alignb  4               ; equivalent to previous line

Both macros require their first argument to be a power of two; they both compute the number of additional bytes required to bring the length of the current section up to a multiple of that power of two, and then apply the `TIMES' prefix to their second argument to perform the alignment.

If the second argument is not specified, the default for `ALIGN' is `NOP', and the default for `ALIGNB' is `RESB 1'. So if the second argument is specified, the two macros are equivalent. Normally, you can just use `ALIGN' in code and data sections and `ALIGNB' in BSS sections, and never need the second argument except for special purposes.

`ALIGN' and `ALIGNB', being simple macros, perform no error checking: they cannot warn you if their first argument fails to be a power of two, or if their second argument generates more than one byte of code. In each of these cases they will silently do the wrong thing.

`ALIGNB' (or `ALIGN' with a second argument of `RESB 1') can be used within structure definitions:

     struc mytype2

       mt_byte:
             resb 1
             alignb 2
       mt_word:
             resw 1
             alignb 4
       mt_long:
             resd 1
       mt_str:
             resb 32

     endstruc

This will ensure that the structure members are sensibly aligned relative to the base of the structure.
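
Outside structure definitions, the same two macros might be used as in the sketch below. The labels (`flag', `count', `scratch', `buffer') are invented for the example; the padding arguments follow the defaults described above.

```nasm
; Hypothetical labels, for illustration only.

section .data
flag:   db 1                    ; one byte, so the next item would be misaligned
        align 4, db 0           ; pad with zero bytes (not NOPs) up to a multiple of 4
count:  dd 0                    ; now at an offset divisible by 4 within .data

section .bss
scratch: resb 3                 ; three reserved bytes
        alignb 4                ; reserve (rather than emit) one more byte
buffer: resd 16                 ; starts at offset 4 within .bss
```

In the data section the second argument is given explicitly because the default padding for `ALIGN' is `NOP', which is rarely what is wanted between data items.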
A final caveat: `ALIGN' and `ALIGNB' work relative to the beginning of the _section_, not the beginning of the address space in the final executable. Aligning to a 16-byte boundary when the section you're in is only guaranteed to be aligned to a 4-byte boundary, for example, is a waste of effort. Again, NASM does not check that the section's alignment characteristics are sensible for the use of `ALIGN' or `ALIGNB'.

4.10. Other Preprocessor Directives
===================================

NASM also has preprocessor directives which allow access to information from external sources. Currently they include:

   * `%line' enables NASM to correctly handle the output of the cpp C language preprocessor (see *Note Section 4.10.1::).

   * `%!' enables NASM to read in the value of an environment variable, which can then be used in your program (see *Note Section 4.10.2::).

Chapter 5: Assembler Directives
*******************************

NASM, though it attempts to avoid the bureaucracy of assemblers like MASM and TASM, is nevertheless forced to support a _few_ directives. These are described in this chapter.

NASM's directives come in two types: _user-level_ directives and _primitive_ directives. Typically, each directive has a user-level form and a primitive form. In almost all cases, we recommend that users use the user-level forms of the directives, which are implemented as macros which call the primitive forms.

Primitive directives are enclosed in square brackets; user-level directives are not.

In addition to the universal directives described in this chapter, each object file format can optionally supply extra directives in order to control particular features of that file format. These _format-specific_ directives are documented along with the formats that implement them, in *Note Chapter 6::.

5.1. `BITS': Specifying Target Processor Mode
=============================================

The `BITS' directive specifies whether NASM should generate code designed to run on a processor operating in 16-bit mode, or code designed to run on a processor operating in 32-bit mode. The syntax is `BITS 16' or `BITS 32'.

In most cases, you should not need to use `BITS' explicitly. The `aout', `coff', `elf' and `win32' object formats, which are designed for use in 32-bit operating systems, all cause NASM to select 32-bit mode by default. The `obj' object format allows you to specify each segment you define as either `USE16' or `USE32', and NASM will set its operating mode accordingly, so the use of the `BITS' directive is once again unnecessary.

The most likely reason for using the `BITS' directive is to write 32-bit code in a flat binary file; this is because the `bin' output format defaults to 16-bit mode in anticipation of it being used most frequently to write DOS `.COM' programs, DOS `.SYS' device drivers and boot loader software.

You do _not_ need to specify `BITS 32' merely in order to use 32-bit instructions in a 16-bit DOS program; if you do, the assembler will generate incorrect code because it will be writing code targeted at a 32-bit platform, to be run on a 16-bit one.

When NASM is in `BITS 16' state, instructions which use 32-bit data are prefixed with an 0x66 byte, and those referring to 32-bit addresses have an 0x67 prefix. In `BITS 32' state, the reverse is true: 32-bit instructions require no prefixes, whereas instructions using 16-bit data need an 0x66 and those working on 16-bit addresses need an 0x67.

The `BITS' directive has an exactly equivalent primitive form, `[BITS 16]' and `[BITS 32]'. The user-level form is a macro which has no function other than to call the primitive form.

Note that the space is necessary: `BITS32' will _not_ work!

5.2. `SECTION' or `SEGMENT': Changing and Defining Sections
===========================================================

The `SECTION' directive (`SEGMENT' is an exactly equivalent synonym) changes which section of the output file the code you write will be assembled into. In some object file formats, the number and names of sections are fixed; in others, the user may make up as many as they wish. Hence `SECTION' may sometimes give an error message, or may define a new section, if you try to switch to a section that does not (yet) exist.

The Unix object formats, and the `bin' object format (but see *Note Section 6.1.3::), all support the standardised section names `.text', `.data' and `.bss' for the code, data and uninitialised-data sections. The `obj' format, by contrast, does not recognise these section names as being special, and indeed will strip off the leading period of any section name that has one.

5.4. `EXTERN': Importing Symbols from Other Modules
===================================================

`EXTERN' is similar to the MASM directive `EXTRN' and the C keyword `extern': it is used to declare a symbol which is not defined anywhere in the module being assembled, but is assumed to be defined in some other module and needs to be referred to by this one. Not every object-file format can support external variables: the `bin' format cannot.

The `EXTERN' directive takes as many arguments as you like. Each argument is the name of a symbol:

     extern  _printf
     extern  _sscanf,_fscanf

Some object-file formats provide extra features to the `EXTERN' directive. In all cases, the extra features are used by suffixing a colon to the symbol name followed by object-format specific text.
For example, the `obj' format allows you to declare that the default segment base of an external should be the group `dgroup' by means of the directive

     extern  _variable:wrt dgroup

The primitive form of `EXTERN' differs from the user-level form only in that it can take only one argument at a time: the support for multiple arguments is implemented at the preprocessor level.

5.5. `GLOBAL': Exporting Symbols to Other Modules
=================================================

`GLOBAL' is the other end of `EXTERN': if one module declares a symbol as `EXTERN' and refers to it, then in order to prevent linker errors, some other module must actually _define_ the symbol and declare it as `GLOBAL'. Some assemblers use the name `PUBLIC' for this purpose.

The `GLOBAL' directive applying to a symbol must appear _before_ the definition of the symbol.

`GLOBAL' uses the same syntax as `EXTERN', except that it must refer to symbols which _are_ defined in the same module as the `GLOBAL' directive. For example:

     global _main
     _main:
             ; some code

`GLOBAL', like `EXTERN', allows object formats to define private extensions by means of a colon. The `elf' object format, for example, lets you specify whether global data items are functions or data:

     global  hashlookup:function, hashtable:data

Like `EXTERN', the primitive form of `GLOBAL' differs from the user-level form only in that it can take only one argument at a time.

5.6. `COMMON': Defining Common Data Areas
=========================================

The `COMMON' directive is used to declare _common variables_. A common variable is much like a global variable declared in the uninitialised data section, so that

     common  intvar  4

is similar in function to

     global  intvar
     section .bss

     intvar  resd    1

The difference is that if more than one module defines the same common variable, then at link time those variables will be _merged_, and references to `intvar' in all modules will point at the same piece of memory.
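
The three directives are often seen together at the top of a module. The sketch below assumes a 32-bit object format such as `elf'; the file name and the symbols `helper', `entry' and `shared_count' are made up for the example, and `helper' is assumed to be supplied by some other module at link time.

```nasm
; module_a.asm (hypothetical file and symbol names)

        extern  helper              ; not defined here; another module must define it
        global  entry               ; must appear before the definition of `entry'
        common  shared_count 4      ; 4 bytes, merged with any same-named common elsewhere

section .text
entry:
        call    helper              ; resolved by the linker
        inc     dword [shared_count] ; every module sees the same 4 bytes
        ret
```

A second module declaring `common shared_count 4' would refer to the same storage, which is the difference from a plain `GLOBAL' definition in `.bss'.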
Like `GLOBAL' and `EXTERN', `COMMON' supports object-format specific extensions. For example, the `obj' format allows common variables to be NEAR or FAR, and the `elf' format allows you to specify the alignment requirements of a common variable:

     common  commvar  4:near  ; works in OBJ
     common  intarray 100:4   ; works in ELF: 4 byte aligned

Chapter 6: Output Formats
*************************

NASM is a portable assembler, designed to be able to compile on any ANSI C-supporting platform and produce output to run on a variety of Intel x86 operating systems. For this reason, it has a large number of available output formats, selected using the `-f' option on the NASM command line. Each of these formats, along with its extensions to the base NASM syntax, is detailed in this chapter.

As stated in *Note Section 2.1.1::, NASM chooses a default name for your output file based on the input file name and the chosen output format. This will be generated by removing the extension (`.asm', `.s', or whatever you like to use) from the input file name, and substituting an extension defined by the output format. The extensions are given with each format below.

6.1. `bin': Flat-Form Binary Output
===================================

The `bin' format does not produce object files: it generates nothing in the output file except the code you wrote. Such `pure binary' files are used by MS-DOS: `.COM' executables and `.SYS' device drivers are pure binary files. Pure binary output is also useful for operating system and boot loader development.

The `bin' format supports multiple section names. For details of how nasm handles sections in the `bin' format, see *Note Section 6.1.3::.

Using the `bin' format puts NASM by default into 16-bit mode (see *Note Section 5.1::). In order to use `bin' to write 32-bit code such as an OS kernel, you need to explicitly issue the `BITS 32' directive.
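
As a rough illustration, a 32-bit flat binary might begin as follows. The file name, the labels and in particular the load address given to `ORG' are assumptions made for the sketch; a real kernel would use whatever address its loader actually places it at.

```nasm
; flat32.asm (hypothetical); assemble with:  nasm -f bin flat32.asm

        bits    32                  ; `bin' would otherwise assume 16-bit mode
        org     0x100000            ; assumed load address, for illustration only

start:
        mov     esi, message        ; encoded without an operand-size prefix in BITS 32
.hang:
        jmp     .hang               ; nothing to return to in a flat binary

message:
        db      'hello', 0
```

Without the `BITS 32' line the same source would still assemble, but NASM would emit 0x66 and 0x67 prefixes intended for a processor running in 16-bit mode.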
`bin' has no default output file name extension: instead, it leaves your file name as it is once the original extension has been removed. Thus, the default is for NASM to assemble `binprog.asm' into a binary file called `binprog'.

6.1.1. `ORG': Binary File Program Origin
----------------------------------------

The `bin' format provides an additional directive to the list given in *Note Chapter 5::: `ORG'. The function of the `ORG' directive is to specify the origin address which NASM will assume the program begins at when it is loaded into memory.

For example, the following code will generate the longword `0x00000104':

             org     0x100
             dd      label
     label:

Unlike the `ORG' directive provided by MASM-compatible assemblers, which allows you to jump around in the object file and overwrite code you have already generated, NASM's `ORG' does exactly what the directive says: _origin_. Its sole function is to specify one offset which is added to all internal address references within the section; it does not permit any of the trickery that MASM's version does. See *Note Section 10.1.3:: for further comments.

6.5. `elf': Executable and Linkable Format Object Files
=======================================================

The `elf' output format generates `ELF32' (Executable and Linkable Format) object files, as used by Linux as well as Unix System V, including Solaris x86, UnixWare and SCO Unix. `elf' provides a default output file-name extension of `.o'.

6.5.3. `elf' Extensions to the `GLOBAL' Directive
-------------------------------------------------

`ELF' object files can contain more information about a global symbol than just its address: they can contain the size of the symbol and its type as well. These are not merely debugger conveniences, but are actually necessary when the program being written is a shared library. NASM therefore supports some extensions to the `GLOBAL' directive, allowing you to specify these features.
You can specify whether a global variable is a function or a data object by suffixing the name with a colon and the word `function' or `data'. (`object' is a synonym for `data'.) For example:

     global   hashlookup:function, hashtable:data

exports the global symbol `hashlookup' as a function and `hashtable' as a data object.

You can also specify the size of the data associated with the symbol, as a numeric expression (which may involve labels, and even forward references) after the type specifier. Like this:

     global  hashtable:data (hashtable.end - hashtable)

     hashtable:
             db this,that,theother  ; some data here
     .end:

6.8. `as86': Minix/Linux `as86' Object Files
============================================

The Minix/Linux 16-bit assembler `as86' has its own non-standard object file format. Although its companion linker `ld86' produces something close to ordinary `a.out' binaries as output, the object file format used to communicate between `as86' and `ld86' is not itself `a.out'.

NASM supports this format, just in case it is useful, as `as86'. `as86' provides a default output file-name extension of `.o'.

`as86' is a very simple object format (from the NASM user's point of view). It supports no special directives, no special symbols, no use of `SEG' or `WRT', and no extensions to any standard directives. It supports only the three standard section names `.text', `.data' and `.bss'.

8.2. Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries
==================================================================

`ELF' replaced the older `a.out' object file format under Linux because it contains support for position-independent code (PIC), which makes writing shared libraries much easier. NASM supports the `ELF' position-independent code features, so you can write Linux `ELF' shared libraries in NASM.

NetBSD, and its close cousins FreeBSD and OpenBSD, take a different approach by hacking PIC support into the `a.out' format.
NASM supports this as the `aoutb' output format, so you can write BSD shared libraries in NASM too.

The operating system loads a PIC shared library by memory-mapping the library file at an arbitrarily chosen point in the address space of the running process. The contents of the library's code section must therefore not depend on where it is loaded in memory.

Therefore, you cannot get at your variables by writing code like this:

             mov     eax,[myvar]             ; WRONG

Instead, the linker provides an area of memory called the _global offset table_, or GOT; the GOT is situated at a constant distance from your library's code, so if you can find out where your library is loaded (which is typically done using a `CALL' and `POP' combination), you can obtain the address of the GOT, and you can then load the addresses of your variables out of linker-generated entries in the GOT.

The _data_ section of a PIC shared library does not have these restrictions: since the data section is writable, it has to be copied into memory anyway rather than just paged in from the library file, so as long as it's being copied it can be relocated too. So you can put ordinary types of relocation in the data section without too much worry (but see *Note Section 8.2.4:: for a caveat).

8.2.1. Obtaining the Address of the GOT
---------------------------------------

Each code module in your shared library should define the GOT as an external symbol:

     extern  _GLOBAL_OFFSET_TABLE_   ; in ELF
     extern  __GLOBAL_OFFSET_TABLE_  ; in BSD a.out

At the beginning of any function in your shared library which plans to access your data or BSS sections, you must first calculate the address of the GOT.
This is typically done by writing the function in this form:

     func:   push    ebp
             mov     ebp,esp
             push    ebx
             call    .get_GOT
     .get_GOT:
             pop     ebx
             add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc

             ; the function body comes here

             mov     ebx,[ebp-4]
             mov     esp,ebp
             pop     ebp
             ret

(For BSD, again, the symbol `_GLOBAL_OFFSET_TABLE_' requires a second leading underscore.)

The first two lines of this function are simply the standard C prologue to set up a stack frame, and the last three lines are standard C function epilogue. The third line, and the fourth to last line, save and restore the `EBX' register, because PIC shared libraries use this register to store the address of the GOT.

The interesting bit is the `CALL' instruction and the following two lines. The `CALL' and `POP' combination obtains the address of the label `.get_GOT', without having to know in advance where the program was loaded (since the `CALL' instruction is encoded relative to the current position). The `ADD' instruction makes use of one of the special PIC relocation types: GOTPC relocation. With the `WRT ..gotpc' qualifier specified, the symbol referenced (here `_GLOBAL_OFFSET_TABLE_', the special symbol assigned to the GOT) is given as an offset from the beginning of the section. (Actually, `ELF' encodes it as the offset from the operand field of the `ADD' instruction, but NASM simplifies this deliberately, so you do things the same way for both `ELF' and `BSD'.) So the instruction then _adds_ the beginning of the section, to get the real address of the GOT, and subtracts the value of `.get_GOT' which it knows is in `EBX'. Therefore, by the time that instruction has finished, `EBX' contains the address of the GOT.

If you didn't follow that, don't worry: it's never necessary to obtain the address of the GOT by any other means, so you can put those three instructions into a macro and safely ignore them:

     %macro  get_GOT 0

             call    %%getgot
       %%getgot:
             pop     ebx
             add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc

     %endmacro

8.2.4. Exporting Symbols to the Library User
--------------------------------------------

If you want to export symbols to the user of the library, you have to declare whether they are functions or data, and if they are data, you have to give the size of the data item. This is because the dynamic linker has to build procedure linkage table entries for any exported functions, and also moves exported data items away from the library's data section in which they were declared.

So to export a function to users of the library, you must use

     global  func:function           ; declare it as a function

     func:   push    ebp

             ; etc.

And to export a data item such as an array, you would have to code

     global  array:data array.end-array      ; give the size too

     array:  resd    128
     .end:

Be careful: If you export a variable to the library user, by declaring it as `GLOBAL' and supplying a size, the variable will end up living in the data section of the main program, rather than in your library's data section, where you declared it. So you will have to access your own global variable with the `..got' mechanism rather than `..gotoff', as if it were external (which, effectively, it has become).

Equally, if you need to store the address of an exported global in one of your data sections, you can't do it by means of the standard sort of code:

     dataptr:        dd      global_data_item        ; WRONG

NASM will interpret this code as an ordinary relocation, in which `global_data_item' is merely an offset from the beginning of the `.data' section (or whatever); so this reference will end up pointing at your data section instead of at the exported global which resides elsewhere.

Instead of the above code, then, you must write

     dataptr:        dd      global_data_item wrt ..sym

which makes use of the special `WRT' type `..sym' to instruct NASM to search the symbol table for a particular symbol at that address, rather than just relocating by section base.
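
Putting the pieces of this section together, a very small `ELF' library module might look like the sketch below. The file name and the symbols `get_counter', `counter' and `counter_ptr' are hypothetical. The `wrt ..got' form used to fetch the variable's address is the `..got' mechanism mentioned above, which is described in a part of the manual not reproduced here, so treat that line in particular as an assumption about usage rather than a quotation.

```nasm
; libsketch.asm (hypothetical); assemble with:  nasm -f elf libsketch.asm

        extern  _GLOBAL_OFFSET_TABLE_       ; ELF spelling; BSD a.out needs a second underscore

        global  get_counter:function        ; exported function
        global  counter:data 4              ; exported data item, with its size in bytes

%macro  get_GOT 0
        call    %%getgot
  %%getgot:
        pop     ebx
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
%endmacro

section .text
get_counter:
        push    ebp
        mov     ebp,esp
        push    ebx                         ; EBX is about to hold the GOT address
        get_GOT
        mov     eax,[ebx+counter wrt ..got] ; fetch the address of `counter' from its GOT entry
        mov     eax,[eax]                   ; then load the value itself
        mov     ebx,[ebp-4]                 ; restore the saved EBX
        mov     esp,ebp
        pop     ebp
        ret

section .data
counter:        dd      0
counter_ptr:    dd      counter wrt ..sym   ; pointer to the exported global
```

The last line uses the `wrt ..sym' form; the plain `dd counter' form is the one marked as wrong above for exported data.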
Either method will work for functions: referring to one of your functions by means of

     funcptr:        dd      my_function

will give the user the address of the code you wrote, whereas

     funcptr:        dd      my_function wrt ..sym

will give the address of the procedure linkage table for the function, which is where the calling program will _believe_ the function lives. Either address is a valid way to call the function.

10.1.1. NASM Generates Inefficient Code
---------------------------------------

We sometimes get `bug' reports about NASM generating inefficient, or even `wrong', code on instructions such as `ADD ESP,8'. This is a deliberate design feature, connected to predictability of output: NASM, on seeing `ADD ESP,8', will generate the form of the instruction which leaves room for a 32-bit offset. You need to code `ADD ESP,BYTE 8' if you want the space-efficient form of the instruction. This isn't a bug, it's user error: if you prefer to have NASM produce the more efficient code automatically, enable optimization with the `-On' option (see *Note Section 2.1.16::).

10.1.2. My Jumps are Out of Range
---------------------------------

Similarly, people complain that when they issue conditional jumps (which are `SHORT' by default) that try to jump too far, NASM reports `short jump out of range' instead of making the jumps longer.

This, again, is partly a predictability issue, but in fact has a more practical reason as well. NASM has no means of being told what type of processor the code it is generating will be run on; so it cannot decide for itself that it should generate `Jcc NEAR' type instructions, because it doesn't know that it's working for a 386 or above. Alternatively, it could replace the out-of-range short `JNE' instruction with a very short `JE' instruction that jumps over a `JMP NEAR'; this is a sensible solution for processors below a 386, but hardly efficient on processors which have good branch prediction _and_ could have used `JNE NEAR' instead.
So, once again, it's up to the user, not the assembler, to decide what instructions should be generated. See *Note Section 2.1.16::.

10.1.3. `ORG' Doesn't Work
--------------------------

People writing boot sector programs in the `bin' format often complain that `ORG' doesn't work the way they'd like: in order to place the `0xAA55' signature word at the end of a 512-byte boot sector, people who are used to MASM tend to code

             ORG 0

             ; some boot sector code

             ORG 510
             DW 0xAA55

This is not the intended use of the `ORG' directive in NASM, and will not work. The correct way to solve this problem in NASM is to use the `TIMES' directive, like this:

             ORG 0

             ; some boot sector code

             TIMES 510-($-$$) DB 0
             DW 0xAA55

The `TIMES' directive will insert exactly enough zero bytes into the output to move the assembly point up to 510. This method also has the advantage that if you accidentally fill your boot sector too full, NASM will catch the problem at assembly time and report it, so you won't end up with a boot sector that you have to disassemble to find out what's wrong with it.

The full NASM info archive can be downloaded here (180 KB).