dis88
Beta Release
87/09/01
---
G. M. HARDING
POB 4142
Santa Clara CA 95054-0142


"Dis88" is a symbolic disassembler for the Intel 8088 CPU,
designed to run under the PC/IX operating system on an IBM XT
or fully-compatible clone. Its output is in the format of, and
is completely compatible with, the PC/IX assembler, "as". The
program is copyrighted by its author, but may be copied and
redistributed freely provided that complete source code, with
all copyright notices, accompanies any distribution. This
provision also applies to any modifications you may make. You
are urged to comment such changes, giving, as a minimum, your
name and complete address.

This release of the program is a beta release, which means
that it has been extensively, but not exhaustively, tested.
User comments, recommendations, and bug fixes are welcome. The
principal features of the current release are:

(a) The ability to disassemble any file in PC/IX object
format, making full use of symbol and relocation information if
it is present, regardless of whether the file is executable or
linkable, and regardless of whether it has continuous or split
I/D space;

(b) Automatic generation of synthetic labels when no symbol
table is available; and

(c) Optional output of address and object-code information
as assembler comment text.

Limitations of the current release are:

(a) Numeric co-processor (i.e., 8087) mnemonics are not
supported. Instructions for the co-processor are disassembled
as CPU escape sequences, or as interrupts, depending on how
they were assembled in the first place. This limitation will be
addressed in a future release.

(b) Symbolic references within the object file's data
segment are not supported. Thus, for example, if a data segment
location is initialized to point to a text segment address, no
reference to a text segment symbol will be detected. This
limitation is likely to remain in future releases, because object
code does not, in most cases, contain sufficient information to
allow meaningful interpretation of pure data. (Note, however,
that symbolic references to the data segment from within the
text segment are always supported.)

As a final caveat, be aware that the PC/IX assembler does
not recognize the "esc" mnemonic, even though it refers to a
completely valid CPU operation which is documented in all the
Intel literature. Thus, the corresponding opcodes (0xd8 through
0xdf) are disassembled as .byte directives. For reference,
however, the syntactically-correct "esc" instruction is output
as a comment.
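
To make the escape encoding mentioned above concrete, here is a
minimal Python sketch. It is not part of dis88 (which is not
written in Python), and the function names are hypothetical. It
relies only on the standard 8088 encoding, in which the 6-bit
external opcode is split between the first byte and the mod-r/m
byte that follows it.

```python
def is_escape(opcode):
    """True for the eight 8088 ESC opcodes, 0xd8 through 0xdf."""
    return 0xD8 <= opcode <= 0xDF


def escape_fields(opcode, modrm):
    """Split an ESC instruction into (external opcode, mod, r/m).

    The 6-bit external opcode is built from the low three bits of
    the first byte and bits 5-3 of the mod-r/m byte.
    """
    if not is_escape(opcode):
        raise ValueError("not an ESC opcode: 0x%02x" % opcode)
    external = ((opcode & 0x07) << 3) | ((modrm >> 3) & 0x07)
    mod = (modrm >> 6) & 0x03
    rm = modrm & 0x07
    return external, mod, rm
```

For example, escape_fields(0xd9, 0xc0) yields external opcode 8
with mod 3 and r/m 0.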

To build the disassembler program, transfer all the source
files, together with the Makefile, to a suitable (preferably
empty) PC/IX directory. Then, simply type "make".

To use dis88, place it in a directory which appears in
your $PATH list. It may then be invoked by name from whatever
directory you happen to be in. As a minimum, the program must
be invoked with one command-line argument: the name of the
object file to be disassembled. (Dis88 will complain if the file
specified is not an object file.) Optionally, you may specify
an output file; stdout is the default. One command-line switch
is available: "-o", which makes the program display addresses
and object code along with its mnemonic disassembly.
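
The command-line shape described above (an optional "-o", a
required input file, an optional output file) can be sketched in
Python as follows. This is only an illustration of the argument
layout, not dis88's own parsing code; the function name and the
wording of the error message are assumptions.

```python
def parse_command_line(argv):
    """Parse arguments laid out as: [-o] ifile [ofile].

    Returns (show_object_code, ifile, ofile); ofile is None when
    output should go to stdout.
    """
    args = list(argv)
    show_object_code = False
    if args and args[0] == "-o":
        show_object_code = True
        args.pop(0)
    if not 1 <= len(args) <= 2:
        raise ValueError("usage: dis88 [ -o ] ifile [ ofile ]")
    ifile = args[0]
    ofile = args[1] if len(args) == 2 else None
    return show_object_code, ifile, ofile
```

Here argv holds the arguments after the program name, so
parse_command_line(["-o", "a.out"]) selects object-code output
and leaves ofile as None.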

The "-o" option is useful primarily for verifying the
correctness of the program's output. In particular, it may be used
to check the accuracy of local relative jump opcodes. These
jumps often target local labels, which are lost at assembly
time; thus, the disassembly may contain cryptic instructions
like "jnz .+39". As a user convenience, all relative jump and
call opcodes are output with a comment which identifies the
physical target address.
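
The target of a short relative jump can be checked by hand. The
Python sketch below is not taken from dis88, and its function
names are hypothetical. It applies the standard 8088 rule: the
signed 8-bit displacement is added to the offset of the
instruction that follows the jump, and the result wraps within
the 64K segment.

```python
def sign_extend_8(value):
    """Interpret a raw byte (0-255) as a signed 8-bit number."""
    return value - 0x100 if value & 0x80 else value


def short_jump_target(address, displacement, length=2):
    """Return the offset a short relative jump transfers to.

    address is the offset of the first byte of the jump,
    displacement is the raw displacement byte, and length is the
    size of the jump instruction (2 for a short jump).
    """
    return (address + length + sign_extend_8(displacement)) & 0xFFFF
```

With a jump at offset 0x0100 and a displacement byte of 0x25
(decimal 37), the target is 0x0127, which is 39 bytes past the
start of the jump. That matches the "jnz .+39" notation if "."
denotes the address of the jump itself, which is an assumption
about the assembler's syntax.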

By convention, the release level of the program as a whole
is the SID of the file disrel.c, and this SID string appears in
each disassembly. Release 2.1 of the program is the first beta
release to be distributed on Usenet.

.TH dis88 1 LOCAL
.SH "NAME"
dis88 \- 8088 symbolic disassembler
.SH "SYNOPSIS"
\fBdis88\fP [ -o ] ifile [ ofile ]
.SH "DESCRIPTION"
Dis88 reads ifile, which must be in PC/IX a.out format.
It interprets the binary opcodes and data locations, and
writes corresponding assembler source code to stdout, or
to ofile if specified. The program's output is in the
format of, and fully compatible with, the PC/IX assembler,
as(1). If a symbol table is present in ifile, labels and
references will be symbolic in the output. If the input
file lacks a symbol table, the fact will be noted, and the
disassembly will proceed, with the disassembler generating
synthetic labels as needed. If the input file has split
I/D space, or if it is executable, the disassembler will
make all necessary adjustments in address-reference calculations.
.PP
If the "-o" option appears, object code will be included
in comments during disassembly of the text segment. This
feature is used primarily for debugging the disassembler
itself, but may provide information of passing interest
to users.
.PP
The program always outputs the current machine address
before disassembling an opcode. If a symbol table is
present, this address is output as an assembler comment;
otherwise, it is incorporated into the synthetic label
which is generated internally. Since relative jumps,
especially short ones, may target unlabelled locations,
the program always outputs the physical target address
as a comment, to assist the user in following the code.
.PP
The text segment of an object file is always padded to
an even machine address. In addition, if the file has
split I/D space, the text segment will be padded to a
paragraph boundary (i.e., an address divisible by 16).
As a result of this padding, the disassembler may produce
a few spurious, but harmless, instructions at the
end of the text segment.
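.PP
As a rough illustration of the padding rule above, the padded
size of a text segment can be computed as in the following
Python sketch. It is not part of dis88, and the function name
is hypothetical.
.nf
```python
def padded_text_size(text_size, split_id):
    """Round a text segment size up to its padded size.

    Text is padded to an even address, or to a multiple of 16
    (a paragraph) when the file has split I/D space.
    """
    boundary = 16 if split_id else 2
    return (text_size + boundary - 1) // boundary * boundary
```
.fi
A 17-byte text segment with split I/D space is thus padded to
32 bytes, and the 15 bytes of padding are what may show up as
spurious instructions.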
.PP
Disassembly of the data segment is a difficult matter.
The information to which initialized data refers cannot
be inferred from context, except in the special case
of an external data or address reference, which will be
reflected in the relocation table. Internal data and
address references will already be resolved in the object file,
and cannot be recreated. Therefore, the data
segment is disassembled as a byte stream, with long
stretches of null data represented by an appropriate
".zerow" pseudo-op. This limitation notwithstanding,
labels (as opposed to symbolic references) are always
output at appropriate points within the data segment.
.PP
If disassembly of the data segment is difficult, disassembly of the
bss segment is quite easy, because uninitialized data is all
zero by definition. No data
is output in the bss segment, but symbolic labels are
output as appropriate.
.PP
For each opcode which takes an operand, a particular
symbol type (text, data, or bss) is appropriate. This
tidy correspondence is complicated somewhat, however,
by the existence of assembler symbolic constants and
segment override opcodes. Therefore, the disassembler's
symbol lookup routine attempts to apply a certain amount
of intelligence when it is asked to find a symbol. If
it cannot match on a symbol of the preferred type, it
may return a symbol of some other type, depending on
preassigned (and somewhat arbitrary) rankings within
each type. Finally, if all else fails, it returns a
string containing the address sought as a hex constant;
this behavior allows calling routines to use the output
of the lookup function regardless of the success of its
search.
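.PP
The general shape of such a lookup can be sketched in Python as
follows. This is an illustration only, not dis88's actual routine:
the function name, the tuple layout, the fallback order, and the
format of the hex constant are all assumptions.
.nf
```python
def lookup(symbols, address, preferred,
           fallback_order=("text", "data", "bss")):
    """Return a name for address, or a hex constant if none matches.

    symbols is a list of (name, kind, value) tuples, searched
    linearly. A symbol of the preferred kind wins; otherwise the
    first kind in fallback_order that has a match is used.
    """
    matches = {}
    for name, kind, value in symbols:
        if value == address and kind not in matches:
            matches[kind] = name
    if preferred in matches:
        return matches[preferred]
    for kind in fallback_order:
        if kind in matches:
            return matches[kind]
    # Nothing matched: hand back the address itself as a string.
    return "0x%04x" % address
```
.fi
Because the function always returns a string, a caller can place
the result directly in an operand field without checking whether
the search succeeded.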
.PP
It is worth noting, at this point, that the symbol lookup
routine operates linearly, and has not been optimized in
any way. Each lookup takes time proportional to the size of
the symbol table, so execution time is likely to grow faster
than linearly (roughly quadratically) with input file size.
The disassembler is
internally limited to 1500 symbol table entries and 1500
relocation table entries; while these limits are generous
(/unix, itself, has fewer than 800 symbols), they are not
guaranteed to be adequate in all cases. If the symbol
table or the relocation table overflows, the disassembly
aborts.
.PP
Finally, users should be aware of a bug in the assembler,
which causes it not to parse the "esc" mnemonic, even
though "esc" is a completely legitimate opcode which is
documented in all the Intel literature. To accommodate
this deficiency, the disassembler translates opcodes of
the "esc" family to .byte directives, but notes the
correct mnemonic in a comment for reference.
.PP
In all cases, it should be possible to submit the output
of the disassembler program to the assembler, and assemble
it without error. In most cases, the resulting object
code will be identical to the original; in any event, it
will be functionally equivalent.
.SH "SEE ALSO"
adb(1), as(1), cc(1), ld(1).
.br
"Assembler Reference Manual" in the PC/IX Programmer's
Guide.
.SH "DIAGNOSTICS"
"can't access input file" if the input file cannot be
found, opened, or read.
.sp
"can't open output file" if the output file cannot be
created.
.sp
"warning: host/cpu clash" if the program is run on a
machine with a different CPU.
.sp
"input file not in object format" if the magic number
does not correspond to that of a PC/IX object file.
.sp
"not an 8086/8088 object file" if the CPU ID of the
file header is incorrect.
.sp
"reloc table overflow" if there are more than 1500
entries in the relocation table.
.sp
"symbol table overflow" if there are more than 1500
entries in the symbol table.
.sp
"lseek error" if the input file is corrupted (should
never happen).
.sp
"warning: no symbols" if the symbol table is missing.
.sp
"can't reopen input file" if the input file is removed
or altered during program execution (should never happen).
.SH "BUGS"
Numeric co-processor (i.e., 8087) mnemonics are not currently supported.
Instructions for the co-processor are
disassembled as CPU escape sequences, or as interrupts,
depending on how they were assembled in the first place.
.sp
Despite the program's best efforts, a symbol retrieved
from the symbol table may sometimes be different from
the symbol used in the original assembly.
.sp
The disassembler's internal tables are of fixed size,
and the program aborts if they overflow.