dis88
Beta Release
87/09/01
---
G. M. HARDING
POB 4142
Santa Clara CA  95054-0142

"Dis88" is a symbolic disassembler for the Intel 8088 CPU,
designed to run under the PC/IX operating system on an IBM XT
or fully compatible clone.  Its output is in the format of, and
is completely compatible with, the PC/IX assembler, "as".  The
program is copyrighted by its author, but may be copied and
redistributed freely provided that complete source code, with all
copyright notices, accompanies any distribution.  This provision
also applies to any modifications you may make.  You are urged
to comment such changes, giving, as a minimum, your name and
complete address.

This release of the program is a beta release, which means
that it has been extensively, but not exhaustively, tested.
User comments, recommendations, and bug fixes are welcome.  The
principal features of the current release are:

(a)  The ability to disassemble any file in PC/IX object
format, making full use of symbol and relocation information if
it is present, regardless of whether the file is executable or
linkable, and regardless of whether it has continuous or split
I/D space;

(b)  Automatic generation of synthetic labels when no symbol
table is available; and

(c)  Optional output of address and object-code information
as assembler comment text.

Limitations of the current release are:

(a)  Numeric co-processor (i.e., 8087) mnemonics are not
supported.  Instructions for the co-processor are disassembled
as CPU escape sequences, or as interrupts, depending on how
they were assembled in the first place.  This limitation will be
addressed in a future release.

(b)  Symbolic references within the object file's data
segment are not supported.  Thus, for example, if a data segment
location is initialized to point to a text segment address, no
reference to a text segment symbol will be detected.  This
limitation is likely to remain in future releases, because object
code does not, in most cases, contain sufficient information to
allow meaningful interpretation of pure data.  (Note, however,
that symbolic references to the data segment from within the
text segment are always supported.)

As a final caveat, be aware that the PC/IX assembler does
not recognize the "esc" mnemonic, even though it refers to a
completely valid CPU operation which is documented in all the
Intel literature.  Thus, the corresponding opcodes (0xd8 through
0xdf) are disassembled as .byte directives.  For reference,
however, the syntactically correct "esc" instruction is output as
a comment.

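As a rough illustration of the caveat above, the following Python
sketch shows how an instruction in the "esc" family might be
rendered as .byte directives with the mnemonic kept in a comment.
It is not taken from the dis88 sources: the function names, the
"|" comment character, and the exact output layout are assumptions
made for the example, and the operand of "esc" is deliberately not
decoded.

```python
def is_esc_opcode(opcode):
    """True for the eight ESC opcodes, 0xd8 through 0xdf."""
    return 0xD8 <= opcode <= 0xDF


def esc_as_bytes(raw, comment_char="|"):
    """Render one ESC instruction as a list of .byte directives.

    raw holds the instruction bytes, opcode first.  The comment
    character is an assumption; the real assembler may use another.
    """
    if not raw or not is_esc_opcode(raw[0]):
        raise ValueError("not an esc-family instruction")
    lines = [".byte 0x%02x" % b for b in raw]
    # Keep the mnemonic visible for the reader; operand decoding is
    # outside the scope of this sketch.
    lines[0] += "  %s esc" % comment_char
    return lines
```

Because the instruction is emitted byte for byte, reassembling the
output should reproduce the original object code even though the
assembler never sees the "esc" mnemonic.
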
To build the disassembler program, transfer all the source
files, together with the Makefile, to a suitable (preferably
empty) PC/IX directory.  Then, simply type "make".

To use dis88, place it in a directory which appears in
your $PATH list.  It may then be invoked by name from whatever
directory you happen to be in.  At a minimum, the program must
be invoked with one command-line argument: the name of the
object file to be disassembled.  (Dis88 will complain if the file
specified is not an object file.)  Optionally, you may specify
an output file; stdout is the default.  One command-line switch
is available: "-o", which makes the program display addresses
and object code along with its mnemonic disassembly.

The "-o" option is useful primarily for verifying the
correctness of the program's output.  In particular, it may be used
to check the accuracy of local relative jump opcodes.  These
jumps often target local labels, which are lost at assembly
time; thus, the disassembly may contain cryptic instructions
like "jnz .+39".  As a user convenience, all relative jump and
call opcodes are output with a comment which identifies the
physical target address.

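The arithmetic behind such a target-address comment can be
sketched as follows (in Python, for brevity).  The sketch assumes
a two-byte short jump and assumes that "." stands for the address
of the jump instruction itself; the function names are
hypothetical and do not come from the dis88 sources.

```python
def signed8(byte):
    """Interpret an unsigned byte value as a two's-complement number."""
    return byte - 0x100 if byte & 0x80 else byte


def short_jump_target(addr, disp_byte):
    """Target of a two-byte short jump located at addr.

    The CPU adds the signed displacement to the address of the
    following instruction, wrapping within the 64K segment.
    """
    return (addr + 2 + signed8(disp_byte)) & 0xFFFF


def dot_relative(addr, target):
    """Express target in '.+n' or '.-n' form, taking '.' as addr."""
    delta = target - addr
    return ".+%d" % delta if delta >= 0 else ".-%d" % -delta
```

Under these assumptions, a jnz at address 0x0100 with displacement
byte 0x25 targets 0x0127, which is 39 bytes past the instruction,
i.e. ".+39".
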
By convention, the release level of the program as a whole
is the SID of the file disrel.c, and this SID string appears in
each disassembly.  Release 2.1 of the program is the first beta
release to be distributed on Usenet.

.TH dis88 1 LOCAL
.SH "NAME"
dis88 \- 8088 symbolic disassembler
.SH "SYNOPSIS"
\fBdis88\fP [ -o ] ifile [ ofile ]
.SH "DESCRIPTION"
Dis88 reads ifile, which must be in PC/IX a.out format.
It interprets the binary opcodes and data locations, and
writes corresponding assembler source code to stdout, or
to ofile if specified.  The program's output is in the
format of, and fully compatible with, the PC/IX assembler,
as(1).  If a symbol table is present in ifile, labels and
references will be symbolic in the output.  If the input
file lacks a symbol table, the fact will be noted, and the
disassembly will proceed, with the disassembler generating
synthetic labels as needed.  If the input file has split
I/D space, or if it is executable, the disassembler will
make all necessary adjustments in address-reference calculations.
.PP
If the "-o" option is given, object code will be included
in comments during disassembly of the text segment.  This
feature is used primarily for debugging the disassembler
itself, but may provide information of passing interest
to users.
.PP
The program always outputs the current machine address
before disassembling an opcode.  If a symbol table is
present, this address is output as an assembler comment;
otherwise, it is incorporated into the synthetic label
which is generated internally.  Since relative jumps,
especially short ones, may target unlabelled locations,
the program always outputs the physical target address
as a comment, to assist the user in following the code.
.PP
The text segment of an object file is always padded to
an even machine address.  In addition, if the file has
split I/D space, the text segment will be padded to a
paragraph boundary (i.e., an address divisible by 16).
As a result of this padding, the disassembler may produce
a few spurious, but harmless, instructions at the
end of the text segment.
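.PP
The padding rule above amounts to rounding the text size up to
the next multiple of 2, or of 16 for split I/D files.  A minimal
Python sketch follows; the function name and its arguments are
hypothetical and are not part of dis88.
.PP
.nf
```python
def padded_text_size(size, split_id):
    """Round size up to the boundary the padding rule calls for.

    split_id is True for a file with split I/D space (pad to a
    16-byte paragraph) and False otherwise (pad to an even address).
    """
    align = 16 if split_id else 2
    return (size + align - 1) // align * align


def pad_bytes(size, split_id):
    """Number of filler bytes that may show up as spurious instructions."""
    return padded_text_size(size, split_id) - size
```
.fi
.PP
For example, a text segment of 0x123 bytes would be padded by one
byte in the ordinary case and by 13 bytes in the split I/D case.
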
.PP
Disassembly of the data segment is a difficult matter.
The information to which initialized data refers cannot
be inferred from context, except in the special case
of an external data or address reference, which will be
reflected in the relocation table.  Internal data and
address references will already be resolved in the object file,
and cannot be recreated.  Therefore, the data
segment is disassembled as a byte stream, with long
stretches of null data represented by an appropriate
".zerow" pseudo-op.  This limitation notwithstanding,
labels (as opposed to symbolic references) are always
output at appropriate points within the data segment.
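.PP
The byte-stream approach can be pictured with the Python sketch
below.  Several details are assumptions made only for the example
and are not confirmed by this page: that the operand of ".zerow"
is a count of 16-bit words, that a run of eight or more zero bytes
counts as "long", and that other bytes are emitted one per line.
Labels are left out entirely.
.PP
.nf
```python
def render_data(data, min_run=8):
    """Return assembler lines for a data segment given as bytes."""
    lines = []
    i = 0
    n = len(data)
    while i < n:
        # Measure the run of zero bytes starting at position i.
        j = i
        while j < n and data[j] == 0:
            j += 1
        run = j - i
        if run >= min_run:
            # Collapse whole words; an odd trailing zero byte is
            # left for the next pass and comes out as a .byte line.
            lines.append(".zerow %d" % (run // 2))
            i += run - (run % 2)
        else:
            lines.append(".byte 0x%02x" % data[i])
            i += 1
    return lines
```
.fi
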
.PP
If disassembly of the data segment is difficult, disassembly of the
bss segment is quite easy, because uninitialized data is all
zero by definition.  No data
is output in the bss segment, but symbolic labels are
output as appropriate.
.PP
For each opcode which takes an operand, a particular
symbol type (text, data, or bss) is appropriate.  This
tidy correspondence is complicated somewhat, however,
by the existence of assembler symbolic constants and
segment override opcodes.  Therefore, the disassembler's
symbol lookup routine attempts to apply a certain amount
of intelligence when it is asked to find a symbol.  If
it cannot match on a symbol of the preferred type, it
may return a symbol of some other type, depending on
preassigned (and somewhat arbitrary) rankings within
each type.  Finally, if all else fails, it returns a
string containing the address sought as a hex constant;
this behavior allows calling routines to use the output
of the lookup function regardless of the success of its
search.
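.PP
The lookup strategy just described might be sketched in Python as
shown below.  The fallback order in FALLBACK, the tuple layout of
the symbol list, and the format of the hex constant are all
assumptions chosen for illustration; the rankings actually used by
dis88 are not documented here.
.PP
.nf
```python
TEXT, DATA, BSS = "text", "data", "bss"

# Hypothetical ranking: which types to try, in order, for each
# preferred type.
FALLBACK = {
    TEXT: [TEXT, DATA, BSS],
    DATA: [DATA, BSS, TEXT],
    BSS: [BSS, DATA, TEXT],
}


def lookup(symbols, address, preferred):
    """Find a name for address, preferring symbols of one type.

    symbols is a list of (name, type, address) tuples and is
    scanned linearly, once per candidate type.
    """
    for wanted in FALLBACK[preferred]:
        for name, kind, value in symbols:
            if kind == wanted and value == address:
                return name
    # Nothing matched: hand back the address itself so that the
    # caller always gets a usable operand string.
    return "0x%04x" % address
```
.fi
.PP
Since the function always returns a string, a caller can paste the
result into an operand field without checking whether the search
succeeded.
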
.PP
It is worth noting, at this point, that the symbol lookup
routine operates linearly, and has not been optimized in
any way.  Because each lookup scans the table sequentially,
execution time is likely to grow faster than linearly
(roughly quadratically) with input file size.  The disassembler is
internally limited to 1500 symbol table entries and 1500
relocation table entries; while these limits are generous
(/unix, itself, has fewer than 800 symbols), they are not
guaranteed to be adequate in all cases.  If the symbol
table or the relocation table overflows, the disassembly
aborts.
.PP
Finally, users should be aware of a bug in the assembler,
which causes it not to parse the "esc" mnemonic, even
though "esc" is a completely legitimate opcode which is
documented in all the Intel literature.  To accommodate
this deficiency, the disassembler translates opcodes of
the "esc" family to .byte directives, but notes the
correct mnemonic in a comment for reference.
.PP
In all cases, it should be possible to submit the output
of the disassembler program to the assembler, and assemble
it without error.  In most cases, the resulting object
code will be identical to the original; in any event, it
will be functionally equivalent.
.SH "SEE ALSO"
adb(1), as(1), cc(1), ld(1).
.br
"Assembler Reference Manual" in the PC/IX Programmer's
Guide.
.SH "DIAGNOSTICS"
"can't access input file" if the input file cannot be
found, opened, or read.
.sp
"can't open output file" if the output file cannot be
created.
.sp
"warning: host/cpu clash" if the program is run on a
machine with a different CPU.
.sp
"input file not in object format" if the magic number
does not correspond to that of a PC/IX object file.
.sp
"not an 8086/8088 object file" if the CPU ID of the
file header is incorrect.
.sp
"reloc table overflow" if there are more than 1500
entries in the relocation table.
.sp
"symbol table overflow" if there are more than 1500
entries in the symbol table.
.sp
"lseek error" if the input file is corrupted (should
never happen).
.sp
"warning: no symbols" if the symbol table is missing.
.sp
"can't reopen input file" if the input file is removed
or altered during program execution (should never happen).
.SH "BUGS"
Numeric co-processor (i.e., 8087) mnemonics are not currently supported.
Instructions for the co-processor are
disassembled as CPU escape sequences, or as interrupts,
depending on how they were assembled in the first place.
.sp
Despite the program's best efforts, a symbol retrieved
from the symbol table may sometimes be different from
the symbol used in the original assembly.
.sp
The disassembler's internal tables are of fixed size,
and the program aborts if they overflow.