| 1 | dis88 | 
|---|
| 2 | Beta Release | 
|---|
| 3 | 87/09/01 | 
|---|
| 4 | --- | 
|---|
| 5 | G. M. HARDING | 
|---|
| 6 | POB 4142 | 
|---|
| 7 | Santa Clara CA  95054-0142 | 
|---|
| 8 |  | 
|---|
| 9 |  | 
|---|
| 10 | "Dis88" is a symbolic disassembler for the Intel 8088 CPU, | 
|---|
| 11 | designed to run under the PC/IX  operating  system on an IBM XT | 
|---|
| 12 | or fully-compatible clone.  Its output is in the format of, and | 
|---|
| 13 | is completely compatible with, the PC/IX assembler,  "as".  The | 
|---|
| 14 | program is copyrighted by its author, but may be copied and re- | 
|---|
| 15 | distributed freely provided that complete source code, with all | 
|---|
| 16 | copyright notices, accompanies any distribution. This provision | 
|---|
| 17 | also applies to any  modifications you may make.  You are urged | 
|---|
| 18 | to comment such changes,  giving,  as a minimum,  your name and | 
|---|
| 19 | complete address. | 
|---|
| 20 |  | 
|---|
| 21 | This release of the program is a beta release, which means | 
|---|
| 22 | that it has been  extensively,  but not  exhaustively,  tested. | 
|---|
| 23 | User comments, recommendations, and bug fixes are welcome.  The | 
|---|
| 24 | principal features of the current release are: | 
|---|
| 25 |  | 
|---|
| 26 | (a)  The ability to  disassemble  any file in PC/IX object | 
|---|
| 27 | format, making full use of symbol and relocation information if | 
|---|
| 28 | it is present,  regardless of whether the file is executable or | 
|---|
| 29 | linkable,  and regardless of whether it has continuous or split | 
|---|
| 30 | I/D space; | 
|---|
| 31 |  | 
|---|
| 32 | (b)  Automatic generation of synthetic labels when no sym- | 
|---|
| 33 | bol table is available; and | 
|---|
| 34 |  | 
|---|
| 35 | (c)  Optional  output of address and object-code  informa- | 
|---|
| 36 | tion as assembler comment text. | 
|---|
| 37 |  | 
|---|
| 38 | Limitations of the current release are: | 
|---|
| 39 |  | 
|---|
| 40 | (a)  Numeric co-processor  (i.e., 8087)  mnemonics are not | 
|---|
| 41 | supported.  Instructions  for the co-processor are disassembled | 
|---|
| 42 | as CPU escape  sequences,  or as  interrupts,  depending on how | 
|---|
| 43 | they were assembled in the first place. This limitation will be | 
|---|
| 44 | addressed in a future release. | 
|---|
| 45 |  | 
|---|
| 46 | (b)  Symbolic  references  within the object  file's  data | 
|---|
| 47 | segment are not supported. Thus, for example, if a data segment | 
|---|
| 48 | location is initialized to point to a text segment address,  no | 
|---|
| 49 | reference to a text segment symbol will be detected. This limi- | 
|---|
| 50 | tation is likely to remain in future  releases,  because object | 
|---|
| 51 | code does not, in most cases, contain sufficient information to | 
|---|
| 52 | allow meaningful interpretation of pure data.  (Note,  however, | 
|---|
| 53 | that  symbolic  references  to the data segment from within the | 
|---|
| 54 | text segment are always supported.) | 
|---|
| 55 |  | 
|---|
| 56 | As a final caveat,  be aware that the PC/IX assembler does | 
|---|
| 57 | not recognize the  "esc"  mnemonic,  even though it refers to a | 
|---|
| 58 | completely  valid CPU operation  which is documented in all the | 
|---|
| 59 | Intel literature. Thus, the corresponding opcodes (0xd8 through | 
|---|
| 60 | 0xdf) are disassembled as .byte directives. For reference, how- | 
|---|
| 61 | ever,  the syntactically-correct "esc" instruction is output as | 
|---|
| 62 | a comment. | 
|---|
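
The escape handling described above can be sketched roughly as follows. This is an illustration only, not the code in dis88: the function names, the "|" comment character, and the exact layout of the comment are assumptions. The sketch relies on the standard 8086/8088 ModR/M encoding to find how many bytes the instruction occupies, emits them as a .byte directive, and notes the 6-bit external opcode in a trailing comment.

```python
def modrm_disp_size(modrm):
    """Number of displacement bytes that follow an 8086/8088 ModR/M byte."""
    mod = modrm >> 6
    rm = modrm & 0x07
    if mod == 0:
        return 2 if rm == 6 else 0   # rm == 6 means a direct 16-bit address
    if mod == 1:
        return 1                     # 8-bit displacement
    if mod == 2:
        return 2                     # 16-bit displacement
    return 0                         # mod == 3: register operand


def esc_as_bytes(code, comment_char="|"):
    """Render the escape instruction at the start of `code` as a .byte line.

    Returns (line, length). `comment_char` is an assumed comment
    introducer; substitute whatever the target assembler expects.
    """
    opcode, modrm = code[0], code[1]
    if not 0xD8 <= opcode <= 0xDF:
        raise ValueError("not an escape opcode")
    length = 2 + modrm_disp_size(modrm)
    if len(code) < length:
        raise ValueError("truncated instruction")
    directive = ".byte " + ", ".join("0x%02x" % b for b in code[:length])
    # External opcode: low 3 bits of the first byte, then the ModR/M reg field.
    ext = ((opcode & 0x07) << 3) | ((modrm >> 3) & 0x07)
    return "\t%s\t%s esc 0x%02x" % (directive, comment_char, ext), length
```

For example, the four bytes d9 06 34 12 (an escape with a direct 16-bit address operand) come out as one .byte directive of four bytes, with the "esc" form noted in the comment.
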
| 63 |  | 
|---|
| 64 | To build the disassembler program, transfer all the source | 
|---|
| 65 | files,  together with the Makefile,  to a suitable  (preferably | 
|---|
| 66 | empty) PC/IX directory. Then, simply type "make". | 
|---|
| 67 |  | 
|---|
| 68 | To use dis88,  place it in a  directory  which  appears in | 
|---|
| 69 | your $PATH list.  It may then be invoked by name from  whatever | 
|---|
| 70 | directory you happen to be in.  As a minimum,  the program must | 
|---|
| 71 | be invoked with one command-line argument:  the name of the ob- | 
|---|
| 72 | ject file to be disassembled.  (Dis88 will complain if the file | 
|---|
| 73 | specified is not an object file.)  Optionally,  you may specify | 
|---|
| 74 | an output file; stdout is the default.  One command-line switch | 
|---|
| 75 | is available:  "-o",  which makes the program display addresses | 
|---|
| 76 | and object code along with its mnemonic disassembly. | 
|---|
| 77 |  | 
|---|
| 78 | The "-o" option is useful primarily for verifying the cor- | 
|---|
| 79 | rectness of the program's output. In particular, it may be used | 
|---|
| 80 | to check the accuracy of local  relative  jump  opcodes.  These | 
|---|
| 81 | jumps often  target  local  labels,  which are lost at assembly | 
|---|
| 82 | time;  thus,  the disassembly may contain cryptic  instructions | 
|---|
| 83 | like "jnz .+39".  As a user convenience,  all relative jump and | 
|---|
| 84 | call  opcodes are output with a comment  which  identifies  the | 
|---|
| 85 | physical target address. | 
|---|
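
To check a target address by hand against the "-o" output, the arithmetic is the standard 8088 rule: the displacement is added to the address of the instruction that follows the jump. The sketch below is illustrative only; the function names are hypothetical and the code is not taken from dis88.

```python
def short_jump_target(address, displacement_byte):
    """Target of a two-byte short jump (opcode plus 8-bit displacement).

    address           -- address of the jump opcode in the text segment
    displacement_byte -- second byte of the instruction, 0..255
    """
    # The displacement is a signed 8-bit quantity.
    if displacement_byte & 0x80:
        disp = displacement_byte - 0x100
    else:
        disp = displacement_byte
    # It is relative to the next instruction: own address plus 2 bytes.
    return (address + 2 + disp) & 0xFFFF


def near_target(address, disp_lo, disp_hi):
    """Target of a three-byte near call or jump (16-bit displacement)."""
    disp = disp_lo | (disp_hi << 8)
    # Arithmetic wraps at 64K, so the sign of disp needs no special care.
    return (address + 3 + disp) & 0xFFFF
```

A short jump at 0x0100 with displacement byte 0x25 therefore lands at 0x0127, and one with displacement byte 0xfe jumps to itself.
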
| 86 |  | 
|---|
| 87 | By convention, the release level of the program as a whole | 
|---|
| 88 | is the SID of the file disrel.c, and this SID string appears in | 
|---|
| 89 | each disassembly.  Release 2.1 of the program is the first beta | 
|---|
| 90 | release to be distributed on Usenet. | 
|---|
| 91 |  | 
|---|
| 92 |  | 
|---|
| 93 | .TH dis88 1 LOCAL | 
|---|
| 94 | .SH "NAME" | 
|---|
| 95 | dis88 \- 8088 symbolic disassembler | 
|---|
| 96 | .SH "SYNOPSIS" | 
|---|
| 97 | \fBdis88\fP [ -o ] ifile [ ofile ] | 
|---|
| 98 | .SH "DESCRIPTION" | 
|---|
| 99 | Dis88 reads ifile, which must be in PC/IX a.out format. | 
|---|
| 100 | It interprets the binary opcodes and data locations, and | 
|---|
| 101 | writes corresponding assembler source code to stdout, or | 
|---|
| 102 | to ofile if specified.  The program's output is in the | 
|---|
| 103 | format of, and fully compatible with, the PC/IX assembler, | 
|---|
| 104 | as(1).  If a symbol table is present in ifile, labels and | 
|---|
| 105 | references will be symbolic in the output.  If the input | 
|---|
| 106 | file lacks a symbol table, the fact will be noted, and the | 
|---|
| 107 | disassembly will proceed, with the disassembler generating | 
|---|
| 108 | synthetic labels as needed.  If the input file has split | 
|---|
| 109 | I/D space, or if it is executable, the disassembler will | 
|---|
| 110 | make all necessary adjustments in address-reference calculations. | 
|---|
| 111 | .PP | 
|---|
| 112 | If the "-o" option appears, object code will be included | 
|---|
| 113 | in comments during disassembly of the text segment.  This | 
|---|
| 114 | feature is used primarily for debugging the disassembler | 
|---|
| 115 | itself, but may provide information of passing interest | 
|---|
| 116 | to users. | 
|---|
| 117 | .PP | 
|---|
| 118 | The program always outputs the current machine address | 
|---|
| 119 | before disassembling an opcode.  If a symbol table is | 
|---|
| 120 | present, this address is output as an assembler comment; | 
|---|
| 121 | otherwise, it is incorporated into the synthetic label | 
|---|
| 122 | which is generated internally.  Since relative jumps, | 
|---|
| 123 | especially short ones, may target unlabelled locations, | 
|---|
| 124 | the program always outputs the physical target address | 
|---|
| 125 | as a comment, to assist the user in following the code. | 
|---|
| 126 | .PP | 
|---|
| 127 | The text segment of an object file is always padded to | 
|---|
| 128 | an even machine address.  In addition, if the file has | 
|---|
| 129 | split I/D space, the text segment will be padded to a | 
|---|
| 130 | paragraph boundary (i.e., an address divisible by 16). | 
|---|
| 131 | As a result of this padding, the disassembler may produce | 
|---|
| 132 | a few spurious, but harmless, instructions at the | 
|---|
| 133 | end of the text segment. | 
|---|
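
The padding rule just described amounts to rounding the text size up to a boundary of 2, or of 16 when I/D space is split. A minimal sketch, with a hypothetical function name, shows how many trailing pad bytes (and hence spurious instructions) to expect:

```python
def padded_text_size(size, split_id=False):
    """Round a text segment size up to the boundary described above."""
    boundary = 16 if split_id else 2
    return (size + boundary - 1) // boundary * boundary


def pad_bytes(size, split_id=False):
    """Number of padding bytes appended after the real code."""
    return padded_text_size(size, split_id) - size
```

With continuous I/D space at most one pad byte is present; with split I/D space there may be as many as fifteen.
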
| 134 | .PP | 
|---|
| 135 | Disassembly of the data segment is a difficult matter. | 
|---|
| 136 | The information to which initialized data refers cannot | 
|---|
| 137 | be inferred from context, except in the special case | 
|---|
| 138 | of an external data or address reference, which will be | 
|---|
| 139 | reflected in the relocation table.  Internal data and | 
|---|
| 140 | address references will already be resolved in the object file, | 
|---|
| 141 | and cannot be recreated.  Therefore, the data | 
|---|
| 142 | segment is disassembled as a byte stream, with long | 
|---|
| 143 | stretches of null data represented by an appropriate | 
|---|
| 144 | ".zerow" pseudo-op.  This limitation notwithstanding, | 
|---|
| 145 | labels (as opposed to symbolic references) are always | 
|---|
| 146 | output at appropriate points within the data segment. | 
|---|
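
The byte-stream treatment of the data segment can be pictured with the sketch below. It is not the dis88 implementation: the function name, the threshold for what counts as a "long" stretch, the number of bytes per line, and the assumption that ".zerow" takes a count of 16-bit words are all illustrative. Label output is omitted for brevity.

```python
def dump_data(data, min_zero_words=4, per_line=8):
    """Return assembler lines for `data`, collapsing long runs of zeros."""
    lines = []
    pending = []          # ordinary bytes waiting to be written out

    def flush():
        for k in range(0, len(pending), per_line):
            chunk = pending[k:k + per_line]
            lines.append(".byte " + ", ".join("0x%02x" % b for b in chunk))
        del pending[:]

    i = 0
    while i < len(data):
        # Measure the run of zero bytes starting at i (possibly empty).
        j = i
        while j < len(data) and data[j] == 0:
            j += 1
        words = (j - i) // 2
        if words >= min_zero_words:
            flush()
            lines.append(".zerow %d" % words)
            i += 2 * words            # an odd trailing zero is handled next pass
        elif j > i:
            pending.extend(data[i:j])  # short zero run: keep as plain bytes
            i = j
        else:
            pending.append(data[i])
            i += 1
    flush()
    return lines
```

Two data bytes, nine zeros, and one more data byte thus yield a .byte line, ".zerow 4", and a final .byte line holding the odd zero and the last byte.
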
| 147 | .PP | 
|---|
| 148 | If disassembly of the data segment is difficult, disassembly of the | 
|---|
| 149 | bss segment is quite easy, because uninitialized data is all | 
|---|
| 150 | zero by definition.  No data | 
|---|
| 151 | is output in the bss segment, but symbolic labels are | 
|---|
| 152 | output as appropriate. | 
|---|
| 153 | .PP | 
|---|
| 154 | For each opcode which takes an operand, a particular | 
|---|
| 155 | symbol type (text, data, or bss) is appropriate.  This | 
|---|
| 156 | tidy correspondence is complicated somewhat, however, | 
|---|
| 157 | by the existence of assembler symbolic constants and | 
|---|
| 158 | segment override opcodes.  Therefore, the disassembler's | 
|---|
| 159 | symbol lookup routine attempts to apply a certain amount | 
|---|
| 160 | of intelligence when it is asked to find a symbol.  If | 
|---|
| 161 | it cannot match on a symbol of the preferred type, it | 
|---|
| 162 | may return a symbol of some other type, depending on | 
|---|
| 163 | preassigned (and somewhat arbitrary) rankings within | 
|---|
| 164 | each type.  Finally, if all else fails, it returns a | 
|---|
| 165 | string containing the address sought as a hex constant; | 
|---|
| 166 | this behavior allows calling routines to use the output | 
|---|
| 167 | of the lookup function regardless of the success of its | 
|---|
| 168 | search. | 
|---|
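
The lookup strategy described above might be sketched as follows. The names, the tuple layout of the symbol table, and in particular the fallback order are assumptions made for illustration; the actual rankings used by the program are not given here.

```python
TEXT, DATA, BSS = "text", "data", "bss"


def lookup(symbols, address, preferred, fallback_order=(TEXT, DATA, BSS)):
    """Find a name for `address`, preferring symbols of type `preferred`.

    symbols -- list of (name, type, address) tuples
    Always returns a string, so callers can use the result unconditionally.
    """
    found = {}
    for name, sym_type, sym_addr in symbols:      # plain linear scan
        if sym_addr == address and sym_type not in found:
            found[sym_type] = name                # first match of each type
    if preferred in found:
        return found[preferred]
    for sym_type in fallback_order:               # hypothetical ranking
        if sym_type in found:
            return found[sym_type]
    return "0x%04x" % address                     # last resort: hex constant
```

Because a string is returned in every case, the caller never has to test whether the search succeeded.
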
| 169 | .PP | 
|---|
| 170 | It is worth noting, at this point, that the symbol lookup | 
|---|
| 171 | routine operates linearly, and has not been optimized in | 
|---|
| 172 | any way.  Execution time is thus likely to increase | 
|---|
| 173 | quadratically with input file size.  The disassembler is | 
|---|
| 174 | internally limited to 1500 symbol table entries and 1500 | 
|---|
| 175 | relocation table entries; while these limits are generous | 
|---|
| 176 | (/unix, itself, has fewer than 800 symbols), they are not | 
|---|
| 177 | guaranteed to be adequate in all cases.  If the symbol | 
|---|
| 178 | table or the relocation table overflows, the disassembly | 
|---|
| 179 | aborts. | 
|---|
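
The fixed-size table behavior can be pictured with the short sketch below. The function and exception names are hypothetical; only the limit of 1500 and the wording of the overflow messages come from this manual page.

```python
MAX_ENTRIES = 1500   # limit stated above for both tables


class TableOverflow(Exception):
    """Raised where the real program would abort the disassembly."""


def load_table(entries, kind, limit=MAX_ENTRIES):
    """Copy `entries` into a bounded table; `kind` is "symbol" or "reloc"."""
    table = []
    for entry in entries:
        if len(table) >= limit:
            raise TableOverflow("%s table overflow" % kind)
        table.append(entry)
    return table
```

A file with exactly 1500 entries loads; the 1501st entry triggers the overflow.
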
| 180 | .PP | 
|---|
| 181 | Finally, users should be aware of a bug in the assembler, | 
|---|
| 182 | which causes it not to parse the "esc" mnemonic, even | 
|---|
| 183 | though "esc" is a completely legitimate opcode which is | 
|---|
| 184 | documented in all the Intel literature.  To accommodate | 
|---|
| 185 | this deficiency, the disassembler translates opcodes of | 
|---|
| 186 | the "esc" family to .byte directives, but notes the | 
|---|
| 187 | correct mnemonic in a comment for reference. | 
|---|
| 188 | .PP | 
|---|
| 189 | In all cases, it should be possible to submit the output | 
|---|
| 190 | of the disassembler program to the assembler, and assemble | 
|---|
| 191 | it without error.  In most cases, the resulting object | 
|---|
| 192 | code will be identical to the original; in any event, it | 
|---|
| 193 | will be functionally equivalent. | 
|---|
| 194 | .SH "SEE ALSO" | 
|---|
| 195 | adb(1), as(1), cc(1), ld(1). | 
|---|
| 196 | .br | 
|---|
| 197 | "Assembler Reference Manual" in the PC/IX Programmer's | 
|---|
| 198 | Guide. | 
|---|
| 199 | .SH "DIAGNOSTICS" | 
|---|
| 200 | "can't access input file" if the input file cannot be | 
|---|
| 201 | found, opened, or read. | 
|---|
| 202 | .sp | 
|---|
| 203 | "can't open output file" if the output file cannot be | 
|---|
| 204 | created. | 
|---|
| 205 | .sp | 
|---|
| 206 | "warning: host/cpu clash" if the program is run on a | 
|---|
| 207 | machine with a different CPU. | 
|---|
| 208 | .sp | 
|---|
| 209 | "input file not in object format" if the magic number | 
|---|
| 210 | does not correspond to that of a PC/IX object file. | 
|---|
| 211 | .sp | 
|---|
| 212 | "not an 8086/8088 object file" if the CPU ID of the | 
|---|
| 213 | file header is incorrect. | 
|---|
| 214 | .sp | 
|---|
| 215 | "reloc table overflow" if there are more than 1500 | 
|---|
| 216 | entries in the relocation table. | 
|---|
| 217 | .sp | 
|---|
| 218 | "symbol table overflow" if there are more than 1500 | 
|---|
| 219 | entries in the symbol table. | 
|---|
| 220 | .sp | 
|---|
| 221 | "lseek error" if the input file is corrupted (should | 
|---|
| 222 | never happen). | 
|---|
| 223 | .sp | 
|---|
| 224 | "warning: no symbols" if the symbol table is missing. | 
|---|
| 225 | .sp | 
|---|
| 226 | "can't reopen input file" if the input file is removed | 
|---|
| 227 | or altered during program execution (should never happen). | 
|---|
| 228 | .SH "BUGS" | 
|---|
| 229 | Numeric co-processor (i.e., 8087) mnemonics are not currently supported. | 
|---|
| 230 | Instructions for the co-processor are | 
|---|
| 231 | disassembled as CPU escape sequences, or as interrupts, | 
|---|
| 232 | depending on how they were assembled in the first place. | 
|---|
| 233 | .sp | 
|---|
| 234 | Despite the program's best efforts, a symbol retrieved | 
|---|
| 235 | from the symbol table may sometimes be different from | 
|---|
| 236 | the symbol used in the original assembly. | 
|---|
| 237 | .sp | 
|---|
| 238 | The disassembler's internal tables are of fixed size, | 
|---|
| 239 | and the program aborts if they overflow. | 
|---|