                                     dis88
                                  Beta Release
                                    87/09/01
                                      ---
                                 G. M. HARDING
                                    POB 4142
                           Santa Clara CA  95054-0142


             "Dis88" is a symbolic disassembler for the Intel 8088 CPU,
        designed to run under the PC/IX  operating  system on an IBM XT
        or fully-compatible clone.  Its output is in the format of, and
        is completely compatible with, the PC/IX assembler,  "as".  The
        program is copyrighted by its author, but may be copied and re-
        distributed freely provided that complete source code, with all
        copyright notices, accompanies any distribution. This provision
        also applies to any  modifications you may make.  You are urged
        to comment such changes,  giving,  as a minimum,  your name and
        complete address.

             This release of the program is a beta release, which means
        that it has been  extensively,  but not  exhaustively,  tested.
        User comments, recommendations, and bug fixes are welcome.  The
        principal features of the current release are:

             (a)  The ability to  disassemble  any file in PC/IX object
        format, making full use of symbol and relocation information if
        it is present,  regardless of whether the file is executable or
        linkable,  and regardless of whether it has continuous or split
        I/D space;

             (b)  Automatic generation of synthetic labels when no sym-
        bol table is available; and

             (c)  Optional  output of address and object-code  informa-
        tion as assembler comment text.

             Limitations of the current release are:

             (a)  Numeric co-processor  (i.e., 8087)  mnemonics are not
        supported.  Instructions  for the co-processor are disassembled
        as CPU escape  sequences,  or as  interrupts,  depending on how
        they were assembled in the first place. This limitation will be
        addressed in a future release.

             (b)  Symbolic  references  within the object  file's  data
        segment are not supported. Thus, for example, if a data segment
        location is initialized to point to a text segment address,  no
        reference to a text segment symbol will be detected. This limi-
        tation is likely to remain in future  releases,  because object
        code does not, in most cases, contain sufficient information to
        allow meaningful interpretation of pure data.  (Note,  however,
        that  symbolic  references  to the data segment from within the
        text segment are always supported.)

             As a final caveat,  be aware that the PC/IX assembler does
        not recognize the  "esc"  mnemonic,  even though it refers to a
        completely  valid CPU operation  which is documented in all the
        Intel literature. Thus, the corresponding opcodes (0xd8 through
        0xdf) are disassembled as .byte directives. For reference, how-
        ever,  the syntactically-correct "esc" instruction is output as
        a comment.

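             As a rough illustration of the opcode range just described,
        the following C sketch tests whether a byte belongs to the "esc"
        family (0xd8 through 0xdf) and extracts the 6-bit co-processor
        opcode, which the 8088 encodes in the low three bits of the first
        byte and in the "reg" field of the mod-r/m byte that follows it.
        The function names are hypothetical; this is not code taken from
        the dis88 source.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if op (assumed to be a byte value,
   0..255) is one of the eight ESC opcodes 0xd8 through 0xdf. */
int is_esc_opcode(unsigned int op)
{
    return (op & 0xf8) == 0xd8;
}

/* Hypothetical helper: 6-bit external opcode of an ESC instruction.
   The high three bits come from the low bits of the opcode byte,
   the low three bits from bits 5..3 (the "reg" field) of mod-r/m. */
unsigned int esc_external_opcode(unsigned int op, unsigned int modrm)
{
    assert(is_esc_opcode(op));
    return ((op & 0x07) << 3) | ((modrm >> 3) & 0x07);
}
```
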
             To build the disassembler program, transfer all the source
        files,  together with the Makefile,  to a suitable  (preferably
        empty) PC/IX directory. Then, simply type "make".

             To use dis88,  place it in a  directory  which  appears in
        your $PATH list.  It may then be invoked by name from  whatever
        directory you happen to be in.  As a minimum,  the program must
        be invoked with one command-line argument:  the name of the ob-
        ject file to be disassembled.  (Dis88 will complain if the file
        specified is not an object file.)  Optionally,  you may specify
        an output file; stdout is the default.  One command-line switch
        is available:  "-o",  which makes the program display addresses
        and object code along with its mnemonic disassembly.

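             A minimal sketch of how the argument pattern just described
        (an optional "-o", a required input file, an optional output file)
        might be parsed in C.  The structure and function names are hypo-
        thetical, and it is an assumption that "-o" must come before the
        file names and that extra arguments are rejected; the actual dis88
        source may handle its arguments differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure; the real dis88 source may differ. */
struct dis_args {
    int show_object;      /* nonzero if "-o" was given */
    const char *ifile;    /* object file to disassemble (required) */
    const char *ofile;    /* output file, or NULL for stdout */
};

/* Returns 0 on success, -1 if the arguments do not fit the
   assumed pattern:  [ -o ] ifile [ ofile ]  */
int parse_args(int argc, char **argv, struct dis_args *out)
{
    int i = 1;

    assert(argv != NULL && out != NULL);
    out->show_object = 0;
    out->ifile = NULL;
    out->ofile = NULL;

    if (i < argc && strcmp(argv[i], "-o") == 0) {
        out->show_object = 1;
        i++;
    }
    if (i >= argc)
        return -1;                  /* no input file named */
    out->ifile = argv[i++];
    if (i < argc)
        out->ofile = argv[i++];     /* optional output file */
    return (i == argc) ? 0 : -1;    /* reject extra arguments */
}
```
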
             The "-o" option is useful primarily for verifying the cor-
        rectness of the program's output. In particular, it may be used
        to check the accuracy of local  relative  jump  opcodes.  These
        jumps often  target  local  labels,  which are lost at assembly
        time;  thus,  the disassembly may contain cryptic  instructions
        like "jnz .+39".  As a user convenience,  all relative jump and
        call  opcodes are output with a comment  which  identifies  the
        physical target address.

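             The target address shown in such a comment can be checked by
        hand.  For a two-byte short jump, the 8088 adds the signed dis-
        placement byte to the address of the instruction that follows the
        jump.  The C sketch below shows that arithmetic; the function name
        is hypothetical and the code is not taken from the dis88 source.
        It assumes 16-bit offsets that wrap within the segment.  If "." in
        the assembler syntax denotes the address of the jump itself, then
        "jnz .+39" corresponds to a displacement byte of 37.

```c
#include <assert.h>

/* Hypothetical helper: target offset of a two-byte short jump
   located at addr whose displacement byte is disp (0..255).
   The displacement is sign-extended and added to the address of
   the next instruction (addr + 2); the result wraps at 64K. */
unsigned int short_jump_target(unsigned int addr, unsigned int disp)
{
    int delta;

    assert(disp <= 0xff);
    delta = (disp & 0x80) ? (int)disp - 0x100 : (int)disp;
    return (addr + 2 + delta) & 0xffff;
}
```
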
             By convention, the release level of the program as a whole
        is the SID of the file disrel.c, and this SID string appears in
        each disassembly.  Release 2.1 of the program is the first beta
        release to be distributed on Usenet.


.TH dis88 1 LOCAL
.SH "NAME"
dis88 \- 8088 symbolic disassembler
.SH "SYNOPSIS"
\fBdis88\fP [ -o ] ifile [ ofile ]
.SH "DESCRIPTION"
Dis88 reads ifile, which must be in PC/IX a.out format.
It interprets the binary opcodes and data locations, and
writes corresponding assembler source code to stdout, or
to ofile if specified.  The program's output is in the
format of, and fully compatible with, the PC/IX assembler,
as(1).  If a symbol table is present in ifile, labels and
references will be symbolic in the output.  If the input
file lacks a symbol table, the fact will be noted, and the
disassembly will proceed, with the disassembler generating
synthetic labels as needed.  If the input file has split
I/D space, or if it is executable, the disassembler will
make all necessary adjustments in address-reference calculations.
.PP
If the "-o" option appears, object code will be included
in comments during disassembly of the text segment.  This
feature is used primarily for debugging the disassembler
itself, but may provide information of passing interest
to users.
.PP
The program always outputs the current machine address
before disassembling an opcode.  If a symbol table is
present, this address is output as an assembler comment;
otherwise, it is incorporated into the synthetic label
which is generated internally.  Since relative jumps,
especially short ones, may target unlabelled locations,
the program always outputs the physical target address
as a comment, to assist the user in following the code.
.PP
The text segment of an object file is always padded to
an even machine address.  In addition, if the file has
split I/D space, the text segment will be padded to a
paragraph boundary (i.e., an address divisible by 16).
As a result of this padding, the disassembler may produce
a few spurious, but harmless, instructions at the
end of the text segment.
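.PP
As a rough illustration of the padding rule above, the following
C sketch rounds a text segment size up to the next even address,
or to the next paragraph boundary when I/D space is split.
The function names and the split_id flag are hypothetical and are
not taken from the dis88 source.
.nf
```c
#include <assert.h>

/* Round size up to the next multiple of align (align > 0). */
unsigned long pad_to(unsigned long size, unsigned long align)
{
    assert(align > 0);
    return (size + align - 1) / align * align;
}

/* Hypothetical helper: padded size of a text segment.
   split_id is nonzero when the file has split I/D space.
   A multiple of 16 is also even, so one rounding step suffices. */
unsigned long padded_text_size(unsigned long size, int split_id)
{
    return pad_to(size, split_id ? 16 : 2);
}
```
.fi
Under these assumptions, a text segment of 0x123 bytes is padded
to 0x124 bytes, or to 0x130 bytes with split I/D space.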
.PP
Disassembly of the data segment is a difficult matter.
The information to which initialized data refers cannot
be inferred from context, except in the special case
of an external data or address reference, which will be
reflected in the relocation table.  Internal data and
address references will already be resolved in the object file,
and cannot be recreated.  Therefore, the data
segment is disassembled as a byte stream, with long
stretches of null data represented by an appropriate
".zerow" pseudo-op.  This limitation notwithstanding,
labels (as opposed to symbolic references) are always
output at appropriate points within the data segment.
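.PP
Finding a stretch of null data is straightforward.
The C sketch below counts the zero bytes that begin at a given
position in a buffer.  The function name is hypothetical; the run
length at which dis88 switches to the pseudo-op, and the operand
it then emits, are not specified here and are not modelled.
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of consecutive zero bytes
   starting at buf[pos], never reading past buf[len - 1]. */
size_t zero_run(const unsigned char *buf, size_t len, size_t pos)
{
    size_t n = 0;

    assert(buf != NULL && pos <= len);
    while (pos + n < len && buf[pos + n] == 0)
        n++;
    return n;
}
```
.fi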
.PP
If disassembly of the data segment is difficult, disassembly of the
bss segment is quite easy, because uninitialized data is all
zero by definition.  No data
is output in the bss segment, but symbolic labels are
output as appropriate.
.PP
For each opcode which takes an operand, a particular
symbol type (text, data, or bss) is appropriate.  This
tidy correspondence is complicated somewhat, however,
by the existence of assembler symbolic constants and
segment override opcodes.  Therefore, the disassembler's
symbol lookup routine attempts to apply a certain amount
of intelligence when it is asked to find a symbol.  If
it cannot match on a symbol of the preferred type, it
may return a symbol of some other type, depending on
preassigned (and somewhat arbitrary) rankings within
each type.  Finally, if all else fails, it returns a
string containing the address sought as a hex constant;
this behavior allows calling routines to use the output
of the lookup function regardless of the success of its
search.
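.PP
The fallback behavior described above can be sketched in C as follows.
This is an illustration under stated assumptions, not the dis88 lookup
routine: the type names, the structure layout, and the function name are
hypothetical, and the per-type rankings are simplified to "first symbol
of any other type found at the address".
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum sym_type { SYM_TEXT, SYM_DATA, SYM_BSS };  /* hypothetical */

struct symbol {                                 /* hypothetical */
    const char *name;
    unsigned int addr;
    enum sym_type type;
};

/* Linear search of tab[0..n-1] for a symbol at addr.
   A symbol of type want wins; otherwise the first symbol of any
   other type at that address is used; otherwise the address is
   formatted as a hex constant.  The result is written to buf,
   which is always returned, so callers can use it unconditionally. */
char *lookup(const struct symbol *tab, size_t n, unsigned int addr,
             enum sym_type want, char *buf, size_t buflen)
{
    const struct symbol *fallback = NULL;
    size_t i;

    assert(buf != NULL && buflen > 0);
    for (i = 0; i < n; i++) {
        if (tab[i].addr != addr)
            continue;
        if (tab[i].type == want) {
            snprintf(buf, buflen, "%s", tab[i].name);
            return buf;
        }
        if (fallback == NULL)
            fallback = &tab[i];
    }
    if (fallback != NULL)
        snprintf(buf, buflen, "%s", fallback->name);
    else
        snprintf(buf, buflen, "0x%04x", addr);
    return buf;
}
```
.fi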
.PP
It is worth noting, at this point, that the symbol lookup
routine operates linearly, and has not been optimized in
any way.  Because each lookup is a linear search, execution
time is likely to grow roughly quadratically with input
file size.  The disassembler is
internally limited to 1500 symbol table entries and 1500
relocation table entries; while these limits are generous
(/unix, itself, has fewer than 800 symbols), they are not
guaranteed to be adequate in all cases.  If the symbol
table or the relocation table overflows, the disassembly
aborts.
.PP
Finally, users should be aware of a bug in the assembler,
which causes it not to parse the "esc" mnemonic, even
though "esc" is a completely legitimate opcode which is
documented in all the Intel literature.  To accommodate
this deficiency, the disassembler translates opcodes of
the "esc" family to .byte directives, but notes the
correct mnemonic in a comment for reference.
.PP
In all cases, it should be possible to submit the output
of the disassembler program to the assembler, and assemble
it without error.  In most cases, the resulting object
code will be identical to the original; in any event, it
will be functionally equivalent.
.SH "SEE ALSO"
adb(1), as(1), cc(1), ld(1).
.br
"Assembler Reference Manual" in the PC/IX Programmer's
Guide.
.SH "DIAGNOSTICS"
"can't access input file" if the input file cannot be
found, opened, or read.
.sp
"can't open output file" if the output file cannot be
created.
.sp
"warning: host/cpu clash" if the program is run on a
machine with a different CPU.
.sp
"input file not in object format" if the magic number
does not correspond to that of a PC/IX object file.
.sp
"not an 8086/8088 object file" if the CPU ID of the
file header is incorrect.
.sp
"reloc table overflow" if there are more than 1500
entries in the relocation table.
.sp
"symbol table overflow" if there are more than 1500
entries in the symbol table.
.sp
"lseek error" if the input file is corrupted (should
never happen).
.sp
"warning: no symbols" if the symbol table is missing.
.sp
"can't reopen input file" if the input file is removed
or altered during program execution (should never happen).
.SH "BUGS"
Numeric co-processor (i.e., 8087) mnemonics are not currently supported.
Instructions for the co-processor are
disassembled as CPU escape sequences, or as interrupts,
depending on how they were assembled in the first place.
.sp
Despite the program's best efforts, a symbol retrieved
from the symbol table may sometimes be different from
the symbol used in the original assembly.
.sp
The disassembler's internal tables are of fixed size,
and the program aborts if they overflow.