dis88
Beta Release
87/09/01
---
G. M. HARDING
POB 4142
Santa Clara CA 95054-0142

"Dis88" is a symbolic disassembler for the Intel 8088 CPU,
designed to run under the PC/IX operating system on an IBM XT
or fully compatible clone. Its output is in the format of, and
is completely compatible with, the PC/IX assembler, "as". The
program is copyrighted by its author, but may be copied and
redistributed freely provided that complete source code, with all
copyright notices, accompanies any distribution. This provision
also applies to any modifications you may make. You are urged
to comment such changes, giving, as a minimum, your name and
complete address.

This release of the program is a beta release, which means
that it has been extensively, but not exhaustively, tested.
User comments, recommendations, and bug fixes are welcome. The
principal features of the current release are:

(a) The ability to disassemble any file in PC/IX object
format, making full use of symbol and relocation information if
it is present, regardless of whether the file is executable or
linkable, and regardless of whether it has continuous or split
I/D space;

(b) Automatic generation of synthetic labels when no symbol
table is available; and

(c) Optional output of address and object-code information
as assembler comment text.

Limitations of the current release are:

(a) Numeric co-processor (i.e., 8087) mnemonics are not
supported. Instructions for the co-processor are disassembled
as CPU escape sequences, or as interrupts, depending on how
they were assembled in the first place. This limitation will be
addressed in a future release.

(b) Symbolic references within the object file's data
segment are not supported. Thus, for example, if a data segment
location is initialized to point to a text segment address, no
reference to a text segment symbol will be detected. This
limitation is likely to remain in future releases, because object
code does not, in most cases, contain sufficient information to
allow meaningful interpretation of pure data. (Note, however,
that symbolic references to the data segment from within the
text segment are always supported.)

As a final caveat, be aware that the PC/IX assembler does
not recognize the "esc" mnemonic, even though it refers to a
completely valid CPU operation which is documented in all the
Intel literature. Thus, the corresponding opcodes (0xd8 through
0xdf) are disassembled as .byte directives. For reference,
however, the syntactically correct "esc" instruction is output as
a comment.

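The treatment of the "esc" family described above can be sketched
as follows. This is an illustration in Python only, not code taken
from the dis88 source; the function names, the comment marker, and
the layout of the output line are all assumptions. The sketch relies
on the 8088 instruction encoding, in which the low three bits of the
first byte and the "reg" field of the second (mod-r/m) byte together
form a six-bit escape number.

```python
COMMENT = "|"  # assumed comment marker; adjust to the assembler's real syntax


def is_esc(opcode):
    """True for first bytes 0xd8 through 0xdf (binary 11011xxx)."""
    return 0xD8 <= opcode <= 0xDF


def esc_number(opcode, modrm):
    """Combine the low 3 bits of the opcode with the reg field of mod-r/m."""
    return ((opcode & 0x07) << 3) | ((modrm >> 3) & 0x07)


def esc_as_bytes(opcode, modrm):
    """Render the two leading bytes as a .byte directive (hypothetical layout)."""
    if not is_esc(opcode):
        raise ValueError("not an esc opcode")
    number = esc_number(opcode, modrm)
    return ".byte 0x%02x,0x%02x %s esc %d" % (opcode, modrm, COMMENT, number)
```

Any displacement bytes that follow the mod-r/m byte are left out of
the sketch; a real disassembler would have to emit them as well.
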
To build the disassembler program, transfer all the source
files, together with the Makefile, to a suitable (preferably
empty) PC/IX directory. Then, simply type "make".

To use dis88, place it in a directory which appears in
your $PATH list. It may then be invoked by name from whatever
directory you happen to be in. As a minimum, the program must
be invoked with one command-line argument: the name of the
object file to be disassembled. (Dis88 will complain if the file
specified is not an object file.) Optionally, you may specify
an output file; stdout is the default. One command-line switch
is available: "-o", which makes the program display addresses
and object code along with its mnemonic disassembly.

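The calling convention just described is small enough to sketch in
a few lines. The following Python fragment only illustrates the
argument rules; it is not derived from the dis88 source, and the
function name, the returned tuple, and the wording of the usage
message are assumptions.

```python
def parse_args(argv):
    """Split argv into (show_object, ifile, ofile).

    ofile is None when output should go to stdout.
    """
    args = list(argv[1:])
    show_object = False
    if args and args[0] == "-o":
        show_object = True
        args.pop(0)
    if not 1 <= len(args) <= 2:
        # Hypothetical usage message, modeled on the manual's synopsis.
        raise SystemExit("usage: dis88 [ -o ] ifile [ ofile ]")
    ifile = args[0]
    ofile = args[1] if len(args) == 2 else None
    return show_object, ifile, ofile
```
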
The "-o" option is useful primarily for verifying the
correctness of the program's output. In particular, it may be used
to check the accuracy of local relative jump opcodes. These
jumps often target local labels, which are lost at assembly
time; thus, the disassembly may contain cryptic instructions
like "jnz .+39". As a user convenience, all relative jump and
call opcodes are output with a comment which identifies the
physical target address.

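The target address given in those comments follows from ordinary
8088 arithmetic: the displacement of a relative jump is a signed
offset from the address of the next instruction, wrapped to 16
bits. A minimal sketch in Python (illustrative only; the function
names are hypothetical and nothing here is taken from the dis88
source):

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def jump_target(addr, length, disp, bits=8):
    """Target of a relative jump or call located at addr.

    length is the size of the whole instruction in bytes; disp is the raw
    displacement field (8 bits for short jumps, 16 for near jumps and calls).
    """
    return (addr + length + sign_extend(disp, bits)) & 0xFFFF
```

For example, a two-byte "jnz" at address 0x0100 with displacement
byte 0x25 lands at 0x0127, that is, 39 bytes past the start of the
instruction.
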
By convention, the release level of the program as a whole
is the SID of the file disrel.c, and this SID string appears in
each disassembly. Release 2.1 of the program is the first beta
release to be distributed on Usenet.


.TH dis88 1 LOCAL
.SH "NAME"
dis88 \- 8088 symbolic disassembler
.SH "SYNOPSIS"
\fBdis88\fP [ -o ] ifile [ ofile ]
.SH "DESCRIPTION"
Dis88 reads ifile, which must be in PC/IX a.out format.
It interprets the binary opcodes and data locations, and
writes corresponding assembler source code to stdout, or
to ofile if specified. The program's output is in the
format of, and fully compatible with, the PC/IX assembler,
as(1). If a symbol table is present in ifile, labels and
references will be symbolic in the output. If the input
file lacks a symbol table, the fact will be noted, and the
disassembly will proceed, with the disassembler generating
synthetic labels as needed. If the input file has split
I/D space, or if it is executable, the disassembler will
make all necessary adjustments in address-reference calculations.
.PP
If the "-o" option appears, object code will be included
in comments during disassembly of the text segment. This
feature is used primarily for debugging the disassembler
itself, but may provide information of passing interest
to users.
.PP
The program always outputs the current machine address
before disassembling an opcode. If a symbol table is
present, this address is output as an assembler comment;
otherwise, it is incorporated into the synthetic label
which is generated internally. Since relative jumps,
especially short ones, may target unlabelled locations,
the program always outputs the physical target address
as a comment, to assist the user in following the code.
.PP
The text segment of an object file is always padded to
an even machine address. In addition, if the file has
split I/D space, the text segment will be padded to a
paragraph boundary (i.e., an address divisible by 16).
As a result of this padding, the disassembler may produce
a few spurious, but harmless, instructions at the
end of the text segment.
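.PP
The padding rule above can be illustrated with a short sketch.
It is written in Python purely for illustration and is not taken
from the dis88 source; the function names are hypothetical, and
the text segment is assumed to start at address zero, so that
its size and its end address coincide.
.nf
```python
def padded_text_size(size, split_id):
    """Round size up to 16 bytes for split I/D files, otherwise to 2 bytes."""
    align = 16 if split_id else 2
    return (size + align - 1) // align * align


def padding_bytes(size, split_id):
    """Number of filler bytes that follow the real code."""
    return padded_text_size(size, split_id) - size
```
.fi
.PP
For a 17-byte text segment in a split I/D file, this gives a padded
size of 32, leaving 15 filler bytes that a disassembler would read
as instructions.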
.PP
Disassembly of the data segment is a difficult matter.
The information to which initialized data refers cannot
be inferred from context, except in the special case
of an external data or address reference, which will be
reflected in the relocation table. Internal data and
address references will already be resolved in the object file,
and cannot be recreated. Therefore, the data
segment is disassembled as a byte stream, with long
stretches of null data represented by an appropriate
".zerow" pseudo-op. This limitation notwithstanding,
labels (as opposed to symbolic references) are always
output at appropriate points within the data segment.
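.PP
The byte-stream approach described above might look roughly like
the following Python sketch. It is not the algorithm used by dis88:
the function name, the one-byte-per-line output, and the threshold
for a "long" stretch of nulls are all assumptions, as is the reading
of ".zerow N" as N zeroed 16-bit words.
.nf
```python
def dump_data(data, min_run=8):
    """Render data as .byte lines, collapsing long zero runs into .zerow.

    data is a bytes object; min_run (in bytes) is a made-up threshold
    for what counts as a long stretch of nulls.
    """
    lines = []
    i = 0
    while i < len(data):
        # Measure the run of zero bytes starting at position i.
        j = i
        while j < len(data) and data[j] == 0:
            j += 1
        words = (j - i) // 2
        if j - i >= min_run:
            lines.append(".zerow %d" % words)
            i += words * 2  # an odd trailing zero is left for .byte
        else:
            lines.append(".byte 0x%02x" % data[i])
            i += 1
    return lines
```
.fi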
.PP
If disassembly of the data segment is difficult, disassembly of the
bss segment is quite easy, because uninitialized data is all
zero by definition. No data
is output in the bss segment, but symbolic labels are
output as appropriate.
.PP
For each opcode which takes an operand, a particular
symbol type (text, data, or bss) is appropriate. This
tidy correspondence is complicated somewhat, however,
by the existence of assembler symbolic constants and
segment override opcodes. Therefore, the disassembler's
symbol lookup routine attempts to apply a certain amount
of intelligence when it is asked to find a symbol. If
it cannot match on a symbol of the preferred type, it
may return a symbol of some other type, depending on
preassigned (and somewhat arbitrary) rankings within
each type. Finally, if all else fails, it returns a
string containing the address sought as a hex constant;
this behavior allows calling routines to use the output
of the lookup function regardless of the success of its
search.
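.PP
The lookup strategy described above can be sketched as follows.
This is a Python illustration under stated assumptions, not the
routine found in the dis88 source: the fallback order in the table,
the tuple layout of a symbol, and the format of the hex constant
are all hypothetical.
.nf
```python
TEXT, DATA, BSS = "text", "data", "bss"

# Hypothetical fallback order for each preferred type.
FALLBACK = {
    TEXT: (TEXT, DATA, BSS),
    DATA: (DATA, BSS, TEXT),
    BSS: (BSS, DATA, TEXT),
}


def lookup(symbols, addr, preferred):
    """Return a symbol name for addr, or the address as a hex constant.

    symbols is a list of (name, type, address) tuples, searched linearly.
    """
    for wanted in FALLBACK[preferred]:
        for name, kind, value in symbols:
            if kind == wanted and value == addr:
                return name
    return "0x%04x" % addr
```
.fi
.PP
Because the function always returns a string, a caller can paste
the result into an operand without checking whether the search
succeeded.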
.PP
It is worth noting, at this point, that the symbol lookup
routine operates linearly, and has not been optimized in
any way. Because every symbolic reference triggers a scan
of the table, execution time is likely to grow faster than
linearly with input file size. The disassembler is
internally limited to 1500 symbol table entries and 1500
relocation table entries; while these limits are generous
(/unix, itself, has fewer than 800 symbols), they are not
guaranteed to be adequate in all cases. If the symbol
table or the relocation table overflows, the disassembly
aborts.
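.PP
The fixed-size tables can be pictured with the following Python
sketch. It is illustrative only: the function and exception names
are hypothetical, and an exception stands in for the point at which
the real program prints its diagnostic and aborts.
.nf
```python
MAX_ENTRIES = 1500  # limit stated in the text, applied to each table


class TableOverflow(Exception):
    """Raised where the real program would print a diagnostic and abort."""


def load_table(entries, what):
    """Copy entries into a bounded list, failing once the limit is exceeded."""
    table = []
    for entry in entries:
        if len(table) >= MAX_ENTRIES:
            raise TableOverflow("%s table overflow" % what)
        table.append(entry)
    return table
```
.fi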
.PP
Finally, users should be aware of a bug in the assembler,
which causes it not to parse the "esc" mnemonic, even
though "esc" is a completely legitimate opcode which is
documented in all the Intel literature. To accommodate
this deficiency, the disassembler translates opcodes of
the "esc" family to .byte directives, but notes the
correct mnemonic in a comment for reference.
.PP
In all cases, it should be possible to submit the output
of the disassembler program to the assembler, and assemble
it without error. In most cases, the resulting object
code will be identical to the original; in any event, it
will be functionally equivalent.
.SH "SEE ALSO"
adb(1), as(1), cc(1), ld(1).
.br
"Assembler Reference Manual" in the PC/IX Programmer's
Guide.
.SH "DIAGNOSTICS"
"can't access input file" if the input file cannot be
found, opened, or read.
.sp
"can't open output file" if the output file cannot be
created.
.sp
"warning: host/cpu clash" if the program is run on a
machine with a different CPU.
.sp
"input file not in object format" if the magic number
does not correspond to that of a PC/IX object file.
.sp
"not an 8086/8088 object file" if the CPU ID of the
file header is incorrect.
.sp
"reloc table overflow" if there are more than 1500
entries in the relocation table.
.sp
"symbol table overflow" if there are more than 1500
entries in the symbol table.
.sp
"lseek error" if the input file is corrupted (should
never happen).
.sp
"warning: no symbols" if the symbol table is missing.
.sp
"can't reopen input file" if the input file is removed
or altered during program execution (should never happen).
.SH "BUGS"
Numeric co-processor (i.e., 8087) mnemonics are not currently supported.
Instructions for the co-processor are
disassembled as CPU escape sequences, or as interrupts,
depending on how they were assembled in the first place.
.sp
Despite the program's best efforts, a symbol retrieved
from the symbol table may sometimes be different from
the symbol used in the original assembly.
.sp
The disassembler's internal tables are of fixed size,
and the program aborts if they overflow.