source: trunk/minix/commands/dis88/README@ 11

Last change on this file since 11 was 9, checked in by Mattia Monga, 14 years ago

Minix 3.1.2a

dis88
Beta Release
87/09/01
---
G. M. HARDING
POB 4142
Santa Clara CA 95054-0142


"Dis88" is a symbolic disassembler for the Intel 8088 CPU,
designed to run under the PC/IX operating system on an IBM XT
or fully-compatible clone. Its output is in the format of, and
is completely compatible with, the PC/IX assembler, "as". The
program is copyrighted by its author, but may be copied and
redistributed freely provided that complete source code, with all
copyright notices, accompanies any distribution. This provision
also applies to any modifications you may make. You are urged
to comment such changes, giving, as a minimum, your name and
complete address.

This release of the program is a beta release, which means
that it has been extensively, but not exhaustively, tested.
User comments, recommendations, and bug fixes are welcome. The
principal features of the current release are:

(a) The ability to disassemble any file in PC/IX object
format, making full use of symbol and relocation information if
it is present, regardless of whether the file is executable or
linkable, and regardless of whether it has continuous or split
I/D space;

(b) Automatic generation of synthetic labels when no symbol
table is available; and

(c) Optional output of address and object-code information
as assembler comment text.

Limitations of the current release are:

(a) Numeric co-processor (i.e., 8087) mnemonics are not
supported. Instructions for the co-processor are disassembled
as CPU escape sequences, or as interrupts, depending on how
they were assembled in the first place. This limitation will be
addressed in a future release.

(b) Symbolic references within the object file's data
segment are not supported. Thus, for example, if a data segment
location is initialized to point to a text segment address, no
reference to a text segment symbol will be detected. This
limitation is likely to remain in future releases, because object
code does not, in most cases, contain sufficient information to
allow meaningful interpretation of pure data. (Note, however,
that symbolic references to the data segment from within the
text segment are always supported.)

As a final caveat, be aware that the PC/IX assembler does
not recognize the "esc" mnemonic, even though it refers to a
completely valid CPU operation which is documented in all the
Intel literature. Thus, the corresponding opcodes (0xd8 through
0xdf) are disassembled as .byte directives. For reference,
however, the syntactically-correct "esc" instruction is output as
a comment.
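The treatment of the "esc" family described above can be sketched
in C. This is a minimal illustration under stated assumptions, not
the code used by dis88 itself: the names is_esc, esc_number and
format_esc are hypothetical, the "|" comment character and the
layout of the output line are assumptions, and any displacement
bytes that may follow the second (ModR/M) byte are ignored.

```c
#include <stdint.h>
#include <stdio.h>

/* Nonzero if the first opcode byte lies in the ESC range 0xd8..0xdf. */
int is_esc(uint8_t opcode)
{
    return (opcode & 0xf8) == 0xd8;
}

/*
 * The 6-bit external opcode carried by an ESC instruction: the low
 * three bits of the first byte form the high half, and the "reg"
 * field (bits 5..3) of the ModR/M byte forms the low half.
 */
unsigned esc_number(uint8_t opcode, uint8_t modrm)
{
    return ((unsigned)(opcode & 0x07) << 3) | ((unsigned)(modrm >> 3) & 0x07);
}

/*
 * Hypothetical formatter: emits the two bytes as a .byte directive
 * and notes the "esc" form in a trailing comment.  The comment
 * character and column layout are assumptions for illustration.
 */
int format_esc(char *buf, size_t buflen, uint8_t opcode, uint8_t modrm)
{
    return snprintf(buf, buflen, "\t.byte\t0x%02x,0x%02x\t| esc\t%u",
                    opcode, modrm, esc_number(opcode, modrm));
}
```

A caller would test is_esc() on the first byte of each instruction
and fall back to the normal mnemonic table when it returns zero.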

To build the disassembler program, transfer all the source
files, together with the Makefile, to a suitable (preferably
empty) PC/IX directory. Then, simply type "make".

To use dis88, place it in a directory which appears in
your $PATH list. It may then be invoked by name from whatever
directory you happen to be in. As a minimum, the program must
be invoked with one command-line argument: the name of the
object file to be disassembled. (Dis88 will complain if the file
specified is not an object file.) Optionally, you may specify
an output file; stdout is the default. One command-line switch
is available: "-o", which makes the program display addresses
and object code along with its mnemonic disassembly.

The "-o" option is useful primarily for verifying the
correctness of the program's output. In particular, it may be used
to check the accuracy of local relative jump opcodes. These
jumps often target local labels, which are lost at assembly
time; thus, the disassembly may contain cryptic instructions
like "jnz .+39". As a user convenience, all relative jump and
call opcodes are output with a comment which identifies the
physical target address.
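The target address given in such a comment follows from the 8088
encoding of relative jumps: the displacement is signed and is
counted from the address of the next instruction. The sketch below
shows the arithmetic for the two common cases. The function names
are hypothetical and are not taken from the dis88 source, and
addresses are assumed to wrap within a 64K segment.

```c
#include <stdint.h>

/*
 * Target of a short relative jump (for example jnz): a two-byte
 * instruction whose second byte is a signed 8-bit displacement
 * measured from the address of the following instruction.
 */
uint16_t short_jump_target(uint16_t insn_addr, uint8_t disp8)
{
    return (uint16_t)(insn_addr + 2 + (int8_t)disp8);
}

/*
 * Target of a near relative jmp or call: a three-byte instruction
 * (one opcode byte plus a 16-bit displacement), again measured
 * from the following instruction.  The sum wraps modulo 64K.
 */
uint16_t near_jump_target(uint16_t insn_addr, uint16_t disp16)
{
    return (uint16_t)(insn_addr + 3 + disp16);
}
```

For instance, a short jump located at 0x0100 with displacement
byte 0xfe is a jump to itself, because 0x0100 + 2 - 2 = 0x0100.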

By convention, the release level of the program as a whole
is the SID of the file disrel.c, and this SID string appears in
each disassembly. Release 2.1 of the program is the first beta
release to be distributed on Usenet.


.TH dis88 1 LOCAL
.SH "NAME"
dis88 \- 8088 symbolic disassembler
.SH "SYNOPSIS"
\fBdis88\fP [ -o ] ifile [ ofile ]
.SH "DESCRIPTION"
Dis88 reads ifile, which must be in PC/IX a.out format.
It interprets the binary opcodes and data locations, and
writes corresponding assembler source code to stdout, or
to ofile if specified. The program's output is in the
format of, and fully compatible with, the PC/IX assembler,
as(1). If a symbol table is present in ifile, labels and
references will be symbolic in the output. If the input
file lacks a symbol table, the fact will be noted, and the
disassembly will proceed, with the disassembler generating
synthetic labels as needed. If the input file has split
I/D space, or if it is executable, the disassembler will
make all necessary adjustments in address-reference calculations.
.PP
If the "-o" option appears, object code will be included
in comments during disassembly of the text segment. This
feature is used primarily for debugging the disassembler
itself, but may provide information of passing interest
to users.
.PP
The program always outputs the current machine address
before disassembling an opcode. If a symbol table is
present, this address is output as an assembler comment;
otherwise, it is incorporated into the synthetic label
which is generated internally. Since relative jumps,
especially short ones, may target unlabelled locations,
the program always outputs the physical target address
as a comment, to assist the user in following the code.
.PP
The text segment of an object file is always padded to
an even machine address. In addition, if the file has
split I/D space, the text segment will be padded to a
paragraph boundary (i.e., an address divisible by 16).
As a result of this padding, the disassembler may produce
a few spurious, but harmless, instructions at the
end of the text segment.
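.PP
The padding rule above amounts to rounding the text size up to a
multiple of 2, or of 16 when I/D space is split. The following C
sketch shows that arithmetic only; the function names are
hypothetical and nothing here is taken from the dis88 source or
from the PC/IX linker.
.nf
```c
#include <stdint.h>

/* Round size up to the next multiple of align (a power of two). */
uint32_t round_up(uint32_t size, uint32_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/*
 * Padded length of a text segment: even alignment in general,
 * paragraph (16-byte) alignment when I/D space is split.
 */
uint32_t padded_text_size(uint32_t text_size, int split_id)
{
    return round_up(text_size, split_id ? 16u : 2u);
}

/* Number of filler bytes that may show up as spurious instructions. */
uint32_t text_padding(uint32_t text_size, int split_id)
{
    return padded_text_size(text_size, split_id) - text_size;
}
```
.fi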
.PP
Disassembly of the data segment is a difficult matter.
The information to which initialized data refers cannot
be inferred from context, except in the special case
of an external data or address reference, which will be
reflected in the relocation table. Internal data and
address references will already be resolved in the object file,
and cannot be recreated. Therefore, the data
segment is disassembled as a byte stream, with long
stretches of null data represented by an appropriate
".zerow" pseudo-op. This limitation notwithstanding,
labels (as opposed to symbolic references) are always
output at appropriate points within the data segment.
.PP
If disassembly of the data segment is difficult, disassembly of the
bss segment is quite easy, because uninitialized data is all
zero by definition. No data
is output in the bss segment, but symbolic labels are
output as appropriate.
.PP
For each opcode which takes an operand, a particular
symbol type (text, data, or bss) is appropriate. This
tidy correspondence is complicated somewhat, however,
by the existence of assembler symbolic constants and
segment override opcodes. Therefore, the disassembler's
symbol lookup routine attempts to apply a certain amount
of intelligence when it is asked to find a symbol. If
it cannot match on a symbol of the preferred type, it
may return a symbol of some other type, depending on
preassigned (and somewhat arbitrary) rankings within
each type. Finally, if all else fails, it returns a
string containing the address sought as a hex constant;
this behavior allows calling routines to use the output
of the lookup function regardless of the success of its
search.
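.PP
The lookup strategy just described (preferred type first, then
some other type, then a hex constant) can be illustrated with a
short C sketch. It is only an approximation: the structure, the
names, and the use of table order as a stand-in for the program's
preassigned rankings are all assumptions, not the actual dis88
routine.
.nf
```c
#include <stddef.h>
#include <stdio.h>

enum symtype { SYM_TEXT, SYM_DATA, SYM_BSS };

struct sym {
    const char  *name;
    unsigned     addr;
    enum symtype type;
};

/*
 * Linear search for a symbol at addr.  A symbol of the wanted type
 * wins; otherwise the first symbol of any other type at that
 * address is used (a stand-in for the real rankings); otherwise
 * the address is formatted into buf as a hex constant.  The result
 * is always a usable string.
 */
const char *lookup(const struct sym *tab, size_t n, unsigned addr,
                   enum symtype want, char *buf, size_t buflen)
{
    const struct sym *fallback = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tab[i].addr != addr)
            continue;
        if (tab[i].type == want)
            return tab[i].name;
        if (fallback == NULL)
            fallback = &tab[i];
    }
    if (fallback != NULL)
        return fallback->name;
    snprintf(buf, buflen, "0x%04x", addr);
    return buf;
}
```
.fi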
.PP
It is worth noting, at this point, that the symbol lookup
routine operates linearly, and has not been optimized in
any way. Because each lookup is a linear search, total
execution time is likely to grow roughly quadratically
with input file size. The disassembler is
internally limited to 1500 symbol table entries and 1500
relocation table entries; while these limits are generous
(/unix, itself, has fewer than 800 symbols), they are not
guaranteed to be adequate in all cases. If the symbol
table or the relocation table overflows, the disassembly
aborts.
.PP
Finally, users should be aware of a bug in the assembler,
which causes it not to parse the "esc" mnemonic, even
though "esc" is a completely legitimate opcode which is
documented in all the Intel literature. To accommodate
this deficiency, the disassembler translates opcodes of
the "esc" family to .byte directives, but notes the
correct mnemonic in a comment for reference.
.PP
In all cases, it should be possible to submit the output
of the disassembler program to the assembler, and assemble
it without error. In most cases, the resulting object
code will be identical to the original; in any event, it
will be functionally equivalent.
.SH "SEE ALSO"
adb(1), as(1), cc(1), ld(1).
.br
"Assembler Reference Manual" in the PC/IX Programmer's
Guide.
.SH "DIAGNOSTICS"
"can't access input file" if the input file cannot be
found, opened, or read.
.sp
"can't open output file" if the output file cannot be
created.
.sp
"warning: host/cpu clash" if the program is run on a
machine with a different CPU.
.sp
"input file not in object format" if the magic number
does not correspond to that of a PC/IX object file.
.sp
"not an 8086/8088 object file" if the CPU ID of the
file header is incorrect.
.sp
"reloc table overflow" if there are more than 1500
entries in the relocation table.
.sp
"symbol table overflow" if there are more than 1500
entries in the symbol table.
.sp
"lseek error" if the input file is corrupted (should
never happen).
.sp
"warning: no symbols" if the symbol table is missing.
.sp
"can't reopen input file" if the input file is removed
or altered during program execution (should never happen).
.SH "BUGS"
Numeric co-processor (i.e., 8087) mnemonics are not currently supported.
Instructions for the co-processor are
disassembled as CPU escape sequences, or as interrupts,
depending on how they were assembled in the first place.
.sp
Despite the program's best efforts, a symbol retrieved
from the symbol table may sometimes be different from
the symbol used in the original assembly.
.sp
The disassembler's internal tables are of fixed size,
and the program aborts if they overflow.