# @(#)TOUR 5.1 (Berkeley) 3/7/91

A Tour through Ash

Copyright 1989 by Kenneth Almquist.


DIRECTORIES: The subdirectory bltin contains commands which can
be compiled stand-alone. The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS: Files whose names begin with "mk" are
programs that generate source code. A complete list of these
programs is:

      program         input files         generates
      -------         -----------         ---------
      mkbuiltins      builtins            builtins.h builtins.c
      mkinit          *.c                 init.c
      mknodes         nodetypes           nodes.h nodes.c
      mksignames      -                   signames.h signames.c
      mksyntax        -                   syntax.h syntax.c
      mktokens        -                   token.def
      bltin/mkexpr    unary_op binary_op  operators.h operators.c

There are undoubtedly too many of these. Mkinit searches all the
C source files for entries looking like:

      INIT {
            x = 1;      /* executed during initialization */
      }

      RESET {
            x = 2;      /* executed when the shell does a longjmp
                           back to the main command loop */
      }

      SHELLPROC {
            x = 3;      /* executed when the shell runs a shell procedure */
      }

It pulls this code out into routines which are called when
particular events occur. The intent is to improve modularity by
isolating the information about which modules need to be
explicitly initialized/reset within the modules themselves.

Mkinit recognizes several constructs for placing declarations in
the init.c file.
      INCLUDE "file.h"
includes a file. The storage class MKINIT makes a declaration
available in the init.c file, for example:
      MKINIT int funcnest;    /* depth of function calls */
MKINIT alone on a line introduces a structure or union
declaration:
      MKINIT
      struct redirtab {
            short renamed[10];
      };
Preprocessor #define statements are copied to init.c without any
special action to request this.

INDENTATION: The ash source is indented in multiples of six
spaces. The only study that I have heard of on the subject
concluded that the optimal amount to indent is in the range of
four to six spaces. I use six spaces since it is not too big a
jump from the widely used eight spaces. If you really hate
six-space indentation, use the adjind (source included) program
to change it to something else.

EXCEPTIONS: Code for dealing with exceptions appears in
exceptions.c. The C language doesn't include exception handling,
so I implement it using setjmp and longjmp. The global variable
exception contains the type of exception. EXERROR is raised by
calling error. EXINT is an interrupt. EXSHELLPROC is an exception
which is raised when a shell procedure is invoked. The purpose of
EXSHELLPROC is to perform the cleanup actions associated with
other exceptions. After these cleanup actions, the shell can
interpret a shell procedure itself without exec'ing a new copy of
the shell.

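The setjmp/longjmp technique described above can be sketched as
follows. This is a minimal illustration, not the ash code: the
names handler, raise_exception, risky and run_protected, and the
numeric values of the exception codes, are assumptions made for
the example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Exception types named in the text; the values are assumptions. */
enum { EXINT = 0, EXERROR = 1, EXSHELLPROC = 2 };

static jmp_buf *handler;          /* innermost active handler (hypothetical) */
static volatile int exception;    /* type of the exception being raised */

/* Raise an exception: record its type and jump to the handler. */
void raise_exception(int e)
{
      assert(handler != NULL);    /* raising with no handler is a bug */
      exception = e;
      longjmp(*handler, 1);
}

/* Stand-in for a routine that fails deep inside some computation. */
void risky(int fail)
{
      if (fail)
            raise_exception(EXERROR);
}

/*
 * Run risky() under a local handler.  Returns -1 on normal
 * completion, or the type of the exception that was caught.
 */
int run_protected(int fail)
{
      jmp_buf jmploc;
      jmp_buf *volatile saved = handler;
      int caught = -1;

      if (setjmp(jmploc) == 0) {
            handler = &jmploc;
            risky(fail);
      } else {
            caught = exception;   /* we arrived here via longjmp */
      }
      handler = saved;            /* pop our handler */
      return caught;
}
```

Saving and restoring the previous handler pointer is what lets
handlers nest: an inner handler can clean up and then re-raise to
the outer one.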
INTERRUPTS: In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop. (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.) The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections. Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery. INTOFF and INTON can be nested.

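One way to get nestable critical sections is a nesting counter
plus a "pending" flag. The sketch below assumes that design; the
variable names and the helper routines are hypothetical, and
deliver_interrupt merely counts where a real shell would raise
EXINT.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t suppressint;  /* INTOFF nesting depth */
static volatile sig_atomic_t intpending;   /* interrupt is being held */
static int delivered;                      /* interrupts acted upon */

/* Act on an interrupt.  A real shell would raise EXINT here. */
void deliver_interrupt(void)
{
      intpending = 0;
      delivered++;
}

#define INTOFF  (suppressint++)
#define INTON   do { \
            assert(suppressint > 0); \
            if (--suppressint == 0 && intpending) \
                  deliver_interrupt(); \
      } while (0)

/* What the SIGINT handler would do when the signal arrives. */
void on_interrupt(void)
{
      if (suppressint > 0)
            intpending = 1;        /* hold for later delivery */
      else
            deliver_interrupt();
}

int delivered_count(void)
{
      return delivered;
}
```

Only the outermost INTON delivers a held interrupt, which is what
makes nesting safe.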
MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left. It also defines a
stack-oriented memory allocation scheme. Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer. The stack is implemented using a
linked list of blocks.

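A stack allocator built from a linked list of blocks might look
like the sketch below. The routine names (stalloc, setstackmark,
popstackmark), the block size and the layout of the structures
are assumptions for illustration; abort stands in for the call to
error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCKSIZE 512             /* assumed minimum block payload */
#define ALIGN(n) (((n) + sizeof(double) - 1) & ~(sizeof(double) - 1))

struct block {
      struct block *prev;         /* block allocated before this one */
      char *space;                /* payload obtained from malloc */
      size_t size;                /* bytes in space */
      size_t used;                /* bytes handed out so far */
};

struct stackmark {                /* a saved "stack pointer" */
      struct block *blk;
      size_t used;
};

static struct block *top;         /* newest block */

void *stalloc(size_t nbytes)
{
      void *p;

      nbytes = ALIGN(nbytes);
      if (top == NULL || top->size - top->used < nbytes) {
            struct block *b = malloc(sizeof *b);

            if (b == NULL)
                  abort();        /* ash would call error instead */
            b->size = nbytes > BLOCKSIZE ? nbytes : BLOCKSIZE;
            b->space = malloc(b->size);
            if (b->space == NULL)
                  abort();
            b->used = 0;
            b->prev = top;
            top = b;
      }
      p = top->space + top->used;
      top->used += nbytes;
      return p;
}

void setstackmark(struct stackmark *mark)
{
      mark->blk = top;
      mark->used = top != NULL ? top->used : 0;
}

/* Free everything allocated since the mark was set. */
void popstackmark(const struct stackmark *mark)
{
      while (top != mark->blk) {
            struct block *b = top;

            top = b->prev;
            free(b->space);
            free(b);
      }
      if (top != NULL)
            top->used = mark->used;
}
```

An exception handler only needs the mark taken before the failed
operation; it never has to know what was allocated in between.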
STPUTC: If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
      p = stackptr;
      *p++ = c;               /* repeated as many times as needed */
      stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
      STARTSTACKSTR(p);
      STPUTC(c, p);           /* repeated as many times as needed */
      grabstackstr(p);

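The idea of "store a character, growing the buffer if you run off
the end" can be shown with a simplified stand-in. The sketch
below reuses the three names from the text but backs them with a
single realloc'd buffer rather than the block stack, and the
helper growstackstr is hypothetical; the real definitions in
memalloc.h may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *stackbase;           /* start of the string being built */
static size_t stacksize;          /* bytes currently allocated */

/* Enlarge the buffer; return the write pointer's new position. */
char *growstackstr(char *p)
{
      size_t len = stackbase != NULL ? (size_t)(p - stackbase) : 0;
      size_t newsize = stacksize != 0 ? stacksize * 2 : 16;
      char *newbase = realloc(stackbase, newsize);

      if (newbase == NULL)
            abort();              /* ash would call error instead */
      stackbase = newbase;
      stacksize = newsize;
      return stackbase + len;
}

#define STARTSTACKSTR(p) \
      ((p) = (stacksize != 0 ? stackbase : growstackstr(stackbase)))

#define STPUTC(c, p) \
      do { \
            if ((p) == stackbase + stacksize) \
                  (p) = growstackstr(p); \
            *(p)++ = (char)(c); \
      } while (0)

/*
 * Finish the string and hand it to the caller, who frees it.
 * (In this simplified version p is not needed.)
 */
char *grabstackstr(char *p)
{
      char *s = stackbase;

      (void)p;
      stackbase = NULL;
      stacksize = 0;
      return s;
}
```

Note that STPUTC may move the buffer, so p must be the only
pointer into the string while it is being built.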
We now start a top-down look at the code:

MAIN.C: The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop. Cmdloop
repeatedly parses and executes commands.

OPTIONS.C: This file contains the option processing code. It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin. The -i and -j
options (the latter turns on job control) require changes in
signal handling. The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING: The parser code is all in parser.c. A recursive descent
parser is used. Syntax tables (generated by mksyntax) are used to
classify characters during lexical analysis. There are three
tables: one for normal use, one for use when inside single
quotes, and one for use when inside double quotes. The tables are
machine dependent because they are indexed by character variables
and the range of a char varies from machine to machine.

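To see why tables indexed by a char are machine dependent,
consider the sketch below. It sizes each table from CHAR_MIN and
CHAR_MAX and offsets every index by SYNBASE, so that a negative
char value on a signed-char machine still lands inside the table.
The class names, the SYNBASE and SYNTAX macros and the table
contents are assumptions, not the output of mksyntax.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical character classes; CWORD must be 0 (the default). */
enum { CWORD, CNL, CSQUOTE, CDQUOTE, CVAR, CBQUOTE, CBACK };

#define SYNBASE   (-CHAR_MIN)     /* 0 if char is unsigned */
#define TABSIZE   (CHAR_MAX - CHAR_MIN + 1)
#define SYNTAX(table, c)  ((table)[(c) + SYNBASE])

static unsigned char basesyntax[TABSIZE];   /* normal use */
static unsigned char sqsyntax[TABSIZE];     /* inside single quotes */
static unsigned char dqsyntax[TABSIZE];     /* inside double quotes */

void initsyntax(void)
{
      /* Unquoted text: every special character is active. */
      SYNTAX(basesyntax, '\n') = CNL;
      SYNTAX(basesyntax, '\'') = CSQUOTE;
      SYNTAX(basesyntax, '"') = CDQUOTE;
      SYNTAX(basesyntax, '$') = CVAR;
      SYNTAX(basesyntax, '`') = CBQUOTE;
      SYNTAX(basesyntax, '\\') = CBACK;

      /* Single quotes: only the closing quote is special. */
      SYNTAX(sqsyntax, '\'') = CSQUOTE;

      /* Double quotes: substitutions and backslash stay active. */
      SYNTAX(dqsyntax, '"') = CDQUOTE;
      SYNTAX(dqsyntax, '$') = CVAR;
      SYNTAX(dqsyntax, '`') = CBQUOTE;
      SYNTAX(dqsyntax, '\\') = CBACK;
}
```

The lexer then classifies a character with a single table lookup,
switching tables as it enters and leaves quotes.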
PARSE OUTPUT: The output of the parser consists of a tree of
nodes. The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents. An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance. It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word. The text consists of ordinary characters and a number of
special codes defined in parser.h. The special codes are:

      CTLVAR              Variable substitution
      CTLENDVAR           End of variable substitution
      CTLBACKQ            Command substitution
      CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
      CTLESC              Escape next character

A variable substitution contains the following elements:

      CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution. The possible types are:

      VSNORMAL            $var
      VSMINUS             ${var-text}
      VSMINUS|VSNUL       ${var:-text}
      VSPLUS              ${var+text}
      VSPLUS|VSNUL        ${var:+text}
      VSQUESTION          ${var?text}
      VSQUESTION|VSNUL    ${var:?text}
      VSASSIGN            ${var=text}
      VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes. The name of the variable
comes next, terminated by an equals sign. If the type is not
VSNORMAL, then the text field in the substitution follows,
terminated by a CTLENDVAR byte.

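The byte layout just described can be made concrete with a small
encoder. In the sketch below the numeric values of the CTL and VS
constants, the VSTYPE mask and the function encode_varsub are all
assumptions chosen for illustration; the real values live in
parser.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values; only the names come from the text. */
#define CTLVAR      '\202'
#define CTLENDVAR   '\203'

enum {
      VSNORMAL = 1, VSMINUS = 2, VSPLUS = 3, VSQUESTION = 4,
      VSASSIGN = 5,
      VSTYPE = 0x0f,              /* mask selecting the basic type */
      VSNUL = 0x10,               /* the ':' forms */
      VSQUOTE = 0x80              /* inside double quotes */
};

/*
 * Write "CTLVAR type name '=' [ text CTLENDVAR ]" into out and
 * NUL-terminate it.  Returns the encoded length, excluding the
 * NUL.  text is ignored for VSNORMAL.
 */
size_t encode_varsub(char *out, size_t outsize, int type,
                     const char *name, const char *text)
{
      int hastext = (type & VSTYPE) != VSNORMAL;
      size_t need = 2 + strlen(name) + 1 + 1;
      char *p = out;

      if (hastext)
            need += strlen(text) + 1;
      assert(need <= outsize);

      *p++ = CTLVAR;
      *p++ = (char)type;
      memcpy(p, name, strlen(name));
      p += strlen(name);
      *p++ = '=';
      if (hastext) {
            memcpy(p, text, strlen(text));
            p += strlen(text);
            *p++ = CTLENDVAR;
      }
      *p = '\0';
      return (size_t)(p - out);
}
```

For example, under these assumptions ${var:-text} is encoded with
type VSMINUS|VSNUL and occupies eleven bytes, while $HOME needs
only seven because a VSNORMAL substitution has no text part.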
Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently. CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right. In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing). Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase. Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name phase.

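Removing the escapes is a single in-place pass over the string.
The sketch below assumes a value for CTLESC and uses a
hypothetical routine name, remove_escapes; it shows only the
stripping step, not the decision about when to perform it.

```c
#include <assert.h>

#define CTLESC '\201'             /* assumed value */

/*
 * Delete each CTLESC in s and keep the character it protects,
 * even if that character is itself a CTLESC.
 */
void remove_escapes(char *s)
{
      char *p = s;                /* read position */
      char *q = s;                /* write position */

      while (*p != '\0') {
            if (*p == CTLESC && p[1] != '\0')
                  p++;            /* skip the escape itself */
            *q++ = *p++;
      }
      *q = '\0';
}
```

Because the output is never longer than the input, the string can
be rewritten in place without allocating anything.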
EXECUTION: Command execution is handled by the following files:
      eval.c     The top level routines.
      redir.c    Code to handle redirection of input and output.
      jobs.c     Code to handle forking, waiting, and job control.
      exec.c     Code to do path searches and the actual exec sys call.
      expand.c   Code to evaluate arguments.
      var.c      Maintains the variable symbol table. Called from expand.c.

EVAL.C: Evaltree recursively executes a parse tree. The exit
status is returned in the global variable exitstatus. The
alternative entry evalbackcmd is called to evaluate commands in
back quotes. It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command
and connects the standard output of the child to a pipe.

JOBS.C: To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process. Waitforjob waits for a job to
complete. These routines take care of process groups if job
control is defined.

REDIR.C: Ash allows file descriptors to be redirected and then
restored without forking off a child process. This is
accomplished by duplicating the original file descriptors. The
redirtab structure records where the file descriptors have been
duplicated to.

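Redirecting and restoring a descriptor without forking can be
sketched with the POSIX calls fcntl, dup2 and close. The redirtab
structure is the one shown earlier in this document; the EMPTY
marker and the routines initredir, pushredir and popredir are
hypothetical names for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define EMPTY (-1)                /* slot holds no saved descriptor */

struct redirtab {
      short renamed[10];          /* where fds 0-9 were duplicated to */
};

void initredir(struct redirtab *rt)
{
      int fd;

      for (fd = 0; fd < 10; fd++)
            rt->renamed[fd] = EMPTY;
}

/* Make fd refer to the same file as to, saving the original. */
int pushredir(struct redirtab *rt, int fd, int to)
{
      int copy;

      assert(fd >= 0 && fd < 10);
      copy = fcntl(fd, F_DUPFD, 10);    /* lowest free fd >= 10 */
      if (copy < 0)
            return -1;
      if (dup2(to, fd) < 0) {
            close(copy);
            return -1;
      }
      rt->renamed[fd] = (short)copy;
      return 0;
}

/* Undo every redirection recorded in rt. */
void popredir(struct redirtab *rt)
{
      int fd;

      for (fd = 0; fd < 10; fd++) {
            if (rt->renamed[fd] == EMPTY)
                  continue;
            dup2(rt->renamed[fd], fd);
            close(rt->renamed[fd]);
            rt->renamed[fd] = EMPTY;
      }
}
```

Keeping the saved copies at descriptor 10 and above keeps them
out of the way of the single-digit descriptors a script can name.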
EXEC.C: The routine find_command locates a command, and enters
the command in the hash table if it is not already there. The
third argument specifies whether it is to print an error message
if the command is not found. (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process. But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C: Arguments are processed in three passes. The first
(performed by the routine argstr) performs variable and command
substitution. The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation. If the
"/u" directory is simulated, then when "/u/username" is replaced
by the user's home directory, the flag "didudir" is set. This
tells the cd command that it should print out the directory name,
just as it would if the "/u" directory were implemented using
symbolic links.

VAR.C: Variables are stored in a hash table. Probably we should
switch to extensible hashing. The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command. Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.

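The benefit of storing "name=value" as one string shows up when
the environment is built: the table's strings can be handed out
directly. The sketch below is an illustration under assumed
names (setvareq, lookupvar, VTABSIZE and the hash function are
hypothetical); only the routine name environment appears in this
document.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39               /* assumed table size */

struct var {
      struct var *next;           /* next entry in the hash chain */
      const char *text;           /* "name=value" */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part, i.e. the characters before '=' or NUL. */
static unsigned hashname(const char *p)
{
      unsigned h = 0;

      while (*p != '\0' && *p != '=')
            h = h * 31 + (unsigned char)*p++;
      return h % VTABSIZE;
}

/* Compare two names, each ending at '=' or NUL. */
static int nameequal(const char *p, const char *q)
{
      while (*p == *q && *p != '\0' && *p != '=') {
            p++;
            q++;
      }
      return (*p == '\0' || *p == '=') && (*q == '\0' || *q == '=');
}

/* Enter a "name=value" string; the table keeps the pointer. */
void setvareq(const char *text)
{
      struct var **vpp = &vartab[hashname(text)];
      struct var *vp;

      for (vp = *vpp; vp != NULL; vp = vp->next) {
            if (nameequal(vp->text, text)) {
                  vp->text = text;      /* replace old value */
                  return;
            }
      }
      vp = malloc(sizeof *vp);
      if (vp == NULL)
            abort();
      vp->text = text;
      vp->next = *vpp;
      *vpp = vp;
}

/* Return the value of name, or NULL if it is not set. */
const char *lookupvar(const char *name)
{
      struct var *vp;

      for (vp = vartab[hashname(name)]; vp != NULL; vp = vp->next)
            if (nameequal(vp->text, name))
                  return strchr(vp->text, '=') + 1;
      return NULL;
}

/* Build a NULL-terminated envp array without copying strings. */
const char **environment(void)
{
      struct var *vp;
      const char **env;
      size_t n = 0, i;

      for (i = 0; i < VTABSIZE; i++)
            for (vp = vartab[i]; vp != NULL; vp = vp->next)
                  n++;
      env = malloc((n + 1) * sizeof *env);
      if (env == NULL)
            abort();
      n = 0;
      for (i = 0; i < VTABSIZE; i++)
            for (vp = vartab[i]; vp != NULL; vp = vp->next)
                  env[n++] = vp->text;
      env[n] = NULL;
      return env;
}
```

Only the array of pointers is allocated; each entry points at the
same string the table holds. This version does not distinguish
exported from unexported variables.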
When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.
There are two consequences of this. First, if an assignment to
PATH precedes the command, the value of PATH before the
assignment must be remembered and passed to shellexec. Second, if
the program turns out to be a shell procedure, the strings from
the environment variables which preceded the command must be
pulled out of the table and replaced with strings obtained from
malloc, since the former will automatically be freed when the
stack (see the entry on memalloc.c) is emptied.

BUILTIN COMMANDS: The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate. They can be recognized because their
names always end in "cmd". The mapping from names to procedures
is specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program. A builtin command is allowed to overwrite its
arguments. Builtin routines can call nextopt to do option
parsing. This is kind of like getopt, but you don't pass argc and
argv to it. Builtin routines can also call error. This routine
normally terminates the shell (or returns to the main command
loop if the shell is interactive), but when called from a builtin
command it causes the builtin command to terminate with an exit
status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons. The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash. The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment. The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash. This #define should appear
before bltin.h is included; bltin.h will #undef main if the
program is to be compiled stand-alone.

CD.C: This file defines the cd and pwd builtins. The pwd command
runs /bin/pwd the first time it is invoked (unless the user has
already done a cd to an absolute pathname), but then remembers
the current directory and updates it when the cd command is run,
so subsequent pwd commands run very fast. The main complication
in the cd command is in the docd command, which resolves symbolic
links into actual names and informs the user where the user ended
up if he crossed a symbolic link.

SIGNALS: Trap.c implements the trap command. The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately. When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag. The routine
dotrap is called at appropriate points to actually handle the
signal. When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.

OUTPUT: Ash uses its own output routines. There are three output
structures allocated. "Output" represents the standard output,
"errout" the standard error, and "memout" contains output which
is to be stored in memory. This last is used when a builtin
command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating
system. The variables out1 and out2 normally point to output and
errout, respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT: The basic input routine is pgetc, which reads from the
current input file. There is a stack of input files; the current
input file is the top file on this stack. The code allows the
input to come from a string rather than a file. (This is for the
-c option and the "." and eval builtin commands.) The global
variable plinno is saved and restored when files are pushed and
popped from the stack. The parser routines store the number of
the current line in this variable.

DEBUGGING: If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace. Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses. Example:
"TRACE(("n=%d\n", n))". The double parentheses are necessary
because the preprocessor can't handle functions with a variable
number of arguments. Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal. The tracing
code is in show.c.
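
The double-parentheses trick works because the macro receives the
whole parenthesized argument list as a single argument and pastes
it after a function name. A minimal sketch follows, assuming a
variadic routine named trace that appends to an in-memory buffer
instead of $HOME/trace; both the routine and the buffer are
hypothetical. (C99 later added variadic macros, which make the
extra parentheses unnecessary, but the sketch follows the
approach described above.)

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEBUG 1

static char tracebuf[256];        /* stands in for the trace file */

/* Append formatted output to the trace buffer. */
void trace(const char *fmt, ...)
{
      va_list ap;
      size_t len = strlen(tracebuf);

      va_start(ap, fmt);
      vsnprintf(tracebuf + len, sizeof tracebuf - len, fmt, ap);
      va_end(ap);
}

#ifdef DEBUG
#define TRACE(args)  trace args   /* TRACE((x, y)) -> trace (x, y) */
#else
#define TRACE(args)  ((void)0)    /* compiled out entirely */
#endif
```

With DEBUG undefined, the arguments are not evaluated at all, so
tracing calls cost nothing in a production build.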