| [9] | 1 | #       @(#)TOUR        5.1 (Berkeley) 3/7/91 | 
|---|
|  | 2 |  | 
|---|
|  | 3 | A Tour through Ash | 
|---|
|  | 4 |  | 
|---|
|  | 5 | Copyright 1989 by Kenneth Almquist. | 
|---|
|  | 6 |  | 
|---|
|  | 7 |  | 
|---|
|  | 8 | DIRECTORIES:  The subdirectory bltin contains commands which can | 
|---|
|  | 9 | be compiled stand-alone.  The rest of the source is in the main | 
|---|
|  | 10 | ash directory. | 
|---|
|  | 11 |  | 
|---|
|  | 12 | SOURCE CODE GENERATORS:  Files whose names begin with "mk" are | 
|---|
|  | 13 | programs that generate source code.  A complete list of these | 
|---|
|  | 14 | programs is: | 
|---|
|  | 15 |  | 
|---|
|  | 16 | program         input files         generates | 
|---|
|  | 17 | -------         ------------        --------- | 
|---|
|  | 18 | mkbuiltins      builtins            builtins.h builtins.c | 
|---|
|  | 19 | mkinit          *.c                 init.c | 
|---|
|  | 20 | mknodes         nodetypes           nodes.h nodes.c | 
|---|
|  | 21 | mksignames          -               signames.h signames.c | 
|---|
|  | 22 | mksyntax            -               syntax.h syntax.c | 
|---|
|  | 23 | mktokens            -               token.def | 
|---|
|  | 24 | bltin/mkexpr    unary_op binary_op  operators.h operators.c | 
|---|
|  | 25 |  | 
|---|
|  | 26 | There are undoubtedly too many of these.  Mkinit searches all the | 
|---|
|  | 27 | C source files for entries looking like: | 
|---|
|  | 28 |  | 
|---|
|  | 29 | INIT { | 
|---|
|  | 30 | x = 1;    /* executed during initialization */ | 
|---|
|  | 31 | } | 
|---|
|  | 32 |  | 
|---|
|  | 33 | RESET { | 
|---|
|  | 34 | x = 2;    /* executed when the shell does a longjmp | 
|---|
|  | 35 | back to the main command loop */ | 
|---|
|  | 36 | } | 
|---|
|  | 37 |  | 
|---|
|  | 38 | SHELLPROC { | 
|---|
|  | 39 | x = 3;    /* executed when the shell runs a shell procedure */ | 
|---|
|  | 40 | } | 
|---|
|  | 41 |  | 
|---|
|  | 42 | It pulls this code out into routines which are called when particular | 
|---|
|  | 43 | events occur.  The intent is to improve modularity by isolating | 
|---|
|  | 44 | the information about which modules need to be explicitly | 
|---|
|  | 45 | initialized/reset within the modules themselves. | 
|---|
|  | 46 |  | 
|---|
|  | 47 | Mkinit recognizes several constructs for placing declarations in | 
|---|
|  | 48 | the init.c file. | 
|---|
|  | 49 | INCLUDE "file.h" | 
|---|
|  | 50 | includes a file.  The storage class MKINIT makes a declaration | 
|---|
|  | 51 | available in the init.c file, for example: | 
|---|
|  | 52 | MKINIT int funcnest;    /* depth of function calls */ | 
|---|
|  | 53 | MKINIT alone on a line introduces a structure or union declara- | 
|---|
|  | 54 | tion: | 
|---|
|  | 55 | MKINIT | 
|---|
|  | 56 | struct redirtab { | 
|---|
|  | 57 | short renamed[10]; | 
|---|
|  | 58 | }; | 
|---|
|  | 59 | Preprocessor #define statements are copied to init.c without any | 
|---|
|  | 60 | special action to request this. | 
|---|
|  | 61 |  | 
|---|
|  | 62 | INDENTATION:  The ash source is indented in multiples of six | 
|---|
|  | 63 | spaces.  The only study that I have heard of on the subject con- | 
|---|
|  | 64 | cluded that the optimal amount to indent is in the range of four | 
|---|
|  | 65 | to six spaces.  I use six spaces since it is not too big a jump | 
|---|
|  | 66 | from the widely used eight spaces.  If you really hate six-space | 
|---|
|  | 67 | indentation, use the adjind (source included) program to change | 
|---|
|  | 68 | it to something else. | 
|---|
|  | 69 |  | 
|---|
|  | 70 | EXCEPTIONS:  Code for dealing with exceptions appears in | 
|---|
|  | 71 | exceptions.c.  The C language doesn't include exception handling, | 
|---|
|  | 72 | so I implement it using setjmp and longjmp.  The global variable | 
|---|
|  | 73 | exception contains the type of exception.  EXERROR is raised by | 
|---|
|  | 74 | calling error.  EXINT is an interrupt.  EXSHELLPROC is an excep- | 
|---|
|  | 75 | tion which is raised when a shell procedure is invoked.  The pur- | 
|---|
|  | 76 | pose of EXSHELLPROC is to perform the cleanup actions associated | 
|---|
|  | 77 | with other exceptions.  After these cleanup actions, the shell | 
|---|
|  | 78 | can interpret a shell procedure itself without exec'ing a new | 
|---|
|  | 79 | copy of the shell. | 
|---|
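
The setjmp/longjmp scheme described above can be sketched roughly as follows.  This is a minimal illustration, not the code in ash: the names handler, raise_exception and run_protected, and the numeric values of the exception codes, are assumptions made for the example.

```c
#include <assert.h>
#include <setjmp.h>

/* Exception codes named in the text; the values are assumed. */
enum { EXINT = 0, EXERROR = 1, EXSHELLPROC = 2 };

static jmp_buf *handler;      /* innermost active handler (assumed name) */
static int exception;         /* type of the exception being raised */

/* Raise an exception: record its type and jump to the current handler. */
static void raise_exception(int e)
{
    exception = e;
    longjmp(*handler, 1);
}

/* Run fn under a fresh handler.
 * Returns -1 if fn returns normally, else the exception type. */
static int run_protected(void (*fn)(void))
{
    jmp_buf here;
    jmp_buf *volatile saved = handler;
    int result = -1;

    if (setjmp(here) == 0) {
        handler = &here;
        fn();
    } else {
        result = exception;   /* arrived here via longjmp */
    }
    handler = saved;          /* unwind to the enclosing handler */
    return result;
}

/* Two sample bodies for the usage below. */
static void fails(void)    { raise_exception(EXERROR); }
static void succeeds(void) { }
```

Calling run_protected(fails) yields EXERROR, while run_protected(succeeds) yields -1.  Saving and restoring the handler pointer is what lets such handlers nest.
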
|  | 80 |  | 
|---|
|  | 81 | INTERRUPTS:  In an interactive shell, an interrupt will cause an | 
|---|
|  | 82 | EXINT exception to return to the main command loop.  (Exception: | 
|---|
|  | 83 | EXINT is not raised if the user traps interrupts using the trap | 
|---|
|  | 84 | command.)  The INTOFF and INTON macros (defined in exception.h) | 
|---|
|  | 85 | provide uninterruptible critical sections.  Between the execution | 
|---|
|  | 86 | of INTOFF and the execution of INTON, interrupt signals will be | 
|---|
|  | 87 | held for later delivery.  INTOFF and INTON can be nested. | 
|---|
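
One way to get nestable critical sections of this kind is a depth counter plus a pending flag.  The sketch below simulates the signal handler with an ordinary function; the variable names, the deliver routine and the exact macro bodies are assumptions, and a real handler would need volatile sig_atomic_t variables.

```c
#include <assert.h>

static int suppressint;   /* nesting depth of INTOFF (assumed name) */
static int intpending;    /* set when an interrupt arrived while held */
static int delivered;     /* interrupts actually acted on (demo only) */

/* Act on an interrupt; in a shell this would raise EXINT. */
static void deliver(void)
{
    intpending = 0;
    delivered++;
}

#define INTOFF  (suppressint++)
#define INTON   do { \
                    if (--suppressint == 0 && intpending) \
                        deliver(); \
                } while (0)

/* Stand-in for the signal handler: act now, or hold the
 * interrupt until the outermost INTON. */
static void on_interrupt(void)
{
    if (suppressint > 0)
        intpending = 1;
    else
        deliver();
}
```

With two nested INTOFF calls, an interrupt arriving inside is held through the first INTON and delivered only at the second.
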
|  | 88 |  | 
|---|
|  | 89 | MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc | 
|---|
|  | 90 | which call error when there is no memory left.  It also defines a | 
|---|
|  | 91 | stack oriented memory allocation scheme.  Allocating off a stack | 
|---|
|  | 92 | is probably more efficient than allocation using malloc, but the | 
|---|
|  | 93 | big advantage is that when an exception occurs all we have to do | 
|---|
|  | 94 | to free up the memory in use at the time of the exception is to | 
|---|
|  | 95 | restore the stack pointer.  The stack is implemented using a | 
|---|
|  | 96 | linked list of blocks. | 
|---|
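
The idea of a stack built from a linked list of blocks, where freeing means restoring a saved position, might look like the following.  It is a simplified sketch under assumed names (stack_alloc, stack_mark, stack_release) with a tiny block size and no alignment rounding; it is not the memalloc.c interface.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCKSIZE 64          /* deliberately tiny so the demo spans blocks */

struct block {
    struct block *prev;       /* previously filled block, or NULL */
    size_t used;              /* bytes handed out from data[] */
    char data[BLOCKSIZE];
};

struct mark {                 /* a saved stack position */
    struct block *top;
    size_t used;
};

static struct block *top;     /* current block; NULL means empty stack */

/* Allocate n bytes (n <= BLOCKSIZE); NULL if that is not possible. */
static void *stack_alloc(size_t n)
{
    if (n > BLOCKSIZE)
        return NULL;
    if (top == NULL || top->used + n > BLOCKSIZE) {
        struct block *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;      /* ash would call error() here */
        b->prev = top;
        b->used = 0;
        top = b;
    }
    void *p = top->data + top->used;
    top->used += n;
    return p;
}

/* Remember the current stack position. */
static struct mark stack_mark(void)
{
    struct mark m = { top, top ? top->used : 0 };
    return m;
}

/* Free everything allocated since m was taken. */
static void stack_release(struct mark m)
{
    while (top != m.top) {
        struct block *b = top;
        top = b->prev;
        free(b);
    }
    if (top != NULL)
        top->used = m.used;
}
```

An exception handler only has to call stack_release with a mark taken earlier; it does not need to know what was allocated in between.
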
|  | 97 |  | 
|---|
|  | 98 | STPUTC:  If the stack were contiguous, it would be easy to store | 
|---|
|  | 99 | strings on the stack without knowing in advance how long the | 
|---|
|  | 100 | string was going to be: | 
|---|
|  | 101 | p = stackptr; | 
|---|
|  | 102 | *p++ = c;       /* repeated as many times as needed */ | 
|---|
|  | 103 | stackptr = p; | 
|---|
|  | 104 | The following three macros (defined in memalloc.h) perform these | 
|---|
|  | 105 | operations, but grow the stack if you run off the end: | 
|---|
|  | 106 | STARTSTACKSTR(p); | 
|---|
|  | 107 | STPUTC(c, p);   /* repeated as many times as needed */ | 
|---|
|  | 108 | grabstackstr(p); | 
|---|
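
The pattern of appending characters through a local pointer and growing the storage only when the pointer reaches the end can be sketched as below.  For brevity this version grows a single realloc'ed buffer rather than a block of the stack, and the names STARTSTR, PUTC, startstr and growstr are assumptions, not the macros in memalloc.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *strbase;         /* start of the string under construction */
static size_t strcap;         /* bytes available at strbase */

/* Make sure a buffer exists and return its start. */
static char *startstr(void)
{
    if (strbase == NULL) {
        strcap = 8;
        strbase = malloc(strcap);
        if (strbase == NULL)
            abort();          /* ash would call error() instead */
    }
    return strbase;
}

/* Double the buffer; return the position corresponding to old p. */
static char *growstr(char *p)
{
    size_t len = (size_t)(p - strbase);
    size_t newcap = strcap * 2;
    char *nb = realloc(strbase, newcap);
    if (nb == NULL)
        abort();
    strbase = nb;
    strcap = newcap;
    return nb + len;
}

#define STARTSTR(p)  ((p) = startstr())
#define PUTC(c, p)   do { \
                         if ((p) == strbase + strcap) \
                             (p) = growstr(p); \
                         *(p)++ = (c); \
                     } while (0)
```

The caller keeps p in a local variable, so the common case is one comparison and one store; the function call happens only when the buffer is full.
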
|  | 109 |  | 
|---|
|  | 110 | We now start a top-down look at the code: | 
|---|
|  | 111 |  | 
|---|
|  | 112 | MAIN.C:  The main routine performs some initialization, executes | 
|---|
|  | 113 | the user's profile if necessary, and calls cmdloop.  Cmdloop | 
|---|
|  | 114 | repeatedly parses and executes commands. | 
|---|
|  | 115 |  | 
|---|
|  | 116 | OPTIONS.C:  This file contains the option processing code.  It is | 
|---|
|  | 117 | called from main to parse the shell arguments when the shell is | 
|---|
|  | 118 | invoked, and it also contains the set builtin.  The -i and -j op- | 
|---|
|  | 119 | tions (the latter turns on job control) require changes in signal | 
|---|
|  | 120 | handling.  The routines setjobctl (in jobs.c) and setinteractive | 
|---|
|  | 121 | (in trap.c) are called to handle changes to these options. | 
|---|
|  | 122 |  | 
|---|
|  | 123 | PARSING:  The parser code is all in parser.c.  A recursive des- | 
|---|
|  | 124 | cent parser is used.  Syntax tables (generated by mksyntax) are | 
|---|
|  | 125 | used to classify characters during lexical analysis.  There are | 
|---|
|  | 126 | three tables:  one for normal use, one for use when inside single | 
|---|
|  | 127 | quotes, and one for use when inside double quotes.  The tables | 
|---|
|  | 128 | are machine dependent because they are indexed by character vari- | 
|---|
|  | 129 | ables and the range of a char varies from machine to machine. | 
|---|
|  | 130 |  | 
|---|
|  | 131 | PARSE OUTPUT:  The output of the parser consists of a tree of | 
|---|
|  | 132 | nodes.  The various types of nodes are defined in the file node- | 
|---|
|  | 133 | types. | 
|---|
|  | 134 |  | 
|---|
|  | 135 | Nodes of type NARG are used to represent both words and the con- | 
|---|
|  | 136 | tents of here documents.  An early version of ash kept the con- | 
|---|
|  | 137 | tents of here documents in temporary files, but keeping here do- | 
|---|
|  | 138 | cuments in memory typically results in significantly better per- | 
|---|
|  | 139 | formance.  It would have been nice to make it an option to use | 
|---|
|  | 140 | temporary files for here documents, for the benefit of small | 
|---|
|  | 141 | machines, but the code to keep track of when to delete the tem- | 
|---|
|  | 142 | porary files was complex and I never fixed all the bugs in it. | 
|---|
|  | 143 | (AT&T has been maintaining the Bourne shell for more than ten | 
|---|
|  | 144 | years, and to the best of my knowledge they still haven't gotten | 
|---|
|  | 145 | it to handle temporary files correctly in obscure cases.) | 
|---|
|  | 146 |  | 
|---|
|  | 147 | The text field of a NARG structure points to the text of the | 
|---|
|  | 148 | word.  The text consists of ordinary characters and a number of | 
|---|
|  | 149 | special codes defined in parser.h.  The special codes are: | 
|---|
|  | 150 |  | 
|---|
|  | 151 | CTLVAR              Variable substitution | 
|---|
|  | 152 | CTLENDVAR           End of variable substitution | 
|---|
|  | 153 | CTLBACKQ            Command substitution | 
|---|
|  | 154 | CTLBACKQ|CTLQUOTE   Command substitution inside double quotes | 
|---|
|  | 155 | CTLESC              Escape next character | 
|---|
|  | 156 |  | 
|---|
|  | 157 | A variable substitution contains the following elements: | 
|---|
|  | 158 |  | 
|---|
|  | 159 | CTLVAR type name '=' [ alternative-text CTLENDVAR ] | 
|---|
|  | 160 |  | 
|---|
|  | 161 | The type field is a single character specifying the type of sub- | 
|---|
|  | 162 | stitution.  The possible types are: | 
|---|
|  | 163 |  | 
|---|
|  | 164 | VSNORMAL            $var | 
|---|
|  | 165 | VSMINUS             ${var-text} | 
|---|
|  | 166 | VSMINUS|VSNUL       ${var:-text} | 
|---|
|  | 167 | VSPLUS              ${var+text} | 
|---|
|  | 168 | VSPLUS|VSNUL        ${var:+text} | 
|---|
|  | 169 | VSQUESTION          ${var?text} | 
|---|
|  | 170 | VSQUESTION|VSNUL    ${var:?text} | 
|---|
|  | 171 | VSASSIGN            ${var=text} | 
|---|
|  | 172 | VSASSIGN|VSNUL      ${var:=text} | 
|---|
|  | 173 |  | 
|---|
|  | 174 | In addition, the type field will have the VSQUOTE flag set if the | 
|---|
|  | 175 | variable is enclosed in double quotes.  The name of the variable | 
|---|
|  | 176 | comes next, terminated by an equals sign.  If the type is not | 
|---|
|  | 177 | VSNORMAL, then the text field in the substitution follows, ter- | 
|---|
|  | 178 | minated by a CTLENDVAR byte. | 
|---|
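
To make the layout concrete, here is a small sketch that builds the byte sequence for a variable substitution.  The numeric values given to CTLVAR, CTLENDVAR and the VS constants are placeholders chosen for the example (the real ones are in parser.h), and encode_var is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Placeholder values for this sketch only. */
#define CTLVAR     '\202'
#define CTLENDVAR  '\203'
#define VSNORMAL   1
#define VSMINUS    2
#define VSNUL      0x10

/* Write "CTLVAR type name '=' [ alt CTLENDVAR ]" into out.
 * alt is NULL for a plain $var.  Returns the number of bytes
 * written, not counting the terminating NUL. */
static size_t encode_var(char *out, int type,
                         const char *name, const char *alt)
{
    char *p = out;
    size_t n;

    *p++ = CTLVAR;
    *p++ = (char)type;
    n = strlen(name);
    memcpy(p, name, n);
    p += n;
    *p++ = '=';               /* terminates the variable name */
    if (alt != NULL) {
        n = strlen(alt);
        memcpy(p, alt, n);
        p += n;
        *p++ = CTLENDVAR;     /* terminates the alternative text */
    }
    *p = '\0';
    return (size_t)(p - out);
}
```

For example, ${HOME:-/tmp} would be encoded with type VSMINUS|VSNUL, name "HOME" and alternative text "/tmp".
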
|  | 179 |  | 
|---|
|  | 180 | Commands in back quotes are parsed and stored in a linked list. | 
|---|
|  | 181 | The locations of these commands in the string are indicated by | 
|---|
|  | 182 | CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether | 
|---|
|  | 183 | the back quotes were enclosed in double quotes. | 
|---|
|  | 184 |  | 
|---|
|  | 185 | The character CTLESC escapes the next character, so that in case | 
|---|
|  | 186 | any of the CTL characters mentioned above appear in the input, | 
|---|
|  | 187 | they can be passed through transparently.  CTLESC is also used to | 
|---|
|  | 188 | escape '*', '?', '[', and '!' characters which were quoted by the | 
|---|
|  | 189 | user and thus should not be used for file name generation. | 
|---|
|  | 190 |  | 
|---|
|  | 191 | CTLESC characters have proved to be particularly tricky to get | 
|---|
|  | 192 | right.  In the case of here documents which are not subject to | 
|---|
|  | 193 | variable and command substitution, the parser doesn't insert any | 
|---|
|  | 194 | CTLESC characters to begin with (so the contents of the text | 
|---|
|  | 195 | field can be written without any processing).  Other here docu- | 
|---|
|  | 196 | ments, and words which are not subject to splitting and file name | 
|---|
|  | 197 | generation, have the CTLESC characters removed during the | 
|---|
|  | 198 | variable and command substitution phase.  Words which are subject | 
|---|
|  | 199 | to splitting and file name generation have the CTLESC characters | 
|---|
|  | 200 | removed as part of the file name generation phase. | 
|---|
|  | 201 |  | 
|---|
|  | 202 | EXECUTION:  Command execution is handled by the following files: | 
|---|
|  | 203 | eval.c     The top level routines. | 
|---|
|  | 204 | redir.c    Code to handle redirection of input and output. | 
|---|
|  | 205 | jobs.c     Code to handle forking, waiting, and job control. | 
|---|
|  | 206 | exec.c     Code to do path searches and the actual exec sys call. | 
|---|
|  | 207 | expand.c   Code to evaluate arguments. | 
|---|
|  | 208 | var.c      Maintains the variable symbol table.  Called from expand.c. | 
|---|
|  | 209 |  | 
|---|
|  | 210 | EVAL.C:  Evaltree recursively executes a parse tree.  The exit | 
|---|
|  | 211 | status is returned in the global variable exitstatus.  The alter- | 
|---|
|  | 212 | native entry evalbackcmd is called to evaluate commands in back | 
|---|
|  | 213 | quotes.  It saves the result in memory if the command is a buil- | 
|---|
|  | 214 | tin; otherwise it forks off a child to execute the command and | 
|---|
|  | 215 | connects the standard output of the child to a pipe. | 
|---|
|  | 216 |  | 
|---|
|  | 217 | JOBS.C:  To create a process, you call makejob to return a job | 
|---|
|  | 218 | structure, and then call forkshell (passing the job structure as | 
|---|
|  | 219 | an argument) to create the process.  Waitforjob waits for a job | 
|---|
|  | 220 | to complete.  These routines take care of process groups if job | 
|---|
|  | 221 | control is defined. | 
|---|
|  | 222 |  | 
|---|
|  | 223 | REDIR.C:  Ash allows file descriptors to be redirected and then | 
|---|
|  | 224 | restored without forking off a child process.  This is | 
|---|
|  | 225 | accomplished by duplicating the original file descriptors.  The | 
|---|
|  | 226 | redirtab structure records where the file descriptors have been | 
|---|
|  | 227 | duplicated to. | 
|---|
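
Redirecting a descriptor and later restoring it in the same process can be done with the standard fcntl and dup2 calls, roughly as sketched here.  The helper names are hypothetical, and parking the copy at descriptor 10 or above is simply a choice made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Make fd refer to the same file as target.  Returns a private
 * copy of the original fd for later restoration, or -1 on error. */
static int redirect_save(int fd, int target)
{
    int saved = fcntl(fd, F_DUPFD, 10);   /* lowest free fd >= 10 */
    if (saved < 0)
        return -1;
    if (dup2(target, fd) < 0) {
        close(saved);
        return -1;
    }
    return saved;
}

/* Undo redirect_save: put the original back on fd, drop the copy. */
static int redirect_restore(int fd, int saved)
{
    if (dup2(saved, fd) < 0)
        return -1;
    return close(saved);
}
```

A structure such as redirtab then only needs to remember, for each redirected descriptor, the number returned by the save step.
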
|  | 228 |  | 
|---|
|  | 229 | EXEC.C:  The routine find_command locates a command, and enters | 
|---|
|  | 230 | the command in the hash table if it is not already there.  The | 
|---|
|  | 231 | third argument specifies whether it is to print an error message | 
|---|
|  | 232 | if the command is not found.  (When a pipeline is set up, | 
|---|
|  | 233 | find_command is called for all the commands in the pipeline | 
|---|
|  | 234 | before any forking is done, so as to get the commands into the hash | 
|---|
|  | 235 | table of the parent process.  But to make command hashing as | 
|---|
|  | 236 | transparent as possible, we silently ignore errors at that point | 
|---|
|  | 237 | and only print error messages if the command cannot be found | 
|---|
|  | 238 | later.) | 
|---|
|  | 239 |  | 
|---|
|  | 240 | The routine shellexec is the interface to the exec system call. | 
|---|
|  | 241 |  | 
|---|
|  | 242 | EXPAND.C:  Arguments are processed in three passes.  The first | 
|---|
|  | 243 | (performed by the routine argstr) performs variable and command | 
|---|
|  | 244 | substitution.  The second (ifsbreakup) performs word splitting | 
|---|
|  | 245 | and the third (expandmeta) performs file name generation.  If the | 
|---|
|  | 246 | "/u" directory is simulated, then when "/u/username" is replaced | 
|---|
|  | 247 | by the user's home directory, the flag "didudir" is set.  This | 
|---|
|  | 248 | tells the cd command that it should print out the directory name, | 
|---|
|  | 249 | just as it would if the "/u" directory were implemented using | 
|---|
|  | 250 | symbolic links. | 
|---|
|  | 251 |  | 
|---|
|  | 252 | VAR.C:  Variables are stored in a hash table.  Probably we should | 
|---|
|  | 253 | switch to extensible hashing.  The variable name is stored in the | 
|---|
|  | 254 | same string as the value (using the format "name=value") so that | 
|---|
|  | 255 | no string copying is needed to create the environment of a com- | 
|---|
|  | 256 | mand.  Variables which the shell references internally are preal- | 
|---|
|  | 257 | located so that the shell can reference the values of these vari- | 
|---|
|  | 258 | ables without doing a lookup. | 
|---|
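
The benefit of keeping each variable as a single "name=value" string is that an environment vector can be assembled from pointers alone.  The sketch below shows this with a small linear table instead of a hash table; set_var, lookup_var and build_env are hypothetical names, not the routines in var.c.

```c
#include <assert.h>
#include <string.h>

#define NVARS 8               /* fixed capacity, enough for a demo */

static char *vartab[NVARS];   /* each entry is one "name=value" string */
static int nvars;

/* Store a "name=value" string, replacing any earlier entry with
 * the same name.  The string is kept, not copied.  0 on success. */
static int set_var(char *s)
{
    const char *eq = strchr(s, '=');
    if (eq == NULL)
        return -1;
    size_t n = (size_t)(eq - s) + 1;      /* length of "name=" */
    for (int i = 0; i < nvars; i++) {
        if (strncmp(vartab[i], s, n) == 0) {
            vartab[i] = s;
            return 0;
        }
    }
    if (nvars == NVARS)
        return -1;
    vartab[nvars++] = s;
    return 0;
}

/* Value of name, or NULL: a pointer just past the '='. */
static const char *lookup_var(const char *name)
{
    size_t n = strlen(name);
    for (int i = 0; i < nvars; i++)
        if (strncmp(vartab[i], name, n) == 0 && vartab[i][n] == '=')
            return vartab[i] + n + 1;
    return NULL;
}

/* Fill envp (room for NVARS + 1 pointers) without copying strings. */
static void build_env(char **envp)
{
    for (int i = 0; i < nvars; i++)
        envp[i] = vartab[i];
    envp[nvars] = NULL;
}
```

The resulting vector has the form expected by execve, and every element is the very string held in the table.
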
|  | 259 |  | 
|---|
|  | 260 | When a program is run, the code in eval.c sticks any environment | 
|---|
|  | 261 | variables which precede the command (as in "PATH=xxx command") in | 
|---|
|  | 262 | the variable table as the simplest way to strip duplicates, and | 
|---|
|  | 263 | then calls "environment" to get the value of the environment. | 
|---|
|  | 264 | There are two consequences of this.  First, if an assignment to | 
|---|
|  | 265 | PATH precedes the command, the value of PATH before the assign- | 
|---|
|  | 266 | ment must be remembered and passed to shellexec.  Second, if the | 
|---|
|  | 267 | program turns out to be a shell procedure, the strings from the | 
|---|
|  | 268 | environment variables which preceded the command must be pulled | 
|---|
|  | 269 | out of the table and replaced with strings obtained from malloc, | 
|---|
|  | 270 | since the former will automatically be freed when the stack (see | 
|---|
|  | 271 | the entry on memalloc.c) is emptied. | 
|---|
|  | 272 |  | 
|---|
|  | 273 | BUILTIN COMMANDS:  The procedures for handling these are scat- | 
|---|
|  | 274 | tered throughout the code, depending on which location appears | 
|---|
|  | 275 | most appropriate.  They can be recognized because their names al- | 
|---|
|  | 276 | ways end in "cmd".  The mapping from names to procedures is | 
|---|
|  | 277 | specified in the file builtins, which is processed by the mkbuil- | 
|---|
|  | 278 | tins command. | 
|---|
|  | 279 |  | 
|---|
|  | 280 | A builtin command is invoked with argc and argv set up like a | 
|---|
|  | 281 | normal program.  A builtin command is allowed to overwrite its | 
|---|
|  | 282 | arguments.  Builtin routines can call nextopt to do option pars- | 
|---|
|  | 283 | ing.  This is kind of like getopt, but you don't pass argc and | 
|---|
|  | 284 | argv to it.  Builtin routines can also call error.  This routine | 
|---|
|  | 285 | normally terminates the shell (or returns to the main command | 
|---|
|  | 286 | loop if the shell is interactive), but when called from a builtin | 
|---|
|  | 287 | command it causes the builtin command to terminate with an exit | 
|---|
|  | 288 | status of 2. | 
|---|
|  | 289 |  | 
|---|
|  | 290 | The directory bltin contains commands which can be compiled | 
|---|
|  | 291 | independently but can also be built into the shell for efficiency | 
|---|
|  | 292 | reasons.  The makefile in this directory compiles these programs | 
|---|
|  | 293 | in the normal fashion (so that they can be run regardless of | 
|---|
|  | 294 | whether the invoker is ash), but also creates a library named | 
|---|
|  | 295 | bltinlib.a which can be linked with ash.  The header file bltin.h | 
|---|
|  | 296 | takes care of most of the differences between the ash and the | 
|---|
|  | 297 | stand-alone environment.  The user should call the main routine | 
|---|
|  | 298 | "main", and #define main to be the name of the routine to use | 
|---|
|  | 299 | when the program is linked into ash.  This #define should appear | 
|---|
|  | 300 | before bltin.h is included; bltin.h will #undef main if the pro- | 
|---|
|  | 301 | gram is to be compiled stand-alone. | 
|---|
|  | 302 |  | 
|---|
|  | 303 | CD.C:  This file defines the cd and pwd builtins.  The pwd com- | 
|---|
|  | 304 | mand runs /bin/pwd the first time it is invoked (unless the user | 
|---|
|  | 305 | has already done a cd to an absolute pathname), but then | 
|---|
|  | 306 | remembers the current directory and updates it when the cd com- | 
|---|
|  | 307 | mand is run, so subsequent pwd commands run very fast.  The main | 
|---|
|  | 308 | complication in the cd command is in the docd routine, which | 
|---|
|  | 309 | resolves symbolic links into actual names and informs the user | 
|---|
|  | 310 | where the user ended up if he crossed a symbolic link. | 
|---|
|  | 311 |  | 
|---|
|  | 312 | SIGNALS:  Trap.c implements the trap command.  The routine set- | 
|---|
|  | 313 | signal figures out what action should be taken when a signal is | 
|---|
|  | 314 | received and invokes the signal system call to set the signal ac- | 
|---|
|  | 315 | tion appropriately.  When a signal that a user has set a trap for | 
|---|
|  | 316 | is caught, the routine "onsig" sets a flag.  The routine dotrap | 
|---|
|  | 317 | is called at appropriate points to actually handle the signal. | 
|---|
|  | 318 | When an interrupt is caught and no trap has been set for that | 
|---|
|  | 319 | signal, the routine "onint" in error.c is called. | 
|---|
|  | 320 |  | 
|---|
|  | 321 | OUTPUT:  Ash uses its own output routines.  There are three | 
|---|
|  | 322 | output structures allocated.  "Output" represents the standard | 
|---|
|  | 323 | output, "errout" the standard error, and "memout" contains output | 
|---|
|  | 324 | which is to be stored in memory.  This last is used when a | 
|---|
|  | 325 | builtin command appears in backquotes, to allow its output to be | 
|---|
|  | 326 | collected without doing any I/O through the UNIX operating system. | 
|---|
|  | 327 | The variables out1 and out2 normally point to output and errout, | 
|---|
|  | 328 | respectively, but they are set to point to memout when appropri- | 
|---|
|  | 329 | ate inside backquotes. | 
|---|
|  | 330 |  | 
|---|
|  | 331 | INPUT:  The basic input routine is pgetc, which reads from the | 
|---|
|  | 332 | current input file.  There is a stack of input files; the current | 
|---|
|  | 333 | input file is the top file on this stack.  The code allows the | 
|---|
|  | 334 | input to come from a string rather than a file.  (This is for the | 
|---|
|  | 335 | -c option and the "." and eval builtin commands.)  The global | 
|---|
|  | 336 | variable plinno is saved and restored when files are pushed and | 
|---|
|  | 337 | popped from the stack.  The parser routines store the number of | 
|---|
|  | 338 | the current line in this variable. | 
|---|
|  | 339 |  | 
|---|
|  | 340 | DEBUGGING:  If DEBUG is defined in shell.h, then the shell will | 
|---|
|  | 341 | write debugging information to the file $HOME/trace.  Most of | 
|---|
|  | 342 | this is done using the TRACE macro, which takes a set of printf | 
|---|
|  | 343 | arguments inside two sets of parentheses.  Example: | 
|---|
|  | 344 | "TRACE(("n=%d\n", n))".  The double parentheses are necessary | 
|---|
|  | 345 | because the preprocessor can't handle macros with a variable | 
|---|
|  | 346 | number of arguments.  Defining DEBUG also causes the shell to | 
|---|
|  | 347 | generate a core dump if it is sent a quit signal.  The tracing | 
|---|
|  | 348 | code is in show.c. | 
|---|
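
The double-parenthesis trick works because the whole inner parenthesized list is passed as a single macro argument and then pasted after a function name.  A minimal sketch, assuming a printf-style function called trace and writing into a buffer rather than a trace file:

```c
#define DEBUG 1               /* remove to compile the tracing out */

#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char tracebuf[256];    /* stands in for the trace file here */

/* printf-style function that does the real work. */
static void trace(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(tracebuf, sizeof tracebuf, fmt, ap);
    va_end(ap);
}

/* TRACE takes exactly one argument: a parenthesized argument list. */
#ifdef DEBUG
#define TRACE(args)  trace args
#else
#define TRACE(args)  ((void)0)
#endif
```

With DEBUG defined, TRACE(("n=%d\n", n)) expands to trace ("n=%d\n", n); without it, the call and its arguments disappear entirely.
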