| 1 | .so mnx.mac
 | 
|---|
| 2 | .TH AWK 9
 | 
|---|
| 3 | .CD "awk \(en pattern matching language"
 | 
|---|
| 4 | .SX "awk \fIrules\fR [\fIfile\fR] ...
 | 
|---|
| 5 | .FL "\fR(none)"
 | 
|---|
| 6 | .EX "awk rules input" "Process \fIinput\fR according to \fIrules\fR"
 | 
|---|
| 7 | .EX "awk rules \(en  >out" "Input from terminal, output to \fIout\fR"
 | 
|---|
| 8 | .PP
 | 
|---|
| 9 | AWK is a programming language devised by Aho, Weinberger, and Kernighan
 | 
|---|
| 10 | at Bell Labs (hence the name).
 | 
|---|
| 11 | \fIAwk\fR programs search files for
 | 
|---|
| 12 | specific patterns and perform \*(OQactions\*(CQ for every occurrence
 | 
|---|
| 13 | of these patterns.  The patterns can be \*(OQregular expressions\*(CQ
 | 
|---|
| 14 | as used in the \fIed\fR editor.  The actions are expressed
 | 
|---|
| 15 | using a subset of the C language.
 | 
|---|
| 16 | .PP
 | 
|---|
| 17 | The patterns and actions are usually placed in a \*(OQrules\*(CQ file
 | 
|---|
| 18 | whose name must be the first argument in the command line,
 | 
|---|
| 19 | preceded by the flag \fB\(enf\fR.  Otherwise, the first argument on the
 | 
|---|
| 20 | command line is taken to be a string containing the rules
 | 
|---|
| 21 | themselves. All other arguments are taken to be the names of text
 | 
|---|
| 22 | files on which the rules are to be applied, with \fB\(en\fR being the
 | 
|---|
| 23 | standard input.  To take rules from the standard input, use \fB\(enf \(en\fR.
 | 
|---|
| 24 | .PP
 | 
|---|
| 25 | The command:
 | 
|---|
| 26 | .HS
 | 
|---|
| 27 | .Cx "awk  rules  prog.\d\s+2*\s0\u"
 | 
|---|
| 28 | .HS
 | 
|---|
| 29 | would read the patterns and actions from the file \fIrules\fR
 | 
|---|
| 30 | and apply them to every file named by the remaining arguments.
 | 
|---|
| 31 | .PP
 | 
|---|
| 32 | The general format of a rules file is:
 | 
|---|
| 33 | .HS
 | 
|---|
| 34 | ~~~<pattern> { <action> }
 | 
|---|
| 35 | ~~~<pattern> { <action> }
 | 
|---|
| 36 | ~~~...
 | 
|---|
| 37 | .HS
 | 
|---|
| 38 | There may be any number of these <pattern> { <action> }
 | 
|---|
| 39 | sequences in the rules file.  \fIAwk\fR reads a line of input from
 | 
|---|
| 40 | the current input file and applies every <pattern> { <action> }
 | 
|---|
| 41 | in sequence to the line.
 | 
|---|
| 42 | .PP
 | 
|---|
| 43 | If the <pattern> corresponding to any { <action> } is missing,
 | 
|---|
| 44 | the action is applied to every line of input.  The default
 | 
|---|
| 45 | { <action> } is to print the matched input line.
 | 
|---|
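The rule-processing loop described above can be sketched in Python. This is only an illustrative model, not the actual implementation: the function name run_rules, the use of None for a missing pattern or action, and the idea that an action returns the text to print are all assumptions made for the example, and Python's re module stands in for the pattern matcher.

```python
import re

def run_rules(rules, lines):
    """Apply every (pattern, action) pair, in order, to each input line."""
    output = []
    for line in lines:
        for pattern, action in rules:
            # Assumption: None stands for a missing pattern, which matches every line.
            if pattern is None or re.search(pattern, line):
                if action is None:
                    # Default action: print the matched line unchanged.
                    output.append(line)
                else:
                    # Hypothetical convention: the action returns the text to print.
                    output.append(action(line))
    return output

rules = [
    (r"^#", None),       # pattern only: matching lines are printed as-is
    (None, str.upper),   # action only: applied to every input line
]
result = run_rules(rules, ["# note", "data"])
```

Both rules are tried on every line, so a line that matches the first pattern is also handed to the second rule.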
| 46 | .SS "Patterns"
 | 
|---|
| 47 | .PP
 | 
|---|
| 48 | The <pattern>s may consist of any valid C expression.  If the
 | 
|---|
| 49 | <pattern> consists of two expressions separated by a comma, it
 | 
|---|
| 50 | is taken to be a range and the <action> is performed on all
 | 
|---|
| 51 | lines of input that match the range.  <pattern>s may contain
 | 
|---|
| 52 | \*(OQregular expressions\*(CQ delimited by an @ symbol.  Regular
 | 
|---|
| 53 | expressions can be thought of as a generalized \*(OQwildcard\*(CQ
 | 
|---|
| 54 | string matching mechanism, similar to that used by many
 | 
|---|
| 55 | operating systems to specify file names.  Regular expressions
 | 
|---|
| 56 | may contain any of the following characters:
 | 
|---|
| 57 | .HS
 | 
|---|
| 58 | .in +0.75i
 | 
|---|
| 59 | .ta +0.5i
 | 
|---|
| 60 | .ti -0.5i
 | 
|---|
| 61 | x       An ordinary character
 | 
|---|
| 62 | .ti -0.5i
 | 
|---|
| 63 | \\      The backslash quotes any character
 | 
|---|
| 64 | .ti -0.5i
 | 
|---|
| 65 | ^       A circumflex at the beginning of an expression matches the beginning of a line.
 | 
|---|
| 66 | .ti -0.5i
 | 
|---|
| 67 | $       A dollar-sign at the end of an expression matches the end of a line.
 | 
|---|
| 68 | .ti -0.5i
 | 
|---|
| 69 | \&.     A period matches any single character except newline.
 | 
|---|
| 70 | .ti -0.5i
 | 
|---|
| 71 | *       An expression followed by an asterisk matches zero or more occurrences
 | 
|---|
| 72 | of that expression: \*(OQfo*\*(CQ matches \*(OQf\*(CQ, \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
 | 
|---|
| 73 | .ti -0.5i
 | 
|---|
| 74 | +       An expression followed by a plus sign matches one or more occurrences 
 | 
|---|
| 75 | of that expression: \*(OQfo+\*(CQ matches \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
 | 
|---|
| 76 | .ti -0.5i
 | 
|---|
| 77 | []      A string enclosed in square brackets matches any single character in that 
 | 
|---|
| 78 | string, but no others.  If the first character in the string is a circumflex, the 
 | 
|---|
| 79 | expression matches any character except newline and the characters in the 
 | 
|---|
| 80 | string.  For example, \*(OQ[xyz]\*(CQ matches \*(OQxx\*(CQ and \*(OQzyx\*(CQ, while 
 | 
|---|
| 81 | \*(OQ[^xyz]\*(CQ matches \*(OQabc\*(CQ but not \*(OQaxb\*(CQ.  A range of characters may be 
 | 
|---|
| 82 | specified by two characters separated by \*(OQ-\*(CQ.
 | 
|---|
| 83 | .in -0.75i
 | 
|---|
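The metacharacters listed above behave much like their counterparts in other regular expression engines. The following sketch uses Python's re module as a stand-in, which is an assumption: it is a different engine, and corner cases may differ from this awk. The helper name whole_match is hypothetical, and the examples are anchored so that the whole string must match.

```python
import re

def whole_match(regex, text):
    """True if regex matches the entire text (a stand-in for ^...$ anchoring)."""
    return re.fullmatch(regex, text) is not None

checks = {
    "star_allows_zero": whole_match(r"fo*", "f"),        # zero occurrences of "o"
    "plus_needs_one": whole_match(r"fo+", "f"),          # "+" requires at least one "o"
    "plus_many": whole_match(r"fo+", "fooo"),
    "class_chars": whole_match(r"[xyz]+", "zyx"),        # only x, y and z appear
    "negated_class": whole_match(r"[^xyz]+", "abc"),     # no x, y or z appears
    "negated_class_fails": whole_match(r"[^xyz]+", "axb"),
    "range": whole_match(r"[a-c]+", "cab"),
    "dot_any_char": whole_match(r"a.c", "abc"),
    "dot_not_newline": whole_match(r"a.c", "a\nc"),
    "quoted_dot": whole_match(r"a\.c", "abc"),           # backslash makes "." literal
}
```

Without the anchoring, a bracket expression only has to match one character somewhere in the line, so an unanchored negated class would also succeed on "axb".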
| 84 | .SS "Actions"
 | 
|---|
| 85 | .PP
 | 
|---|
| 86 | Actions are expressed as a subset of the C language.  All
 | 
|---|
| 87 | variables are global and default to int's if not formally
 | 
|---|
| 88 | declared.  
 | 
|---|
| 89 | Only char's and int's and pointers and arrays of
 | 
|---|
| 90 | char and int are allowed.  \fIAwk\fR allows only decimal integer
 | 
|---|
| 91 | constants to be used\(emno hex (0xnn) or octal (0nn). String
 | 
|---|
| 92 | and character constants may contain all of the special C
 | 
|---|
| 93 | escapes (\\n, \\r, etc.).
 | 
|---|
| 94 | .PP
 | 
|---|
| 95 | \fIAwk\fR supports the \*(OQif\*(CQ, \*(OQelse\*(CQ, 
 | 
|---|
| 96 | \*(OQwhile\*(CQ and \*(OQbreak\*(CQ flow of
 | 
|---|
| 97 | control constructs, which behave exactly as in C.
 | 
|---|
| 98 | .PP
 | 
|---|
| 99 | Also supported are the following unary and binary operators,
 | 
|---|
| 100 | listed in order from highest to lowest precedence:
 | 
|---|
| 101 | .HS
 | 
|---|
| 102 | .ta 0.25i 1.75i 3.0i
 | 
|---|
| 103 | .nf
 | 
|---|
| 104 | \fB     Operator        Type    Associativity\fR
 | 
|---|
| 105 |         () []   unary   left to right
 | 
|---|
| 106 | .tr ~~
 | 
|---|
| 107 |         ! ~ ++ \(en\(en \(en * &        unary   right to left
 | 
|---|
| 108 | .tr ~
 | 
|---|
| 109 |         * / %   binary  left to right
 | 
|---|
| 110 |         + \(en  binary  left to right
 | 
|---|
| 111 |         << >>   binary  left to right
 | 
|---|
| 112 |         < <= > >=       binary  left to right
 | 
|---|
| 113 |         == !=   binary  left to right
 | 
|---|
| 114 |         &       binary  left to right
 | 
|---|
| 115 |         ^       binary  left to right
 | 
|---|
| 116 |         |       binary  left to right
 | 
|---|
| 117 |         &&      binary  left to right
 | 
|---|
| 118 |         ||      binary  left to right
 | 
|---|
| 119 |         =       binary  right to left
 | 
|---|
| 120 | .fi
 | 
|---|
| 121 | .HS
 | 
|---|
| 122 | Comments are introduced by a '#' symbol and are terminated by
 | 
|---|
| 123 | the first newline character.  The standard \*(OQ/*\*(CQ and \*(OQ*/\*(CQ
 | 
|---|
| 124 | comment delimiters are not supported and will result in a
 | 
|---|
| 125 | syntax error.
 | 
|---|
| 126 | .SP 0.5
 | 
|---|
| 127 | .SS "Fields"
 | 
|---|
| 128 | .SP 0.5
 | 
|---|
| 129 | .PP
 | 
|---|
| 130 | When \fIawk\fR reads a line from the current input file, the
 | 
|---|
| 131 | record is automatically separated into \*(OQfields.\*(CQ  A field is
 | 
|---|
| 132 | simply a string of consecutive characters delimited by either
 | 
|---|
| 133 | the beginning or end of line, or a \*(OQfield separator\*(CQ character.
 | 
|---|
| 134 | Initially, the field separators are the space and tab characters.
 | 
|---|
| 135 | The special unary operator '$' is used to reference one of the
 | 
|---|
| 136 | fields in the current input record (line).  The fields are
 | 
|---|
| 137 | numbered sequentially starting at 1.  The expression \*(OQ$0\*(CQ
 | 
|---|
| 138 | references the entire input line.
 | 
|---|
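As a rough model of the field splitting just described, the Python sketch below returns a list whose element 0 plays the role of $0 and whose elements 1, 2, ... play the roles of $1, $2, and so on. The function name split_fields is hypothetical, and the sketch assumes that a run of consecutive separators counts as a single delimiter, which the text above does not state explicitly.

```python
def split_fields(record, separators=" \t"):
    """Return [whole record, field 1, field 2, ...] for one input record."""
    fields = [record]          # index 0 models $0, the entire line
    current = ""
    for ch in record:
        if ch in separators:
            # Assumption: consecutive separators do not create empty fields.
            if current:
                fields.append(current)
            current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    return fields

fields = split_fields("alpha  beta\tgamma")
number_of_fields = len(fields) - 1   # models NF
```

Here fields[2] is "beta", and number_of_fields is 3.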
| 139 | .PP
 | 
|---|
| 140 | Similarly, the \*(OQrecord separator\*(CQ is used to determine the end
 | 
|---|
| 141 | of an input \*(OQline,\*(CQ initially the newline character.  The field
 | 
|---|
| 142 | and record separators may be changed programmatically by one of
 | 
|---|
| 143 | the actions and will remain in effect until changed again.
 | 
|---|
| 144 | .PP
 | 
|---|
| 145 | Multiple (up to 10) field separators are allowed at a time, but
 | 
|---|
| 146 | only one record separator.
 | 
|---|
| 147 | .PP
 | 
|---|
| 148 | Fields behave exactly like strings, and can be used in the same
 | 
|---|
| 149 | context as a character array.  These \*(OQarrays\*(CQ can be considered
 | 
|---|
| 150 | to have been declared as:
 | 
|---|
| 151 | .SP 0.15
 | 
|---|
| 152 | .HS
 | 
|---|
| 153 | ~~~~~char ($n)[ 128 ];
 | 
|---|
| 154 | .HS
 | 
|---|
| 155 | .SP 0.15
 | 
|---|
| 156 | In other words, they are 128 bytes long.  Notice that the
 | 
|---|
| 157 | parentheses are necessary because the [] operator binds more tightly
 | 
|---|
| 158 | than the $ operator; without them, the statement
 | 
|---|
| 159 | would have parsed as:
 | 
|---|
| 160 | .HS
 | 
|---|
| 161 | .SP 0.15
 | 
|---|
| 162 | ~~~~~char $(1[ 128 ]);
 | 
|---|
| 163 | .HS
 | 
|---|
| 164 | .SP 0.15
 | 
|---|
| 165 | which is obviously ridiculous.
 | 
|---|
| 166 | .PP
 | 
|---|
| 167 | If the contents of one of these field arrays is altered, the
 | 
|---|
| 168 | \*(OQ$0\*(CQ field will reflect this change.  For example, this
 | 
|---|
| 169 | expression:
 | 
|---|
| 170 | .HS
 | 
|---|
| 171 | .SP 0.15
 | 
|---|
| 172 | ~~~~~*$4 = 'A';
 | 
|---|
| 173 | .HS
 | 
|---|
| 174 | .SP 0.15
 | 
|---|
| 175 | will change the first character of the fourth field to an uppercase
 | 
|---|
| 176 | letter 'A'.  Then, when the following input line:
 | 
|---|
| 177 | .HS
 | 
|---|
| 178 | .SP 0.15
 | 
|---|
| 179 | ~~~~~120 PRINT "Name         address        Zip"
 | 
|---|
| 180 | .SP 0.15
 | 
|---|
| 181 | .HS
 | 
|---|
| 182 | is processed, it would be printed as:
 | 
|---|
| 183 | .HS
 | 
|---|
| 184 | .SP 0.15
 | 
|---|
| 185 | ~~~~~120 PRINT "Name         Address        Zip"
 | 
|---|
| 186 | .HS
 | 
|---|
| 187 | .SP 0.15
 | 
|---|
| 188 | Fields may also be modified with the strcpy() function (see
 | 
|---|
| 189 | below).  For example, the expression:
 | 
|---|
| 190 | .HS
 | 
|---|
| 191 | ~~~~~strcpy( $4, "Addr." );
 | 
|---|
| 192 | .HS
 | 
|---|
| 193 | applied to the same line above would yield:
 | 
|---|
| 194 | .HS
 | 
|---|
| 195 | ~~~~~120 PRINT "Name         Addr.        Zip"
 | 
|---|
| 196 | .HS
 | 
|---|
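The two field-modification examples above can be modelled in Python as in-place edits of the record text. This is a sketch under assumptions, not the real mechanism: the helper names field_spans, get_field and set_field are hypothetical, fields are assumed to be runs of non-separator characters, and the text around the edited field is assumed to be left untouched, as the sample output suggests.

```python
def field_spans(record, separators=" \t"):
    """Return (start, end) index pairs for each field, in order."""
    spans, start = [], None
    for i, ch in enumerate(record):
        if ch in separators:
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(record)))
    return spans

def get_field(record, n):
    """Return field n (1-based), modelling $n."""
    start, end = field_spans(record)[n - 1]
    return record[start:end]

def set_field(record, n, text):
    """Return the record with field n replaced; the result models the new $0."""
    start, end = field_spans(record)[n - 1]
    return record[:start] + text + record[end:]

line = '120 PRINT "Name         address        Zip"'
fourth = get_field(line, 4)
capitalized = set_field(line, 4, "A" + fourth[1:])   # models *$4 = 'A';
shortened = set_field(line, 4, "Addr.")              # models strcpy( $4, "Addr." );
```

Because only the span of the fourth field is rewritten, the spacing before and after it is the same in both results as in the original line.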
| 197 | .SS "Predefined Variables"
 | 
|---|
| 198 | .PP
 | 
|---|
| 199 | The following variables are pre-defined:
 | 
|---|
| 200 | .HS
 | 
|---|
| 201 | .in +1.5i
 | 
|---|
| 202 | .ta +1.25i
 | 
|---|
| 203 | .ti -1.25i
 | 
|---|
| 204 | FS      Field separator (see above).
 | 
|---|
| 205 | .ti -1.25i
 | 
|---|
| 206 | RS      Record separator (see above).
 | 
|---|
| 207 | .ti -1.25i
 | 
|---|
| 208 | NF      Number of fields in current input record (line).
 | 
|---|
| 209 | .ti -1.25i
 | 
|---|
| 210 | NR      Number of records processed thus far.
 | 
|---|
| 211 | .ti -1.25i
 | 
|---|
| 212 | FILENAME        Name of current input file.
 | 
|---|
| 213 | .ti -1.25i
 | 
|---|
| 214 | BEGIN   A special <pattern> that matches the beginning of input text.
 | 
|---|
| 215 | .ti -1.25i
 | 
|---|
| 216 | END     A special <pattern> that matches the end of input text.
 | 
|---|
| 217 | .in -1.5i
 | 
|---|
| 218 | .HS
 | 
|---|
| 219 | \fIAwk\fR also provides some useful built-in functions for string
 | 
|---|
| 220 | manipulation and printing:
 | 
|---|
| 221 | .HS
 | 
|---|
| 222 | .in +1.5i
 | 
|---|
| 223 | .ta +1.25i
 | 
|---|
| 224 | .ti -1.25i
 | 
|---|
| 225 | print(arg)      Simple printing of strings only, terminated by '\\n'.
 | 
|---|
| 226 | .ti -1.25i
 | 
|---|
| 227 | printf(arg...)  Exactly the printf() function from C.
 | 
|---|
| 228 | .ti -1.25i
 | 
|---|
| 229 | getline()       Reads the next record and returns 0 on end of file.
 | 
|---|
| 230 | .ti -1.25i
 | 
|---|
| 231 | nextfile()      Closes the current input file and begins processing the next file.
 | 
|---|
| 232 | .ti -1.25i
 | 
|---|
| 233 | strlen(s)       Returns the length of its string argument.
 | 
|---|
| 234 | .ti -1.25i
 | 
|---|
| 235 | strcpy(s,t)     Copies the string \*(OQt\*(CQ to the string \*(OQs\*(CQ.
 | 
|---|
| 236 | .ti -1.25i
 | 
|---|
| 237 | strcmp(s,t)     Compares the string \*(OQs\*(CQ to the string \*(OQt\*(CQ and returns 0 if they match.
 | 
|---|
| 238 | .ti -1.25i
 | 
|---|
| 239 | toupper(c)      Returns its character argument converted to upper-case.
 | 
|---|
| 240 | .ti -1.25i
 | 
|---|
| 241 | tolower(c)      Returns its character argument converted to lower-case.
 | 
|---|
| 242 | .ti -1.25i
 | 
|---|
| 243 | match(s,@re@)   Compares the string \*(OQs\*(CQ to the regular expression \*(OQre\*(CQ and 
 | 
|---|
| 244 | returns the number of matches found (zero if none).
 | 
|---|
| 245 | .in -1.5i
 | 
|---|
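The counting behaviour described for match() can be approximated in Python as follows. The function name match_count is hypothetical, Python's re module is used as a stand-in for the awk regular expression engine, and the sketch assumes that matches are counted without overlap, which the description above does not specify.

```python
import re

def match_count(s, regex):
    """Count non-overlapping matches of regex in s; 0 if there are none."""
    return sum(1 for _ in re.finditer(regex, s))

digit_runs = match_count("a1b22c333", r"[0-9]+")   # counts "1", "22" and "333"
no_match = match_count("abc", r"[0-9]+")           # nothing matches
```

A result of zero can be used directly as a false condition, in the same way the text above describes for match().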
| 246 | .SS "Authors"
 | 
|---|
| 247 | .PP
 | 
|---|
| 248 | \fIAwk\fR was written by Saeko Hirabayashi and Kouichi Hirabayashi.
 | 
|---|