.so mnx.mac
.TH AWK 9
.CD "awk \(en pattern matching language"
.SX "awk \fIrules\fR [\fIfile\fR] ..."
.FL "\fR(none)"
.EX "awk rules input" "Process \fIinput\fR according to \fIrules\fR"
.EX "awk rules \(en >out" "Input from terminal, output to \fIout\fR"
.PP
AWK is a programming language devised by Aho, Weinberger, and Kernighan
at Bell Labs (hence the name).
\fIAwk\fR programs search files for
specific patterns and perform \*(OQactions\*(CQ for every occurrence
of these patterns. The patterns can be \*(OQregular expressions\*(CQ
as used in the \fIed\fR editor. The actions are expressed
using a subset of the C language.
.PP
The patterns and actions are usually placed in a \*(OQrules\*(CQ file
whose name must be the first argument on the command line,
preceded by the flag \fB\(enf\fR. Otherwise, the first argument on the
command line is taken to be a string containing the rules
themselves. All other arguments are taken to be the names of text
files to which the rules are applied, with \fB\(en\fR denoting the
standard input. To take rules from the standard input, use \fB\(enf \(en\fR.
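.PP
The argument conventions above can be modeled in a few lines. The following is a minimal Python sketch of the documented behavior, not the program's actual source: the function names parse_args and is_stdin and the (kind, value) tuple they use are hypothetical, chosen only to make each rule visible.

```python
def parse_args(argv):
    """Split awk's arguments into a rules source and a list of input files.

    Hypothetical model: returns (("file", name) or ("text", rules), files).
    A name of "-" (for the rules file or an input file) means standard input.
    """
    if not argv:
        raise ValueError("usage: awk rules [file] ...")
    if argv[0] == "-f":
        if len(argv) < 2:
            raise ValueError("-f needs the name of a rules file")
        return ("file", argv[1]), argv[2:]
    # Without -f, the first argument is the rules text itself.
    return ("text", argv[0]), argv[1:]


def is_stdin(name):
    """True when an argument names the standard input rather than a file."""
    return name == "-"
```

For example, parse_args(["-f", "-", "data"]) reports that the rules come from the standard input and that data is the only input file.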
.PP
The command:
.HS
.Cx "awk \(enf rules prog.\d\s+2*\s0\u"
.HS
would read the pattern-action rules from the file \fIrules\fR
and apply them to each file named by the remaining arguments
(here, every file matching \fIprog.*\fR).
.PP
The general format of a rules file is:
.HS
~~~<pattern> { <action> }
~~~<pattern> { <action> }
~~~...
.HS
There may be any number of these <pattern> { <action> }
sequences in the rules file. \fIAwk\fR reads a line of input from
the current input file and applies every <pattern> { <action> }
in sequence to the line.
.PP
If the <pattern> of a rule is missing, its { <action> } is
applied to every line of input. If the { <action> } is missing,
the default action is to print the matched input line.
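.PP
The processing order just described, including both defaults, can be sketched as a loop. This Python version is illustrative only: it assumes each pattern and action is a Python callable standing in for an \fIawk\fR expression or statement, and the name apply_rules is hypothetical.

```python
def apply_rules(rules, lines):
    """Apply (pattern, action) pairs to each input line, in order.

    A pattern of None matches every line; an action of None prints the
    line. Printing is modeled by appending to the returned list.
    """
    output = []
    for line in lines:                      # one input line at a time
        for pattern, action in rules:       # every rule, in file order
            if pattern is None or pattern(line):
                if action is None:
                    output.append(line)     # default action: print the line
                else:
                    action(line, output)
    return output
```

Because every rule is tried on every line, a line can be printed more than once if several rules match it.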
.SS "Patterns"
.PP
A <pattern> may be any valid C expression. If the
<pattern> consists of two expressions separated by a comma, it
is taken to be a range: the <action> is performed on every line
from one that matches the first expression through the next line
that matches the second, inclusive. <pattern>s may contain
\*(OQregular expressions\*(CQ delimited by an @ symbol. Regular
expressions can be thought of as a generalized \*(OQwildcard\*(CQ
string matching mechanism, similar to that used by many
operating systems to specify file names. Regular expressions
may contain any of the following characters:
.HS
.in +0.75i
.ta +0.5i
.ti -0.5i
x	An ordinary character.
.ti -0.5i
\\	The backslash quotes any character.
.ti -0.5i
^	A circumflex at the beginning of an expression matches the beginning of a line.
.ti -0.5i
$	A dollar sign at the end of an expression matches the end of a line.
.ti -0.5i
\&.	A period matches any single character except newline.
.ti -0.5i
*	An expression followed by an asterisk matches zero or more occurrences
of that expression: \*(OQfo*\*(CQ matches \*(OQf\*(CQ, \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
.ti -0.5i
+	An expression followed by a plus sign matches one or more occurrences
of that expression: \*(OQfo+\*(CQ matches \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
.ti -0.5i
[]	A string enclosed in square brackets matches any single character in that
string, but no others. If the first character in the string is a circumflex, the
expression matches any character except newline and the characters in the
string. For example, \*(OQ[xyz]\*(CQ matches a single \*(OQx\*(CQ, \*(OQy\*(CQ, or \*(OQz\*(CQ,
so it matches within both \*(OQxx\*(CQ and \*(OQzyx\*(CQ, while
\*(OQ[^xyz]\*(CQ matches \*(OQa\*(CQ or \*(OQb\*(CQ but not \*(OQx\*(CQ. A range of characters may be
specified by two characters separated by \*(OQ-\*(CQ, as in \*(OQ[a-z]\*(CQ.
.in -0.75i
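.PP
The metacharacters listed above are a subset of Python's \fIre\fR syntax, so a pattern can be checked by translating it. The Python sketch below does that and also models a range pattern, assuming the conventional \fIawk\fR rule that a range opens on a line matching the first expression and closes, inclusively, on the next line matching the second. The names translate, search, and select_range are hypothetical, and characters not in the list (such as parentheses, | and ?) are assumed to be ordinary.

```python
import re


def translate(pattern):
    """Translate a pattern written between @...@ into Python `re` syntax.

    Assumes only the operators listed above are special; characters such
    as ( ) | ? { } are treated as ordinary and escaped.
    """
    out, i, n = [], 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:           # \x quotes any character
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if c == "[":                           # bracket expression
            j = pattern.index("]", i + 1)      # raises ValueError if unclosed
            body = pattern[i + 1:j]
            if body.startswith("^"):
                # A negated class must never match a newline.
                out.append("[^\\n" + body[1:] + "]")
            else:
                out.append("[" + body + "]")
            i = j + 1
            continue
        if (c == "^" and i == 0) or (c == "$" and i == n - 1) or c in ".*+":
            out.append(c)                      # anchor, wildcard or repetition
        else:
            out.append(re.escape(c))           # ordinary character
        i += 1
    return "".join(out)


def search(pattern, line):
    """True if @pattern@ matches anywhere in line."""
    return re.search(translate(pattern), line) is not None


def select_range(lines, start, stop):
    """Lines chosen by the range pattern @start@, @stop@ (sketch)."""
    selected, active = [], False
    for line in lines:
        if not active and search(start, line):
            active = True                      # range opens on this line
        if active:
            selected.append(line)
            if search(stop, line):
                active = False                 # closing line is included
    return selected
```

A single line that matches both expressions forms a one-line range under this model.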
.SS "Actions"
.PP
Actions are expressed in a subset of the C language. All
variables are global and default to int if not formally
declared.
Only chars, ints, and pointers to and arrays of
char and int are allowed. \fIAwk\fR accepts only decimal integer
constants; hexadecimal (0xnn) and octal (0nn) constants are not supported. String
and character constants may contain all of the special C
escapes (\\n, \\r, etc.).
.PP
\fIAwk\fR supports the \*(OQif\*(CQ, \*(OQelse\*(CQ,
\*(OQwhile\*(CQ, and \*(OQbreak\*(CQ flow-of-control
constructs, which behave exactly as in C.
.PP
Also supported are the following unary and binary operators,
listed in order from highest to lowest precedence:
.HS
.ta 0.25i 1.75i 3.0i
.nf
\fB	Operator	Type	Associativity\fR
	() []	unary	left to right
.tr ~~
	! ~ ++ \(en\(en \(en * &	unary	right to left
.tr ~
	* / %	binary	left to right
	+ \(en	binary	left to right
	<< >>	binary	left to right
	< <= > >=	binary	left to right
	== !=	binary	left to right
	&	binary	left to right
	^	binary	left to right
	|	binary	left to right
	&&	binary	left to right
	||	binary	left to right
	=	binary	right to left
.fi
.HS
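.PP
To see how the precedence table groups an expression, its binary rows can drive a precedence-climbing parser. The Python sketch below is illustrative, not this program's parser: it handles only the binary operators (the unary rows are omitted), and the function parenthesize is hypothetical. As in C, & binds less tightly than ==, so x & y == z groups as x & (y == z).

```python
import re

# Binary operators from the table above; larger numbers bind more tightly.
PRECEDENCE = {
    "*": 10, "/": 10, "%": 10,
    "+": 9, "-": 9,
    "<<": 8, ">>": 8,
    "<": 7, "<=": 7, ">": 7, ">=": 7,
    "==": 6, "!=": 6,
    "&": 5,
    "^": 4,
    "|": 3,
    "&&": 2,
    "||": 1,
    "=": 0,
}
RIGHT_ASSOCIATIVE = {"="}

# Two-character operators are listed before their one-character prefixes.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%<>&^|=()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parenthesize(text):
    """Return text with every binary operation wrapped in parentheses."""
    tokens, pos = tokenize(text), 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = expression(0)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("missing )")
            pos += 1
            return inner
        return tok

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], -1) >= min_prec:
            op = tokens[pos]
            pos += 1
            prec = PRECEDENCE[op]
            # Left-associative operators require the right operand to bind tighter.
            right = expression(prec if op in RIGHT_ASSOCIATIVE else prec + 1)
            left = f"({left} {op} {right})"
        return left

    result = expression(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

For instance, parenthesize("a = b + c * d") yields (a = (b + (c * d))).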
Comments are introduced by a '#' symbol and are terminated by
the next newline character. The standard \*(OQ/*\*(CQ and \*(OQ*/\*(CQ
comment delimiters are not supported and will result in a
syntax error.
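.PP
Comment removal can be sketched as a single scan over the rules text. This Python version assumes, as in C, that a # inside a string or character constant belongs to the constant rather than starting a comment; the page does not say so explicitly. The function strip_comments is hypothetical.

```python
def strip_comments(source):
    """Remove '#' comments from rules text, keeping each terminating newline.

    A '#' inside a "string" or 'c' constant is copied unchanged (assumption).
    """
    out, quote, i = [], None, 0
    while i < len(source):
        c = source[i]
        if quote:                                  # inside a string/char constant
            out.append(c)
            if c == "\\" and i + 1 < len(source):
                out.append(source[i + 1])          # keep escapes such as \" intact
                i += 2
                continue
            if c == quote:
                quote = None                       # constant ends here
        elif c in "\"'":
            quote = c
            out.append(c)
        elif c == "#":
            while i < len(source) and source[i] != "\n":
                i += 1                             # skip to the end of the line
            continue
        else:
            out.append(c)
        i += 1
    return "".join(out)
```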
.SP 0.5
.SS "Fields"
.SP 0.5
.PP
When \fIawk\fR reads a line from the current input file, the
record is automatically separated into \*(OQfields.\*(CQ A field is
simply a string of consecutive characters delimited by
the beginning or end of the line or by a \*(OQfield separator\*(CQ character.
Initially, the field separators are the space and tab characters.
The special unary operator '$' is used to reference one of the
fields in the current input record (line). The fields are
numbered sequentially starting at 1. The expression \*(OQ$0\*(CQ
references the entire input line.
.PP
Similarly, the \*(OQrecord separator\*(CQ determines the end
of an input \*(OQline\*(CQ; initially it is the newline character. The field
and record separators may be changed programmatically by an
action and remain in effect until changed again.
.PP
Up to 10 field separators may be in effect at a time, but
only one record separator.
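.PP
Field and record splitting can be sketched as follows. The Python below is a model under two assumptions not stated on this page: a run of adjacent separator characters counts as a single break, and a trailing record separator does not create an empty final record. The names split_records and split_fields are hypothetical; split_fields returns a list whose index n holds $n.

```python
MAX_FIELD_SEPARATORS = 10


def split_records(text, rs="\n"):
    """Split input text into records at each record-separator character."""
    records = text.split(rs)
    if records and records[-1] == "":
        records.pop()      # a final separator does not start an empty record
    return records


def split_fields(record, fs=" \t"):
    """Return [$0, $1, ..., $NF] for one record.

    Any character in fs ends a field; a run of separators is treated as a
    single break (an assumption, see the lead-in).
    """
    if len(fs) > MAX_FIELD_SEPARATORS:
        raise ValueError("at most 10 field separators are allowed")
    fields, current = [], ""
    for ch in record:
        if ch in fs:
            if current:
                fields.append(current)
                current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    return [record] + fields
```

With fs set to ":," both colons and commas end a field, so "a:b,c" has three fields.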
.PP
Fields behave exactly like strings and can be used in any
context where a character array can. These \*(OQarrays\*(CQ can be considered
to have been declared as:
.SP 0.15
.HS
~~~~~char ($n)[ 128 ];
.HS
.SP 0.15
In other words, each field is 128 bytes long. The
parentheses are necessary because [] binds more tightly than
the unary $ operator; without them, the declaration
would be parsed as:
.HS
.SP 0.15
~~~~~char $(n[ 128 ]);
.HS
.SP 0.15
which is obviously ridiculous.
.PP
If the contents of one of these field arrays is altered, the
\*(OQ$0\*(CQ field reflects the change. For example, this
expression:
.HS
.SP 0.15
~~~~~*$4 = 'A';
.HS
.SP 0.15
changes the first character of the fourth field to an uppercase
letter 'A'. When the following input line:
.HS
.SP 0.15
~~~~~120 PRINT "Name address Zip"
.SP 0.15
.HS
is then processed, it is printed as:
.HS
.SP 0.15
~~~~~120 PRINT "Name Address Zip"
.HS
.SP 0.15
Fields may also be modified with the strcpy() function (see
below). For example, the expression:
.HS
~~~~~strcpy( $4, "Addr." );
.HS
applied to the original input line above would yield:
.HS
~~~~~120 PRINT "Name Addr. Zip"
.HS
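.PP
One way to picture fields that stay in sync with $0 is sketched below in Python. It assumes $0 is rebuilt by joining the fields with single spaces, which reproduces the examples above but is not stated on this page. The class Record and its methods are hypothetical, and the size check mirrors the char ($n)[ 128 ] declaration (127 characters plus a terminating NUL).

```python
FIELD_SIZE = 128  # char ($n)[ 128 ]: up to 127 characters plus a NUL


class Record:
    """One input record whose fields and $0 stay consistent (hypothetical)."""

    def __init__(self, line):
        # str.split() breaks on runs of whitespace, covering space and tab.
        self.fields = line.split()

    def __getitem__(self, n):
        """r[0] is $0, rebuilt from the fields; r[n] is $n."""
        if n == 0:
            return " ".join(self.fields)
        return self.fields[n - 1]

    def set_char(self, n, index, ch):
        """Model an assignment such as *$4 = 'A' (index 0 of field 4)."""
        field = self.fields[n - 1]
        self.fields[n - 1] = field[:index] + ch + field[index + 1:]

    def strcpy(self, n, value):
        """Model strcpy( $n, value ), refusing values that would overflow."""
        if len(value) >= FIELD_SIZE:
            raise ValueError("value does not fit in a 128-byte field")
        self.fields[n - 1] = value
```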
.SS "Predefined Variables"
.PP
The following variables are predefined:
.HS
.in +1.5i
.ta +1.25i
.ti -1.25i
FS	Field separator (see above).
.ti -1.25i
RS	Record separator (see above).
.ti -1.25i
NF	Number of fields in the current input record (line).
.ti -1.25i
NR	Number of records processed thus far.
.ti -1.25i
FILENAME	Name of the current input file.
.ti -1.25i
BEGIN	A special <pattern> that matches before the first line of input is read.
.ti -1.25i
END	A special <pattern> that matches after the last line of input has been read.
.in -1.5i
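.PP
How these variables change during a run can be modeled with a small driver. The Python sketch below is illustrative: run and fields_of are hypothetical names, input files are passed as a dictionary of contents so no real files are needed, and NR is assumed to keep counting across files, as \*(OQprocessed thus far\*(CQ suggests.

```python
def fields_of(record, fs):
    """Split one record on any character in fs (runs count as one break)."""
    fields, current = [], ""
    for ch in record:
        if ch in fs:
            if current:
                fields.append(current)
                current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    return fields


def run(files, begin=None, each=None, end=None):
    """Run BEGIN, a per-record action and END over files (hypothetical model).

    Each callback receives the variable dictionary and an output list.
    """
    env = {"FS": " \t", "RS": "\n", "NF": 0, "NR": 0, "FILENAME": ""}
    out = []
    if begin:
        begin(env, out)                           # before any input is read
    for name, text in files.items():
        env["FILENAME"] = name
        records = text.split(env["RS"])
        if records and records[-1] == "":
            records.pop()
        for record in records:
            env["NR"] += 1                        # counts across all files
            env["NF"] = len(fields_of(record, env["FS"]))
            if each:
                each(env, record, out)
    if end:
        end(env, out)                             # after all input is read
    return out
```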
.HS
\fIAwk\fR also provides some useful built-in functions for string
manipulation and printing:
.HS
.in +1.5i
.ta +1.25i
.ti -1.25i
print(arg)	Prints its argument (strings only), followed by '\\n'.
.ti -1.25i
printf(arg...)	Behaves exactly like the C printf() function.
.ti -1.25i
getline()	Reads the next record and returns 0 on end of file.
.ti -1.25i
nextfile()	Closes the current input file and begins processing the next file.
.ti -1.25i
strlen(s)	Returns the length of its string argument.
.ti -1.25i
strcpy(s,t)	Copies the string \*(OQt\*(CQ to the string \*(OQs\*(CQ.
.ti -1.25i
strcmp(s,t)	Compares \*(OQs\*(CQ to \*(OQt\*(CQ and returns 0 if they match.
.ti -1.25i
toupper(c)	Returns its character argument converted to uppercase.
.ti -1.25i
tolower(c)	Returns its character argument converted to lowercase.
.ti -1.25i
match(s,@re@)	Compares the string \*(OQs\*(CQ to the regular expression \*(OQre\*(CQ and
returns the number of matches found (zero if none).
.in -1.5i
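.PP
Python equivalents of several of these functions are sketched below for reference. They are models, not this program's code: match assumes that \*(OQnumber of matches\*(CQ means non-overlapping occurrences and takes a pattern already in Python \fIre\fR syntax, and toupper and tolower handle ASCII letters only.

```python
import re


def strcmp(s, t):
    """Like C strcmp: negative, zero or positive as s sorts before, equal to, or after t."""
    return (s > t) - (s < t)


def toupper(c):
    """Convert one ASCII lowercase letter to uppercase; others are unchanged."""
    return chr(ord(c) - 32) if "a" <= c <= "z" else c


def tolower(c):
    """Convert one ASCII uppercase letter to lowercase; others are unchanged."""
    return chr(ord(c) + 32) if "A" <= c <= "Z" else c


def match(s, pattern):
    """Count non-overlapping matches of pattern in s (0 if none)."""
    return sum(1 for _ in re.finditer(pattern, s))
```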
.SS "Authors"
.PP
\fIAwk\fR was written by Saeko Hirabayashi and Kouichi Hirabayashi.