source: trunk/minix/man/man9/awk.9@ 15

Last change on this file since 15 was 9, checked in by Mattia Monga, 14 years ago

Minix 3.1.2a

.so mnx.mac
.TH AWK 9
.CD "awk \(en pattern matching language"
.SX "awk \fIrules\fR [\fIfile\fR] ..."
.FL "\fR(none)"
.EX "awk \(enf rules input" "Process \fIinput\fR according to the rules in file \fIrules\fR"
.EX "awk \(enf rules \(en >out" "Input from terminal, output to \fIout\fR"
.PP
AWK is a programming language devised by Aho, Weinberger, and Kernighan
at Bell Labs (hence the name).
\fIAwk\fR programs search files for
specific patterns and perform \*(OQactions\*(CQ for every occurrence
of these patterns. The patterns can be \*(OQregular expressions\*(CQ
as used in the \fIed\fR editor. The actions are expressed
using a subset of the C language.
.PP
The patterns and actions are usually placed in a \*(OQrules\*(CQ file,
whose name must be the first argument on the command line,
preceded by the flag \fB\(enf\fR. Otherwise, the first argument on the
command line is taken to be a string containing the rules
themselves. All other arguments are taken to be the names of text
files to which the rules are applied, with \fB\(en\fR denoting the
standard input. To take the rules from the standard input, use \fB\(enf \(en\fR.
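.PP
The argument rules above can be illustrated with a short Python sketch. The helper \fIsplit_command_line\fR is hypothetical and is not part of \fIawk\fR. It only shows how the first argument is interpreted with and without \fB\(enf\fR, and it assumes the program name has already been removed from the argument list.

```python
import sys


def split_command_line(argv):
    """Split awk's arguments (program name excluded) into the rules text
    and the list of input-file names.

    split_command_line is a hypothetical helper, not part of awk itself.
    """
    if not argv:
        raise ValueError("usage: awk -f rulefile [file] ... | awk rules [file] ...")
    if argv[0] == "-f":
        if len(argv) < 2:
            raise ValueError("-f needs the name of a rules file")
        name = argv[1]
        if name == "-":
            # "-f -" reads the rules themselves from standard input
            rules = sys.stdin.read()
        else:
            with open(name) as f:
                rules = f.read()
        files = argv[2:]
    else:
        # Without -f, the first argument is the rules text itself
        rules = argv[0]
        files = argv[1:]
    # Each remaining name is an input file; "-" stands for standard input
    return rules, files
```

For instance, split_command_line(["-f", "rules", "prog.c"]) would read the file rules and return ["prog.c"] as the list of inputs.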
.PP
The command:
.HS
.Cx "awk \(enf rules prog.\d\s+2*\s0\u"
.HS
would read the pattern-action rules from the file \fIrules\fR
and apply them to every file named by the remaining arguments.
.PP
The general format of a rules file is:
.HS
~~~<pattern> { <action> }
~~~<pattern> { <action> }
~~~...
.HS
There may be any number of these <pattern> { <action> }
sequences in the rules file. \fIAwk\fR reads a line of input from
the current input file and applies every <pattern> { <action> }
in sequence to that line before reading the next one.
.PP
If the <pattern> of a rule is missing,
its { <action> } is applied to every line of input. If the
{ <action> } is missing, the default action prints the matched input line.
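.PP
The evaluation order and the two defaults can be sketched in Python as follows. The sketch is illustrative only. It assumes each pattern is a callable that tests a line and each action is a callable that appends output, whereas real \fIawk\fR rules are written in its C-like language. The name \fIrun_rules\fR is hypothetical.

```python
def run_rules(rules, lines):
    """Apply every (pattern, action) rule, in order, to each input line.

    A pattern of None matches every line; an action of None prints the
    matched line. Printed lines are collected and returned.
    """
    output = []
    for line in lines:
        for pattern, action in rules:
            if pattern is not None and not pattern(line):
                continue
            if action is None:
                output.append(line)  # default action: print the line
            else:
                action(line, output)
    return output


# Rule 1: pattern only (default print); rule 2: action only (every line)
rules = [
    (lambda line: "x" in line, None),
    (None, lambda line, out: out.append(line.upper())),
]
print(run_rules(rules, ["ax", "b"]))  # ['ax', 'AX', 'B']
```

Each line passes through all rules before the next line is read, which is why "AX" appears before "B".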
.SS "Patterns"
.PP
A <pattern> may be any valid C expression. If the
<pattern> consists of two expressions separated by a comma, it
is taken to be a range, and the <action> is performed on every
line from one that matches the first expression through the next
one that matches the second, inclusive. <pattern>s may contain
\*(OQregular expressions\*(CQ delimited by an @ symbol. Regular
expressions can be thought of as a generalized \*(OQwildcard\*(CQ
string matching mechanism, similar to that used by many
operating systems to specify file names. Regular expressions
may contain any of the following characters:
.HS
.in +0.75i
.ta +0.5i
.ti -0.5i
x	An ordinary character.
.ti -0.5i
\\	The backslash quotes any character.
.ti -0.5i
^	A circumflex at the beginning of an expression matches the beginning of a line.
.ti -0.5i
$	A dollar sign at the end of an expression matches the end of a line.
.ti -0.5i
\&.	A period matches any single character except newline.
.ti -0.5i
*	An expression followed by an asterisk matches zero or more occurrences
of that expression: \*(OQfo*\*(CQ matches \*(OQf\*(CQ, \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
.ti -0.5i
+	An expression followed by a plus sign matches one or more occurrences
of that expression: \*(OQfo+\*(CQ matches \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
.ti -0.5i
[]	A string enclosed in square brackets matches any single character in that
string, but no others. If the first character in the string is a circumflex, the
expression matches any character except newline and the characters in the
string. For example, \*(OQ[xyz]\*(CQ matches \*(OQxx\*(CQ and \*(OQzyx\*(CQ, while
\*(OQ[^xyz]\*(CQ matches \*(OQabc\*(CQ but not \*(OQzyx\*(CQ. A range of characters may be
specified by two characters separated by \*(OQ-\*(CQ, as in \*(OQ[a-z]\*(CQ.
.in -0.75i
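.PP
The following Python sketch is a simplified backtracking matcher for the characters listed above. It was written for illustration and is not taken from the \fIawk\fR sources. Like a search in \fIed\fR, \fIre_search\fR reports whether the expression matches anywhere in the line, and the surrounding @ delimiters are assumed to have been removed already. The hypothetical \fIselect_range\fR shows the comma-separated range form of <pattern> described at the start of this section, with the start and stop tests supplied as callables.

```python
def _parse_atom(pattern, i):
    """Parse the one-character atom at pattern[i]; return (test, next_i)."""
    c = pattern[i]
    if c == "\\" and i + 1 < len(pattern):
        literal = pattern[i + 1]            # backslash quotes any character
        return (lambda ch: ch == literal), i + 2
    if c == ".":
        return (lambda ch: ch != "\n"), i + 1
    if c == "[":
        j = i + 1
        negate = j < len(pattern) and pattern[j] == "^"
        if negate:
            j += 1
        chars, ranges = set(), []
        while j < len(pattern) and pattern[j] != "]":
            if j + 2 < len(pattern) and pattern[j + 1] == "-" and pattern[j + 2] != "]":
                ranges.append((pattern[j], pattern[j + 2]))     # e.g. a-z
                j += 3
            else:
                chars.add(pattern[j])
                j += 1
        if j >= len(pattern):
            raise ValueError("missing ]")

        def in_class(ch):
            hit = ch in chars or any(lo <= ch <= hi for lo, hi in ranges)
            return (not hit and ch != "\n") if negate else hit

        return in_class, j + 1
    return (lambda ch: ch == c), i + 1


def _compile(pattern):
    """Return (anchored_start, [(test, quantifier)], anchored_end)."""
    anchored_start = pattern.startswith("^")
    i = 1 if anchored_start else 0
    anchored_end, items = False, []
    while i < len(pattern):
        if pattern[i] == "$" and i == len(pattern) - 1:
            anchored_end = True
            break
        test, i = _parse_atom(pattern, i)
        quant = ""
        if i < len(pattern) and pattern[i] in "*+":
            quant, i = pattern[i], i + 1
        items.append((test, quant))
    return anchored_start, items, anchored_end


def _match_here(items, k, text, pos, anchored_end):
    """Backtracking match of items[k:] against text starting at pos."""
    if k == len(items):
        return not anchored_end or pos == len(text)
    test, quant = items[k]
    if quant == "":
        return (pos < len(text) and test(text[pos])
                and _match_here(items, k + 1, text, pos + 1, anchored_end))
    end = pos                               # '*' or '+': take the longest run,
    while end < len(text) and test(text[end]):
        end += 1
    least = pos + 1 if quant == "+" else pos
    for stop in range(end, least - 1, -1):  # then give characters back
        if _match_here(items, k + 1, text, stop, anchored_end):
            return True
    return False


def re_search(pattern, text):
    """True if pattern matches anywhere in text (an ed-style search)."""
    anchored_start, items, anchored_end = _compile(pattern)
    starts = [0] if anchored_start else range(len(text) + 1)
    return any(_match_here(items, 0, text, s, anchored_end) for s in starts)


def select_range(lines, start, stop):
    """Yield each line from one matching start through the next matching stop."""
    inside = False
    for line in lines:
        if not inside and start(line):
            inside = True
        if inside:
            yield line
            if stop(line):
                inside = False
```

With this sketch, re_search("fo*", "f") is true and re_search("fo+", "f") is false, matching the examples in the list.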
.SS "Actions"
.PP
Actions are expressed in a subset of the C language. All
variables are global and default to int if not explicitly
declared.
Only char and int variables, and pointers to and arrays of
char and int, are allowed. \fIAwk\fR accepts only decimal integer
constants; hexadecimal (0xnn) and octal (0nn) constants are not supported. String
and character constants may contain all of the special C
escapes (\\n, \\r, etc.).
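.PP
A minimal Python sketch of the two lexical rules above follows. The escape table is an assumption: the text promises all of the special C escapes but names only \\n and \\r, and the handling of unknown escapes is a guess. Both function names are hypothetical.

```python
# Escapes assumed to be recognized; the manual names only \n and \r
# explicitly ("all of the special C escapes").
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f",
            "\\": "\\", "'": "'", '"': '"'}


def parse_int_constant(text):
    """Accept only decimal integer constants; reject hex (0x..) and octal (0..)."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a decimal constant: {text!r}")
    if len(text) > 1 and text[0] == "0":
        raise ValueError(f"octal constants are not supported: {text!r}")
    return int(text)


def decode_escapes(body):
    """Translate C-style backslash escapes in a string constant's body."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Assumption: an unknown escape such as \q yields the character itself
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```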
.PP
\fIAwk\fR supports the \*(OQif\*(CQ, \*(OQelse\*(CQ,
\*(OQwhile\*(CQ and \*(OQbreak\*(CQ flow of
control constructs, which behave exactly as in C.
.PP
Also supported are the following unary and binary operators,
listed in order from highest to lowest precedence:
.HS
.ta 0.25i 1.75i 3.0i
.nf
\fB	Operator	Type	Associativity\fR
	() []	unary	left to right
.tr ~~
	! ~ ++ \(en\(en \(en * &	unary	right to left
.tr ~
	* / %	binary	left to right
	+ \(en	binary	left to right
	<< >>	binary	left to right
	< <= > >=	binary	left to right
	== !=	binary	left to right
	&	binary	left to right
	^	binary	left to right
	|	binary	left to right
	&&	binary	left to right
	||	binary	left to right
	=	binary	right to left
.fi
.HS
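.PP
The precedence table can be exercised with a small precedence-climbing evaluator, sketched here in Python for integer expressions. It covers the binary operators from the table (plus unary minus and !) but not assignment or variables, and \fIevaluate\fR is a hypothetical name. Division and remainder truncate toward zero as in C; unlike C, both operands of && and || are always evaluated.

```python
import operator
import re

# Binary operators from the table (assignment omitted), mapped to a
# precedence level: a larger number binds more tightly.
PREC = {"*": 10, "/": 10, "%": 10, "+": 9, "-": 9, "<<": 8, ">>": 8,
        "<": 7, "<=": 7, ">": 7, ">=": 7, "==": 6, "!=": 6,
        "&": 5, "^": 4, "|": 3, "&&": 2, "||": 1}

# Decimal constants only, as the Actions section requires.
TOKEN = re.compile(r"\s*(\d+|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%<>&^|()!])")


def _cdiv(a, b):
    """Integer division truncating toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def _as_int(fn):
    """Wrap a comparison so it yields 0 or 1, as C does."""
    return lambda a, b: int(fn(a, b))


OPS = {
    "*": operator.mul, "/": _cdiv, "%": lambda a, b: a - b * _cdiv(a, b),
    "+": operator.add, "-": operator.sub,
    "<<": operator.lshift, ">>": operator.rshift,
    "<": _as_int(operator.lt), "<=": _as_int(operator.le),
    ">": _as_int(operator.gt), ">=": _as_int(operator.ge),
    "==": _as_int(operator.eq), "!=": _as_int(operator.ne),
    "&": operator.and_, "^": operator.xor, "|": operator.or_,
    # Unlike C, both operands of && and || are evaluated here.
    "&&": lambda a, b: int(bool(a and b)), "||": lambda a, b: int(bool(a or b)),
}


def evaluate(src):
    """Evaluate an integer expression by precedence climbing."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    i = 0

    def primary():
        nonlocal i
        if i >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        tok = tokens[i]
        i += 1
        if tok == "(":
            value = climb(1)
            if i >= len(tokens) or tokens[i] != ")":
                raise SyntaxError("missing )")
            i += 1
            return value
        if tok == "-":
            return -primary()
        if tok == "!":
            return int(not primary())
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def climb(min_prec):
        nonlocal i
        left = primary()
        while i < len(tokens) and tokens[i] in PREC and PREC[tokens[i]] >= min_prec:
            op = tokens[i]
            i += 1
            right = climb(PREC[op] + 1)   # +1 makes the operator left-associative
            left = OPS[op](left, right)
        return left

    result = climb(1)
    if i != len(tokens):
        raise SyntaxError("trailing tokens")
    return result
```

For example, evaluate("1 << 2 + 1") returns 8 because + sits above << in the table, and evaluate("10 - 4 - 3") returns 3 because - associates left to right.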
.PP
Comments are introduced by a '#' symbol and are terminated by
the first newline character. The standard C comment delimiters \*(OQ/*\*(CQ and \*(OQ*/\*(CQ
are not supported and cause a
syntax error.
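.PP
Comment removal can be sketched in Python as below. The function \fIstrip_comments\fR is hypothetical, and it assumes that a '#' inside a string or character constant does not start a comment, a case this page does not address.

```python
def strip_comments(src):
    """Remove '#' comments from rules text, keeping each terminating newline.

    Assumption: a '#' inside a string or character constant does not start a
    comment.
    """
    out = []
    quote = None                # the open quote character, or None
    i = 0
    while i < len(src):
        c = src[i]
        if quote is not None:
            out.append(c)
            if c == "\\" and i + 1 < len(src):
                out.append(src[i + 1])   # keep escaped char, e.g. \" or \#
                i += 2
                continue
            if c == quote:
                quote = None
        elif c in "\"'":
            quote = c
            out.append(c)
        elif c == "#":
            newline = src.find("\n", i)
            if newline == -1:
                break               # comment runs to end of text
            i = newline             # resume at the newline, which is kept
            continue
        else:
            out.append(c)
        i += 1
    return "".join(out)
```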
.SP 0.5
.SS "Fields"
.SP 0.5
.PP
When \fIawk\fR reads a line from the current input file, the
record is automatically split into \*(OQfields.\*(CQ A field is
simply a string of consecutive characters delimited by
the beginning or end of the line or by a \*(OQfield separator\*(CQ character.
Initially, the field separators are the space and tab characters.
The special unary operator '$' is used to reference one of the
fields in the current input record (line). The fields are
numbered sequentially starting at 1. The expression \*(OQ$0\*(CQ
references the entire input line.
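.PP
Field splitting can be sketched in Python as follows. The function \fIsplit_fields\fR is hypothetical, and it assumes that a run of adjacent separators ends a single field instead of producing empty fields, which is how most \fIawk\fR implementations treat blanks; this page does not specify the behavior.

```python
def split_fields(record, separators=" \t"):
    """Return [$0, $1, ..., $NF] for one input record.

    Every character in separators is a field separator. Assumption: a run
    of adjacent separators ends one field rather than producing empty fields.
    """
    fields = [record]           # $0 is the entire record
    current = []
    for ch in record:
        if ch in separators:
            if current:
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current))
    return fields


f = split_fields('120 PRINT "Name address Zip"')
nf = len(f) - 1                 # NF
print(nf, f[4])                 # 5 address
```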
.PP
Similarly, the \*(OQrecord separator\*(CQ determines where an input
\*(OQline\*(CQ ends; it is initially the newline character. The field
and record separators may be changed programmatically by any of
the actions and remain in effect until changed again.
.PP
Up to 10 field separators may be in effect at a time, but
only one record separator.
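.PP
The separator rules can be modeled with a small Python class. \fISeparators\fR is a hypothetical name, and the class assumes that each separator is a single character. It enforces the limit of 10 field separators and the single record separator stated above, and a change persists until the next one.

```python
class Separators:
    """Current field and record separators.

    A change stays in effect until the next change. Assumption: each field
    separator and the record separator is a single character.
    """

    MAX_FS = 10                 # "up to 10" field separators at a time

    def __init__(self):
        self.fs = " \t"         # initial field separators: space and tab
        self.rs = "\n"          # initial record separator: newline

    def set_fs(self, chars):
        if not 1 <= len(chars) <= self.MAX_FS:
            raise ValueError("between 1 and 10 field separators are allowed")
        self.fs = chars

    def set_rs(self, char):
        if len(char) != 1:
            raise ValueError("exactly one record separator is allowed")
        self.rs = char

    def records(self, text):
        """Split text into records at the record separator."""
        parts = text.split(self.rs)
        if parts and parts[-1] == "":
            parts.pop()         # a trailing separator ends the last record
        return parts
```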
.PP
Fields behave exactly like strings and can be used in any
context where a character array can. These \*(OQarrays\*(CQ can be considered
to have been declared as:
.SP 0.15
.HS
~~~~~char ($n)[ 128 ];
.HS
.SP 0.15
In other words, each field is 128 bytes long. The
parentheses are necessary because [] binds more tightly than
the unary $ operator; without them, the declaration
would be parsed as:
.HS
.SP 0.15
~~~~~char $(n[ 128 ]);
.HS
.SP 0.15
which is meaningless.
.PP
If the contents of one of these field arrays are altered, the
\*(OQ$0\*(CQ field reflects the change. For example, the
expression:
.HS
.SP 0.15
~~~~~*$4 = 'A';
.HS
.SP 0.15
changes the first character of the fourth field to an uppercase
letter 'A'. Then, when the following input line:
.HS
.SP 0.15
~~~~~120 PRINT "Name address Zip"
.SP 0.15
.HS
is processed, it would be printed as:
.HS
.SP 0.15
~~~~~120 PRINT "Name Address Zip"
.HS
.SP 0.15
Fields may also be modified with the strcpy() function (see
below). For example, the expression:
.HS
~~~~~strcpy( $4, "Addr." );
.HS
applied to the same input line would yield:
.HS
~~~~~120 PRINT "Name Addr. Zip"
.HS
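.PP
The following Python sketch models fields as 128-byte buffers from which \*(OQ$0\*(CQ is rebuilt. It is an assumption-laden illustration, not the \fIawk\fR implementation: the class \fIRecord\fR and its methods are hypothetical, fields are split on blanks, and \*(OQ$0\*(CQ is rejoined with single spaces, so the original spacing of the line is not preserved.

```python
FIELD_SIZE = 128                # each field behaves like char ($n)[ 128 ]


class Record:
    """One input line held as fixed-size, mutable field buffers.

    Assumptions: fields are split on blanks, and $0 is rebuilt by joining
    the fields with single spaces.
    """

    def __init__(self, line):
        self.fields = [bytearray(word.encode()) for word in line.split()]

    def store_char(self, n, index, ch):
        """Model an assignment such as *$4 = 'A' (index 0 of field 4)."""
        self.fields[n - 1][index] = ord(ch)

    def strcpy(self, n, text):
        """Model strcpy($n, text); one byte is reserved for the C NUL."""
        data = text.encode()
        if len(data) >= FIELD_SIZE:
            raise ValueError("string does not fit in a 128-byte field")
        self.fields[n - 1][:] = data

    def line(self):
        """Return $0, which reflects any changes made to the fields."""
        return " ".join(field.decode() for field in self.fields)


r = Record('120 PRINT "Name address Zip"')
r.store_char(4, 0, "A")
print(r.line())                 # 120 PRINT "Name Address Zip"
```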
.SS "Predefined Variables"
.PP
The following variables and special patterns are predefined:
.HS
.in +1.5i
.ta +1.25i
.ti -1.25i
FS	Field separator (see \*(OQFields\*(CQ above).
.ti -1.25i
RS	Record separator (see \*(OQFields\*(CQ above).
.ti -1.25i
NF	Number of fields in the current input record (line).
.ti -1.25i
NR	Number of records processed so far.
.ti -1.25i
FILENAME	Name of the current input file.
.ti -1.25i
BEGIN	A special <pattern> that matches the beginning of the input text.
.ti -1.25i
END	A special <pattern> that matches the end of the input text.
.in -1.5i
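.PP
How these variables and special patterns fit together can be sketched in Python. The function \fIrun\fR and its callbacks are hypothetical. Input files are represented by a mapping from file name to lines, NF is computed by splitting on blanks, and FILENAME is assumed to be empty while the BEGIN action runs.

```python
def run(inputs, begin=None, each=None, end=None):
    """Drive BEGIN, per-record and END actions while maintaining NR, NF and
    FILENAME. inputs maps each file name to its lines (a stand-in for real
    files); begin, each and end are hypothetical callbacks.
    """
    var = {"NR": 0, "NF": 0, "FILENAME": ""}
    out = []
    if begin:
        begin(var, out)           # BEGIN: before any input is read
    for name, lines in inputs.items():
        var["FILENAME"] = name
        for line in lines:
            var["NR"] += 1        # NR counts records across all files
            var["NF"] = len(line.split())
            if each:
                each(var, line, out)
    if end:
        end(var, out)             # END: after the last record
    return out


result = run(
    {"a": ["x y", "z"], "b": ["p q r"]},
    begin=lambda v, o: o.append("start"),
    each=lambda v, line, o: o.append(f'{v["FILENAME"]}:{v["NR"]}:{v["NF"]}'),
    end=lambda v, o: o.append(f'total {v["NR"]}'),
)
print(result)  # ['start', 'a:1:2', 'a:2:1', 'b:3:3', 'total 3']
```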
.HS
.PP
\fIAwk\fR also provides some useful built-in functions for string
manipulation and printing:
.HS
.in +1.5i
.ta +1.25i
.ti -1.25i
print(arg)	Prints its string argument (strings only), followed by '\\n'.
.ti -1.25i
printf(arg...)	Behaves exactly like the C printf() function.
.ti -1.25i
getline()	Reads the next record and returns 0 at end of file.
.ti -1.25i
nextfile()	Closes the current input file and begins processing the next file.
.ti -1.25i
strlen(s)	Returns the length of its string argument.
.ti -1.25i
strcpy(s,t)	Copies the string \*(OQt\*(CQ to the string \*(OQs\*(CQ.
.ti -1.25i
strcmp(s,t)	Compares \*(OQs\*(CQ with \*(OQt\*(CQ and returns 0 if they match.
.ti -1.25i
toupper(c)	Returns its character argument converted to uppercase.
.ti -1.25i
tolower(c)	Returns its character argument converted to lowercase.
.ti -1.25i
match(s,@re@)	Compares the string \*(OQs\*(CQ with the regular expression \*(OQre\*(CQ and
returns the number of matches found (zero if none).
.in -1.5i
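.PP
Several of these built-ins have close Python counterparts, sketched below with hypothetical names. The match sketch counts non-overlapping matches and uses Python regular-expression syntax in place of @re@. The strcmp sketch follows the C convention of returning a negative or positive value on a mismatch, which this page does not state. The getline sketch returns 1 after a successful read, an assumption since only the end-of-file value is documented.

```python
import re


def awk_match(s, pattern):
    """Count non-overlapping matches of pattern in s (0 if none).

    Assumption: Python's re syntax stands in for awk's @re@ expressions.
    """
    return sum(1 for _ in re.finditer(pattern, s))


def awk_strcmp(s, t):
    """Return 0 if s equals t, otherwise a negative or positive number, as in C."""
    return (s > t) - (s < t)


def awk_toupper(c):
    """Convert a lowercase ASCII letter to uppercase; leave other characters."""
    return c.upper() if "a" <= c <= "z" else c


class Input:
    """Hypothetical record source supporting getline()."""

    def __init__(self, records):
        self._records = iter(records)
        self.record = None

    def getline(self):
        """Read the next record into self.record; return 0 at end of file."""
        try:
            self.record = next(self._records)
        except StopIteration:
            return 0
        return 1  # assumption: any nonzero value signals success
```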
.SS "Authors"
.PP
\fIAwk\fR was written by Saeko Hirabayashi and Kouichi Hirabayashi.