awk.9@ 15

Last change on this file since 15 was 9, checked in by Mattia Monga, 14 years ago
Minix 3.1.2a
File size: 8.4 KB

Rev	Line
[9]	1	.so mnx.mac
	2	.TH AWK 9
	3	.CD "awk \(en pattern matching language"
	4	.SX "awk \fIrules\fR [\fIfile\fR] ..."
	5	.FL "\fR(none)"
	6	.EX "awk rules input" "Process \fIinput\fR according to \fIrules\fR"
	7	.EX "awk rules \(en >out" "Input from terminal, output to \fIout\fR"
	8	.PP
	9	AWK is a programming language devised by Aho, Weinberger, and Kernighan
	10	at Bell Labs (hence the name).
	11	\fIAwk\fR programs search files for
	12	specific patterns and perform \(OQactions\(CQ for every occurrence
	13	of these patterns. The patterns can be \(OQregular expressions\(CQ
	14	as used in the \fIed\fR editor. The actions are expressed
	15	using a subset of the C language.
	16	.PP
	17	The patterns and actions are usually placed in a \(OQrules\(CQ file
	18	whose name must be the first argument in the command line,
	19	preceded by the flag \fB\(enf\fR. Otherwise, the first argument on the
	20	command line is taken to be a string containing the rules
	21	themselves. All other arguments are taken to be the names of text
	22	files on which the rules are to be applied, with \fB\(en\fR being the
	23	standard input. To take rules from the standard input, use \fB\(enf \(en\fR.
	24	.PP
	25	The command:
	26	.HS
	27	.Cx "awk \(enf rules prog.\d\s+2*\s0\u"
	28	.HS
	29	would read the patterns and actions from the file \fIrules\fR
	30	and apply them to each file named by the remaining arguments.
	31	.PP
	32	The general format of a rules file is:
	33	.HS
	34	~~~<pattern> { <action> }
	35	~~~<pattern> { <action> }
	36	~~~...
	37	.HS
	38	There may be any number of these <pattern> { <action> }
	39	sequences in the rules file. \fIAwk\fR reads a line of input from
	40	the current input file and applies every <pattern> { <action> }
	41	in sequence to the line.
	42	.PP
	43	If a rule has no <pattern>, its { <action> } is applied to
	44	every line of input. If a rule has no { <action> }, the default
	45	is to print the matched input line.
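.PP
The rule loop described above can be modelled in a few lines of Python.
This is only a sketch, not the awk implementation: the name run_rules is
hypothetical, a missing pattern or action is represented by None, and an
action is simplified to a function that returns the text to print.

```python
import re

def run_rules(rules, lines):
    """Apply every (pattern, action) rule, in order, to each input line."""
    output = []
    for line in lines:
        for pattern, action in rules:
            # A missing pattern (None) selects every line.
            if pattern is not None and not pattern(line):
                continue
            # A missing action (None) prints the matched line unchanged.
            if action is None:
                output.append(line)
            else:
                output.append(action(line))
    return output

rules = [
    (lambda line: re.search(r"^#", line) is not None, None),  # pattern only
    (None, lambda line: line.upper()),                        # action only
]
print(run_rules(rules, ["# note", "data"]))
# -> ['# note', '# NOTE', 'DATA']
```

Both rules fire on the first line, one after the other, because every rule is tried against every line.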
	46	.SS "Patterns"
	47	.PP
	48	The <pattern>s may consist of any valid C expression. If the
	49	<pattern> consists of two expressions separated by a comma, it
	50	is taken to be a range and the <action> is performed on all
	51	lines of input that match the range. <pattern>s may contain
	52	\(OQregular expressions\(CQ delimited by @ symbols. Regular
	53	expressions can be thought of as a generalized \(OQwildcard\(CQ
	54	string matching mechanism, similar to that used by many
	55	operating systems to specify file names. Regular expressions
	56	may contain any of the following characters:
	57	.HS
	58	.in +0.75i
	59	.ta +0.5i
	60	.ti -0.5i
	61	x An ordinary character
	62	.ti -0.5i
	63	\\ The backslash quotes any character
	64	.ti -0.5i
	65	^ A circumflex at the beginning of an expression matches the beginning of a line.
	66	.ti -0.5i
	67	$ A dollar-sign at the end of an expression matches the end of a line.
	68	.ti -0.5i
	69	\&. A period matches any single character except newline.
	70	.ti -0.5i
	71	* An expression followed by an asterisk matches zero or more occurrences
	72	of that expression: \(OQfo*\(CQ matches \(OQf\(CQ, \(OQfo\(CQ, \(OQfoo\(CQ, \(OQfooo\(CQ, etc.
	73	.ti -0.5i
	74	+ An expression followed by a plus sign matches one or more occurrences
	75	of that expression: \(OQfo+\(CQ matches \(OQfo\(CQ, \(OQfoo\(CQ, \(OQfooo\(CQ, etc.
	76	.ti -0.5i
	77	[] A string enclosed in square brackets matches any single character in that
	78	string, but no others. If the first character in the string is a circumflex, the
	79	expression matches any character except newline and the characters in the
	80	string. For example, \(OQ[xyz]\(CQ matches each character of \(OQxx\(CQ and \(OQzyx\(CQ, while
	81	\(OQ[^xyz]\(CQ matches each character of \(OQabc\(CQ but not the \(OQx\(CQ in \(OQaxb\(CQ. A range of
	82	characters may be specified by two characters separated by \(OQ-\(CQ, as in \(OQ[a-z]\(CQ.
	83	.in -0.75i
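.PP
The metacharacters listed above behave much the same way in most regular
expression engines. The sketch below uses Python's re module as a stand-in,
which is an assumption: it is not the engine used by awk, it does not use the
@ delimiters, and its negated bracket expressions also match newline. The
helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

words = ["f", "fo", "foo", "fooo"]
print([w for w in words if matches(r"fo*", w)])  # -> ['f', 'fo', 'foo', 'fooo']
print([w for w in words if matches(r"fo+", w)])  # -> ['fo', 'foo', 'fooo']

# A bracket expression matches one character at a time.
print(re.findall(r"[xyz]", "zyx"))    # -> ['z', 'y', 'x']
print(re.findall(r"[^xyz]", "axb"))   # -> ['a', 'b']

# Anchors and the period: 'a', any one character, 'c', and nothing else.
print(re.findall(r"^a.c$", "abc"))    # -> ['abc']
```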
	84	.SS "Actions"
	85	.PP
	86	Actions are expressed as a subset of the C language. All
	87	variables are global and default to int if not formally
	88	declared.
	89	The only types allowed are char and int, along with pointers to
	90	and arrays of char and int. \fIAwk\fR allows only decimal integer
	91	constants to be used\(emno hex (0xnn) or octal (0nn). String
	92	and character constants may contain all of the special C
	93	escapes (\\n, \\r, etc.).
	94	.PP
	95	\fIAwk\fR supports the \(OQif\(CQ, \(OQelse\(CQ,
	96	\(OQwhile\(CQ and \(OQbreak\(CQ flow of
	97	control constructs, which behave exactly as in C.
	98	.PP
	99	Also supported are the following unary and binary operators,
	100	listed in order from highest to lowest precedence:
	101	.HS
	102	.ta 0.25i 1.75i 3.0i
	103	.nf
	104	\fB	Operator	Type	Associativity\fR
	105		() []	unary	left to right
	106	.tr ~~
	107		! ~ ++ \(en\(en \(en * &	unary	right to left
	108	.tr ~
	109		* / %	binary	left to right
	110		+ \(en	binary	left to right
	111		<< >>	binary	left to right
	112		< <= > >=	binary	left to right
	113		== !=	binary	left to right
	114		&	binary	left to right
	115		^	binary	left to right
	116		\|	binary	left to right
	117		&&	binary	left to right
	118		\|\|	binary	left to right
	119		=	binary	right to left
	120	.fi
	121	.HS
	122	Comments are introduced by a '#' symbol and are terminated by
	123	the first newline character. The standard \(OQ/*\(CQ and \(OQ*/\(CQ
	124	comment delimiters are not supported and will result in a
	125	syntax error.
	126	.SP 0.5
	127	.SS "Fields"
	128	.SP 0.5
	129	.PP
	130	When \fIawk\fR reads a line from the current input file, the
	131	record is automatically separated into \(OQfields.\(CQ A field is
	132	simply a string of consecutive characters delimited by either
	133	the beginning or end of line, or a \(OQfield separator\(CQ character.
	134	Initially, the field separators are the space and tab characters.
	135	The special unary operator '$' is used to reference one of the
	136	fields in the current input record (line). The fields are
	137	numbered sequentially starting at 1. The expression \(OQ$0\(CQ
	138	references the entire input line.
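.PP
Field splitting as described above can be sketched as follows. The function
name split_fields is hypothetical, and one detail is an assumption rather
than something this page states: a run of consecutive separators is treated
as a single delimiter, so no empty fields are produced.

```python
def split_fields(record, separators=" \t"):
    """Split a record into fields; index 0 holds the whole record."""
    fields = []
    current = ""
    for ch in record:
        if ch in separators:
            # Assumption: a run of separators ends at most one field.
            if current:
                fields.append(current)
            current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    # Put the whole record at index 0 so that fields[n] mirrors $n.
    return [record] + fields

fields = split_fields('120 PRINT "Name address Zip"')
print(fields[4])        # -> address
print(len(fields) - 1)  # -> 5
```

Keeping the whole record at index 0 means the list index lines up with the field number, and the count of remaining entries corresponds to NF.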
	139	.PP
	140	Similarly, the \(OQrecord separator\(CQ is used to determine the end
	141	of an input \(OQline,\(CQ initially the newline character. The field
	142	and record separators may be changed programmatically by one of
	143	the actions and will remain in effect until changed again.
	144	.PP
	145	Multiple (up to 10) field separators are allowed at a time, but
	146	only one record separator.
	147	.PP
	148	Fields behave exactly like strings, and can be used in the same
	149	context as a character array. These \(OQarrays\(CQ can be considered
	150	to have been declared as:
	151	.SP 0.15
	152	.HS
	153	~~~~~char ($n)[ 128 ];
	154	.HS
	155	.SP 0.15
	156	In other words, they are 128 bytes long. Notice that the
	157	parentheses are necessary because the [] operator binds more
	158	tightly than the unary $ operator; without them, the declaration
	159	would have been parsed as:
	160	.HS
	161	.SP 0.15
	162	~~~~~char $(n[ 128 ]);
	163	.HS
	164	.SP 0.15
	165	which is obviously ridiculous.
	166	.PP
	167	If the contents of one of these field arrays are altered, the
	168	\(OQ$0\(CQ field will reflect this change. For example, this
	169	expression:
	170	.HS
	171	.SP 0.15
	172	~~~~~*$4 = 'A';
	173	.HS
	174	.SP 0.15
	175	will change the first character of the fourth field to an
	176	upper-case letter 'A'. Then, when the following input line:
	177	.HS
	178	.SP 0.15
	179	~~~~~120 PRINT "Name address Zip"
	180	.SP 0.15
	181	.HS
	182	is processed, it would be printed as:
	183	.HS
	184	.SP 0.15
	185	~~~~~120 PRINT "Name Address Zip"
	186	.HS
	187	.SP 0.15
	188	Fields may also be modified with the strcpy() function (see
	189	below). For example, the expression:
	190	.HS
	191	~~~~~strcpy( $4, "Addr." );
	192	.HS
	193	applied to the same line above would yield:
	194	.HS
	195	~~~~~120 PRINT "Name Addr. Zip"
	196	.HS
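.PP
The two field modifications above can be reproduced with a short Python
sketch. It is a model of the visible effect only: set_field is a hypothetical
helper, fields are assumed to be separated by spaces and tabs, and Python
strings are immutable, so a new record is built instead of writing into a
128-byte array.

```python
import re

def set_field(record, n, new_text):
    """Return record with field n (numbered from 1) replaced by new_text."""
    # Locate each field as a run of characters that are not separators.
    spans = [m.span() for m in re.finditer(r"[^ \t]+", record)]
    start, end = spans[n - 1]
    return record[:start] + new_text + record[end:]

line = '120 PRINT "Name address Zip"'
field4 = line.split()[3]

# Like *$4 = 'A': overwrite only the first character of field 4.
print(set_field(line, 4, "A" + field4[1:]))  # -> 120 PRINT "Name Address Zip"

# Like strcpy( $4, "Addr." ): replace the whole of field 4.
print(set_field(line, 4, "Addr."))           # -> 120 PRINT "Name Addr. Zip"
```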
	197	.SS "Predefined Variables"
	198	.PP
	199	The following variables are predefined:
	200	.HS
	201	.in +1.5i
	202	.ta +1.25i
	203	.ti -1.25i
	204	FS Field separator (see Fields, above).
	205	.ti -1.25i
	206	RS Record separator (see Fields, above).
	207	.ti -1.25i
	208	NF Number of fields in current input record (line).
	209	.ti -1.25i
	210	NR Number of records processed thus far.
	211	.ti -1.25i
	212	FILENAME Name of current input file.
	213	.ti -1.25i
	214	BEGIN A special <pattern> that matches the beginning of input text.
	215	.ti -1.25i
	216	END A special <pattern> that matches the end of input text.
	217	.in -1.5i
	218	.HS
	219	\fIAwk\fR also provides some useful built-in functions for string
	220	manipulation and printing:
	221	.HS
	222	.in +1.5i
	223	.ta +1.25i
	224	.ti -1.25i
	225	print(arg) Simple printing of strings only, terminated by '\\n'.
	226	.ti -1.25i
	227	printf(arg...) Exactly the printf() function from C.
	228	.ti -1.25i
	229	getline() Reads the next record and returns 0 on end of file.
	230	.ti -1.25i
	231	nextfile() Closes the current input file and begins processing the next file.
	232	.ti -1.25i
	233	strlen(s) Returns the length of its string argument.
	234	.ti -1.25i
	235	strcpy(s,t) Copies the string \(OQt\(CQ to the string \(OQs\(CQ.
	236	.ti -1.25i
	237	strcmp(s,t) Compares \(OQs\(CQ to \(OQt\(CQ and returns 0 if they match.
	238	.ti -1.25i
	239	toupper(c) Returns its character argument converted to upper-case.
	240	.ti -1.25i
	241	tolower(c) Returns its character argument converted to lower-case.
	242	.ti -1.25i
	243	match(s,@re@) Compares the string \(OQs\(CQ to the regular expression \(OQre\(CQ and
	244	returns the number of matches found (zero if none).
	245	.in -1.5i
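.PP
The return values of match() and strcmp() listed above can be illustrated
with Python stand-ins. These are assumptions for illustration, not the awk
functions themselves: matches are counted without overlap, the pattern is
given as a plain string instead of between @ delimiters, and the non-zero
results of strcmp follow the usual C sign convention, which this page does
not spell out.

```python
import re

def match(s, pattern):
    """Count the non-overlapping matches of pattern in s (0 if none)."""
    return len(re.findall(pattern, s))

def strcmp(s, t):
    """Return 0 when s and t are equal, otherwise -1 or 1."""
    return (s > t) - (s < t)

print(match("banana", "an"))       # -> 2
print(match("banana", "x"))        # -> 0
print(strcmp("awk", "awk"))        # -> 0
print(strcmp("awk", "sed") != 0)   # -> True
```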
	246	.SS "Authors"
	247	.PP
	248	\fIAwk\fR was written by Saeko Hirabayashi and Kouichi Hirabayashi.
