Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Normal
Revision Log

flexdoc.1@ 15

Last change on this file since 15 was 9, checked in by Mattia Monga, 14 years ago
Minix 3.1.2a
File size: 63.9 KB

Rev	Line
[9]	1	.TH FLEX 1 "26 May 1990" "Version 2.3"
	2	.SH NAME
	3	flexdoc - fast lexical analyzer generator
	4	.SH SYNOPSIS
	5	.B flex
	6	.B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton]
	7	.I [filename ...]
	8	.SH DESCRIPTION
	9	.I flex
	10	is a tool for generating
	11	.I scanners:
	12	programs which recognized lexical patterns in text.
	13	.I flex
	14	reads
	15	the given input files, or its standard input if no file names are given,
	16	for a description of a scanner to generate. The description is in
	17	the form of pairs
	18	of regular expressions and C code, called
	19	.I rules. flex
	20	generates as output a C source file,
	21	.B lex.yy.c,
	22	which defines a routine
	23	.B yylex().
	24	This file is compiled and linked with the
	25	.B -lfl
	26	library to produce an executable. When the executable is run,
	27	it analyzes its input for occurrences
	28	of the regular expressions. Whenever it finds one, it executes
	29	the corresponding C code.
	30	.SH SOME SIMPLE EXAMPLES
	31	.LP
	32	First some simple examples to get the flavor of how one uses
	33	.I flex.
	34	The following
	35	.I flex
	36	input specifies a scanner which whenever it encounters the string
	37	"username" will replace it with the user's login name:
	38	.nf
	39
	40	%%
	41	username printf( "%s", getlogin() );
	42
	43	.fi
	44	By default, any text not matched by a
	45	.I flex
	46	scanner
	47	is copied to the output, so the net effect of this scanner is
	48	to copy its input file to its output with each occurrence
	49	of "username" expanded.
	50	In this input, there is just one rule. "username" is the
	51	.I pattern
	52	and the "printf" is the
	53	.I action.
	54	The "%%" marks the beginning of the rules.
	55	.LP
	56	Here's another simple example:
	57	.nf
	58
	59	int num_lines = 0, num_chars = 0;
	60
	61	%%
	62	\\n ++num_lines; ++num_chars;
	63	. ++num_chars;
	64
	65	%%
	66	main()
	67	{
	68	yylex();
	69	printf( "# of lines = %d, # of chars = %d\\n",
	70	num_lines, num_chars );
	71	}
	72
	73	.fi
	74	This scanner counts the number of characters and the number
	75	of lines in its input (it produces no output other than the
	76	final report on the counts). The first line
	77	declares two globals, "num_lines" and "num_chars", which are accessible
	78	both inside
	79	.B yylex()
	80	and in the
	81	.B main()
	82	routine declared after the second "%%". There are two rules, one
	83	which matches a newline ("\\n") and increments both the line count and
	84	the character count, and one which matches any character other than
	85	a newline (indicated by the "." regular expression).
	86	.LP
	87	A somewhat more complicated example:
	88	.nf
	89
	90	/* scanner for a toy Pascal-like language */
	91
	92	%{
	93	/* need this for the call to atof() below */
	94	#include <math.h>
	95	%}
	96
	97	DIGIT [0-9]
	98	ID [a-z][a-z0-9]*
	99
	100	%%
	101
	102	{DIGIT}+ {
	103	printf( "An integer: %s (%d)\\n", yytext,
	104	atoi( yytext ) );
	105	}
	106
	107	{DIGIT}+"."{DIGIT}* {
	108	printf( "A float: %s (%g)\\n", yytext,
	109	atof( yytext ) );
	110	}
	111
	112	if\|then\|begin\|end\|procedure\|function {
	113	printf( "A keyword: %s\\n", yytext );
	114	}
	115
	116	{ID} printf( "An identifier: %s\\n", yytext );
	117
	118	"+"\|"-"\|"*"\|"/" printf( "An operator: %s\\n", yytext );
	119
	120	"{"[^}\\n]"}" / eat up one-line comments */
	121
	122	[ \\t\\n]+ /* eat up whitespace */
	123
	124	. printf( "Unrecognized character: %s\\n", yytext );
	125
	126	%%
	127
	128	main( argc, argv )
	129	int argc;
	130	char **argv;
	131	{
	132	++argv, --argc; /* skip over program name */
	133	if ( argc > 0 )
	134	yyin = fopen( argv[0], "r" );
	135	else
	136	yyin = stdin;
	137
	138	yylex();
	139	}
	140
	141	.fi
	142	This is the beginnings of a simple scanner for a language like
	143	Pascal. It identifies different types of
	144	.I tokens
	145	and reports on what it has seen.
	146	.LP
	147	The details of this example will be explained in the following
	148	sections.
	149	.SH FORMAT OF THE INPUT FILE
	150	The
	151	.I flex
	152	input file consists of three sections, separated by a line with just
	153	.B %%
	154	in it:
	155	.nf
	156
	157	definitions
	158	%%
	159	rules
	160	%%
	161	user code
	162
	163	.fi
	164	The
	165	.I definitions
	166	section contains declarations of simple
	167	.I name
	168	definitions to simplify the scanner specification, and declarations of
	169	.I start conditions,
	170	which are explained in a later section.
	171	.LP
	172	Name definitions have the form:
	173	.nf
	174
	175	name definition
	176
	177	.fi
	178	The "name" is a word beginning with a letter or an underscore ('_')
	179	followed by zero or more letters, digits, '_', or '-' (dash).
	180	The definition is taken to begin at the first non-white-space character
	181	following the name and continuing to the end of the line.
	182	The definition can subsequently be referred to using "{name}", which
	183	will expand to "(definition)". For example,
	184	.nf
	185
	186	DIGIT [0-9]
	187	ID [a-z][a-z0-9]*
	188
	189	.fi
	190	defines "DIGIT" to be a regular expression which matches a
	191	single digit, and
	192	"ID" to be a regular expression which matches a letter
	193	followed by zero-or-more letters-or-digits.
	194	A subsequent reference to
	195	.nf
	196
	197	{DIGIT}+"."{DIGIT}*
	198
	199	.fi
	200	is identical to
	201	.nf
	202
	203	([0-9])+"."([0-9])*
	204
	205	.fi
	206	and matches one-or-more digits followed by a '.' followed
	207	by zero-or-more digits.
	208	.LP
	209	The
	210	.I rules
	211	section of the
	212	.I flex
	213	input contains a series of rules of the form:
	214	.nf
	215
	216	pattern action
	217
	218	.fi
	219	where the pattern must be unindented and the action must begin
	220	on the same line.
	221	.LP
	222	See below for a further description of patterns and actions.
	223	.LP
	224	Finally, the user code section is simply copied to
	225	.B lex.yy.c
	226	verbatim.
	227	It is used for companion routines which call or are called
	228	by the scanner. The presence of this section is optional;
	229	if it is missing, the second
	230	.B %%
	231	in the input file may be skipped, too.
	232	.LP
	233	In the definitions and rules sections, any
	234	.I indented
	235	text or text enclosed in
	236	.B %{
	237	and
	238	.B %}
	239	is copied verbatim to the output (with the %{}'s removed).
	240	The %{}'s must appear unindented on lines by themselves.
	241	.LP
	242	In the rules section,
	243	any indented or %{} text appearing before the
	244	first rule may be used to declare variables
	245	which are local to the scanning routine and (after the declarations)
	246	code which is to be executed whenever the scanning routine is entered.
	247	Other indented or %{} text in the rule section is still copied to the output,
	248	but its meaning is not well-defined and it may well cause compile-time
	249	errors (this feature is present for
	250	.I POSIX
	251	compliance; see below for other such features).
	252	.LP
	253	In the definitions section, an unindented comment (i.e., a line
	254	beginning with "/*") is also copied verbatim to the output up
	255	to the next "*/". Also, any line in the definitions section
	256	beginning with '#' is ignored, though this style of comment is
	257	deprecated and may go away in the future.
	258	.SH PATTERNS
	259	The patterns in the input are written using an extended set of regular
	260	expressions. These are:
	261	.nf
	262
	263	x match the character 'x'
	264	. any character except newline
	265	[xyz] a "character class"; in this case, the pattern
	266	matches either an 'x', a 'y', or a 'z'
	267	[abj-oZ] a "character class" with a range in it; matches
	268	an 'a', a 'b', any letter from 'j' through 'o',
	269	or a 'Z'
	270	[^A-Z] a "negated character class", i.e., any character
	271	but those in the class. In this case, any
	272	character EXCEPT an uppercase letter.
	273	[^A-Z\\n] any character EXCEPT an uppercase letter or
	274	a newline
	275	r* zero or more r's, where r is any regular expression
	276	r+ one or more r's
	277	r? zero or one r's (that is, "an optional r")
	278	r{2,5} anywhere from two to five r's
	279	r{2,} two or more r's
	280	r{4} exactly 4 r's
	281	{name} the expansion of the "name" definition
	282	(see above)
	283	"[xyz]\\"foo"
	284	the literal string: [xyz]"foo
	285	\\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
	286	then the ANSI-C interpretation of \\x.
	287	Otherwise, a literal 'X' (used to escape
	288	operators such as '*')
	289	\\123 the character with octal value 123
	290	\\x2a the character with hexadecimal value 2a
	291	(r) match an r; parentheses are used to override
	292	precedence (see below)
	293
	294
	295	rs the regular expression r followed by the
	296	regular expression s; called "concatenation"
	297
	298
	299	r\|s either an r or an s
	300
	301
	302	r/s an r but only if it is followed by an s. The
	303	s is not part of the matched text. This type
	304	of pattern is called as "trailing context".
	305	^r an r, but only at the beginning of a line
	306	r$ an r, but only at the end of a line. Equivalent
	307	to "r/\\n".
	308
	309
	310	<s>r an r, but only in start condition s (see
	311	below for discussion of start conditions)
	312	<s1,s2,s3>r
	313	same, but in any of start conditions s1,
	314	s2, or s3
	315
	316
	317	<<EOF>> an end-of-file
	318	<s1,s2><<EOF>>
	319	an end-of-file when in start condition s1 or s2
	320
	321	.fi
	322	The regular expressions listed above are grouped according to
	323	precedence, from highest precedence at the top to lowest at the bottom.
	324	Those grouped together have equal precedence. For example,
	325	.nf
	326
	327	foo\|bar*
	328
	329	.fi
	330	is the same as
	331	.nf
	332
	333	(foo)\|(ba(r*))
	334
	335	.fi
	336	since the '*' operator has higher precedence than concatenation,
	337	and concatenation higher than alternation ('\|'). This pattern
	338	therefore matches
	339	.I either
	340	the string "foo"
	341	.I or
	342	the string "ba" followed by zero-or-more r's.
	343	To match "foo" or zero-or-more "bar"'s, use:
	344	.nf
	345
	346	foo\|(bar)*
	347
	348	.fi
	349	and to match zero-or-more "foo"'s-or-"bar"'s:
	350	.nf
	351
	352	(foo\|bar)*
	353
	354	.fi
	355	.LP
	356	Some notes on patterns:
	357	.IP -
	358	A negated character class such as the example "[^A-Z]"
	359	above
	360	.I will match a newline
	361	unless "\\n" (or an equivalent escape sequence) is one of the
	362	characters explicitly present in the negated character class
	363	(e.g., "[^A-Z\\n]"). This is unlike how many other regular
	364	expression tools treat negated character classes, but unfortunately
	365	the inconsistency is historically entrenched.
	366	Matching newlines means that a pattern like [^"]* can match an entire
	367	input (overflowing the scanner's input buffer) unless there's another
	368	quote in the input.
	369	.IP -
	370	A rule can have at most one instance of trailing context (the '/' operator
	371	or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
	372	can only occur at the beginning of a pattern, and, as well as with '/' and '$',
	373	cannot be grouped inside parentheses. A '^' which does not occur at
	374	the beginning of a rule or a '$' which does not occur at the end of
	375	a rule loses its special properties and is treated as a normal character.
	376	.IP
	377	The following are illegal:
	378	.nf
	379
	380	foo/bar$
	381	<sc1>foo<sc2>bar
	382
	383	.fi
	384	Note that the first of these, can be written "foo/bar\\n".
	385	.IP
	386	The following will result in '$' or '^' being treated as a normal character:
	387	.nf
	388
	389	foo\|(bar$)
	390	foo\|^bar
	391
	392	.fi
	393	If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
	394	could be used (the special '\|' action is explained below):
	395	.nf
	396
	397	foo \|
	398	bar$ /* action goes here */
	399
	400	.fi
	401	A similar trick will work for matching a foo or a
	402	bar-at-the-beginning-of-a-line.
	403	.SH HOW THE INPUT IS MATCHED
	404	When the generated scanner is run, it analyzes its input looking
	405	for strings which match any of its patterns. If it finds more than
	406	one match, it takes the one matching the most text (for trailing
	407	context rules, this includes the length of the trailing part, even
	408	though it will then be returned to the input). If it finds two
	409	or more matches of the same length, the
	410	rule listed first in the
	411	.I flex
	412	input file is chosen.
	413	.LP
	414	Once the match is determined, the text corresponding to the match
	415	(called the
	416	.I token)
	417	is made available in the global character pointer
	418	.B yytext,
	419	and its length in the global integer
	420	.B yyleng.
	421	The
	422	.I action
	423	corresponding to the matched pattern is then executed (a more
	424	detailed description of actions follows), and then the remaining
	425	input is scanned for another match.
	426	.LP
	427	If no match is found, then the
	428	.I default rule
	429	is executed: the next character in the input is considered matched and
	430	copied to the standard output. Thus, the simplest legal
	431	.I flex
	432	input is:
	433	.nf
	434
	435	%%
	436
	437	.fi
	438	which generates a scanner that simply copies its input (one character
	439	at a time) to its output.
	440	.SH ACTIONS
	441	Each pattern in a rule has a corresponding action, which can be any
	442	arbitrary C statement. The pattern ends at the first non-escaped
	443	whitespace character; the remainder of the line is its action. If the
	444	action is empty, then when the pattern is matched the input token
	445	is simply discarded. For example, here is the specification for a program
	446	which deletes all occurrences of "zap me" from its input:
	447	.nf
	448
	449	%%
	450	"zap me"
	451
	452	.fi
	453	(It will copy all other characters in the input to the output since
	454	they will be matched by the default rule.)
	455	.LP
	456	Here is a program which compresses multiple blanks and tabs down to
	457	a single blank, and throws away whitespace found at the end of a line:
	458	.nf
	459
	460	%%
	461	[ \\t]+ putchar( ' ' );
	462	[ \\t]+$ /* ignore this token */
	463
	464	.fi
	465	.LP
	466	If the action contains a '{', then the action spans till the balancing '}'
	467	is found, and the action may cross multiple lines.
	468	.I flex
	469	knows about C strings and comments and won't be fooled by braces found
	470	within them, but also allows actions to begin with
	471	.B %{
	472	and will consider the action to be all the text up to the next
	473	.B %}
	474	(regardless of ordinary braces inside the action).
	475	.LP
	476	An action consisting solely of a vertical bar ('\|') means "same as
	477	the action for the next rule." See below for an illustration.
	478	.LP
	479	Actions can include arbitrary C code, including
	480	.B return
	481	statements to return a value to whatever routine called
	482	.B yylex().
	483	Each time
	484	.B yylex()
	485	is called it continues processing tokens from where it last left
	486	off until it either reaches
	487	the end of the file or executes a return. Once it reaches an end-of-file,
	488	however, then any subsequent call to
	489	.B yylex()
	490	will simply immediately return, unless
	491	.B yyrestart()
	492	is first called (see below).
	493	.LP
	494	Actions are not allowed to modify yytext or yyleng.
	495	.LP
	496	There are a number of special directives which can be included within
	497	an action:
	498	.IP -
	499	.B ECHO
	500	copies yytext to the scanner's output.
	501	.IP -
	502	.B BEGIN
	503	followed by the name of a start condition places the scanner in the
	504	corresponding start condition (see below).
	505	.IP -
	506	.B REJECT
	507	directs the scanner to proceed on to the "second best" rule which matched the
	508	input (or a prefix of the input). The rule is chosen as described
	509	above in "How the Input is Matched", and
	510	.B yytext
	511	and
	512	.B yyleng
	513	set up appropriately.
	514	It may either be one which matched as much text
	515	as the originally chosen rule but came later in the
	516	.I flex
	517	input file, or one which matched less text.
	518	For example, the following will both count the
	519	words in the input and call the routine special() whenever "frob" is seen:
	520	.nf
	521
	522	int word_count = 0;
	523	%%
	524
	525	frob special(); REJECT;
	526	[^ \\t\\n]+ ++word_count;
	527
	528	.fi
	529	Without the
	530	.B REJECT,
	531	any "frob"'s in the input would not be counted as words, since the
	532	scanner normally executes only one action per token.
	533	Multiple
	534	.B REJECT's
	535	are allowed, each one finding the next best choice to the currently
	536	active rule. For example, when the following scanner scans the token
	537	"abcd", it will write "abcdabcaba" to the output:
	538	.nf
	539
	540	%%
	541	a \|
	542	ab \|
	543	abc \|
	544	abcd ECHO; REJECT;
	545	.\|\\n /* eat up any unmatched character */
	546
	547	.fi
	548	(The first three rules share the fourth's action since they use
	549	the special '\|' action.)
	550	.B REJECT
	551	is a particularly expensive feature in terms scanner performance;
	552	if it is used in
	553	.I any
	554	of the scanner's actions it will slow down
	555	.I all
	556	of the scanner's matching. Furthermore,
	557	.B REJECT
	558	cannot be used with the
	559	.I -f
	560	or
	561	.I -F
	562	options (see below).
	563	.IP
	564	Note also that unlike the other special actions,
	565	.B REJECT
	566	is a
	567	.I branch;
	568	code immediately following it in the action will
	569	.I not
	570	be executed.
	571	.IP -
	572	.B yymore()
	573	tells the scanner that the next time it matches a rule, the corresponding
	574	token should be
	575	.I appended
	576	onto the current value of
	577	.B yytext
	578	rather than replacing it. For example, given the input "mega-kludge"
	579	the following will write "mega-mega-kludge" to the output:
	580	.nf
	581
	582	%%
	583	mega- ECHO; yymore();
	584	kludge ECHO;
	585
	586	.fi
	587	First "mega-" is matched and echoed to the output. Then "kludge"
	588	is matched, but the previous "mega-" is still hanging around at the
	589	beginning of
	590	.B yytext
	591	so the
	592	.B ECHO
	593	for the "kludge" rule will actually write "mega-kludge".
	594	The presence of
	595	.B yymore()
	596	in the scanner's action entails a minor performance penalty in the
	597	scanner's matching speed.
	598	.IP -
	599	.B yyless(n)
	600	returns all but the first
	601	.I n
	602	characters of the current token back to the input stream, where they
	603	will be rescanned when the scanner looks for the next match.
	604	.B yytext
	605	and
	606	.B yyleng
	607	are adjusted appropriately (e.g.,
	608	.B yyleng
	609	will now be equal to
	610	.I n
	611	). For example, on the input "foobar" the following will write out
	612	"foobarbar":
	613	.nf
	614
	615	%%
	616	foobar ECHO; yyless(3);
	617	[a-z]+ ECHO;
	618
	619	.fi
	620	An argument of 0 to
	621	.B yyless
	622	will cause the entire current input string to be scanned again. Unless you've
	623	changed how the scanner will subsequently process its input (using
	624	.B BEGIN,
	625	for example), this will result in an endless loop.
	626	.IP -
	627	.B unput(c)
	628	puts the character
	629	.I c
	630	back onto the input stream. It will be the next character scanned.
	631	The following action will take the current token and cause it
	632	to be rescanned enclosed in parentheses.
	633	.nf
	634
	635	{
	636	int i;
	637	unput( ')' );
	638	for ( i = yyleng - 1; i >= 0; --i )
	639	unput( yytext[i] );
	640	unput( '(' );
	641	}
	642
	643	.fi
	644	Note that since each
	645	.B unput()
	646	puts the given character back at the
	647	.I beginning
	648	of the input stream, pushing back strings must be done back-to-front.
	649	.IP -
	650	.B input()
	651	reads the next character from the input stream. For example,
	652	the following is one way to eat up C comments:
	653	.nf
	654
	655	%%
	656	"/*" {
	657	register int c;
	658
	659	for ( ; ; )
	660	{
	661	while ( (c = input()) != '*' &&
	662	c != EOF )
	663	; /* eat up text of comment */
	664
	665	if ( c == '*' )
	666	{
	667	while ( (c = input()) == '*' )
	668	;
	669	if ( c == '/' )
	670	break; /* found the end */
	671	}
	672
	673	if ( c == EOF )
	674	{
	675	error( "EOF in comment" );
	676	break;
	677	}
	678	}
	679	}
	680
	681	.fi
	682	(Note that if the scanner is compiled using
	683	.B C++,
	684	then
	685	.B input()
	686	is instead referred to as
	687	.B yyinput(),
	688	in order to avoid a name clash with the
	689	.B C++
	690	stream by the name of
	691	.I input.)
	692	.IP -
	693	.B yyterminate()
	694	can be used in lieu of a return statement in an action. It terminates
	695	the scanner and returns a 0 to the scanner's caller, indicating "all done".
	696	Subsequent calls to the scanner will immediately return unless preceded
	697	by a call to
	698	.B yyrestart()
	699	(see below).
	700	By default,
	701	.B yyterminate()
	702	is also called when an end-of-file is encountered. It is a macro and
	703	may be redefined.
	704	.SH THE GENERATED SCANNER
	705	The output of
	706	.I flex
	707	is the file
	708	.B lex.yy.c,
	709	which contains the scanning routine
	710	.B yylex(),
	711	a number of tables used by it for matching tokens, and a number
	712	of auxiliary routines and macros. By default,
	713	.B yylex()
	714	is declared as follows:
	715	.nf
	716
	717	int yylex()
	718	{
	719	... various definitions and the actions in here ...
	720	}
	721
	722	.fi
	723	(If your environment supports function prototypes, then it will
	724	be "int yylex( void )".) This definition may be changed by redefining
	725	the "YY_DECL" macro. For example, you could use:
	726	.nf
	727
	728	#undef YY_DECL
	729	#define YY_DECL float lexscan( a, b ) float a, b;
	730
	731	.fi
	732	to give the scanning routine the name
	733	.I lexscan,
	734	returning a float, and taking two floats as arguments. Note that
	735	if you give arguments to the scanning routine using a
	736	K&R-style/non-prototyped function declaration, you must terminate
	737	the definition with a semi-colon (;).
	738	.LP
	739	Whenever
	740	.B yylex()
	741	is called, it scans tokens from the global input file
	742	.I yyin
	743	(which defaults to stdin). It continues until it either reaches
	744	an end-of-file (at which point it returns the value 0) or
	745	one of its actions executes a
	746	.I return
	747	statement.
	748	In the former case, when called again the scanner will immediately
	749	return unless
	750	.B yyrestart()
	751	is called to point
	752	.I yyin
	753	at the new input file. (
	754	.B yyrestart()
	755	takes one argument, a
	756	.B FILE *
	757	pointer.)
	758	In the latter case (i.e., when an action
	759	executes a return), the scanner may then be called again and it
	760	will resume scanning where it left off.
	761	.LP
	762	By default (and for purposes of efficiency), the scanner uses
	763	block-reads rather than simple
	764	.I getc()
	765	calls to read characters from
	766	.I yyin.
	767	The nature of how it gets its input can be controlled by redefining the
	768	.B YY_INPUT
	769	macro.
	770	YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
	771	action is to place up to
	772	.I max_size
	773	characters in the character array
	774	.I buf
	775	and return in the integer variable
	776	.I result
	777	either the
	778	number of characters read or the constant YY_NULL (0 on Unix systems)
	779	to indicate EOF. The default YY_INPUT reads from the
	780	global file-pointer "yyin".
	781	.LP
	782	A sample redefinition of YY_INPUT (in the definitions
	783	section of the input file):
	784	.nf
	785
	786	%{
	787	#undef YY_INPUT
	788	#define YY_INPUT(buf,result,max_size) \\
	789	{ \\
	790	int c = getchar(); \\
	791	result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
	792	}
	793	%}
	794
	795	.fi
	796	This definition will change the input processing to occur
	797	one character at a time.
	798	.LP
	799	You also can add in things like keeping track of the
	800	input line number this way; but don't expect your scanner to
	801	go very fast.
	802	.LP
	803	When the scanner receives an end-of-file indication from YY_INPUT,
	804	it then checks the
	805	.B yywrap()
	806	function. If
	807	.B yywrap()
	808	returns false (zero), then it is assumed that the
	809	function has gone ahead and set up
	810	.I yyin
	811	to point to another input file, and scanning continues. If it returns
	812	true (non-zero), then the scanner terminates, returning 0 to its
	813	caller.
	814	.LP
	815	The default
	816	.B yywrap()
	817	always returns 1. Presently, to redefine it you must first
	818	"#undef yywrap", as it is currently implemented as a macro. As indicated
	819	by the hedging in the previous sentence, it may be changed to
	820	a true function in the near future.
	821	.LP
	822	The scanner writes its
	823	.B ECHO
	824	output to the
	825	.I yyout
	826	global (default, stdout), which may be redefined by the user simply
	827	by assigning it to some other
	828	.B FILE
	829	pointer.
	830	.SH START CONDITIONS
	831	.I flex
	832	provides a mechanism for conditionally activating rules. Any rule
	833	whose pattern is prefixed with "<sc>" will only be active when
	834	the scanner is in the start condition named "sc". For example,
	835	.nf
	836
	837	<STRING>[^"]* { /* eat up the string body ... */
	838	...
	839	}
	840
	841	.fi
	842	will be active only when the scanner is in the "STRING" start
	843	condition, and
	844	.nf
	845
	846	<INITIAL,STRING,QUOTE>\\. { /* handle an escape ... */
	847	...
	848	}
	849
	850	.fi
	851	will be active only when the current start condition is
	852	either "INITIAL", "STRING", or "QUOTE".
	853	.LP
	854	Start conditions
	855	are declared in the definitions (first) section of the input
	856	using unindented lines beginning with either
	857	.B %s
	858	or
	859	.B %x
	860	followed by a list of names.
	861	The former declares
	862	.I inclusive
	863	start conditions, the latter
	864	.I exclusive
	865	start conditions. A start condition is activated using the
	866	.B BEGIN
	867	action. Until the next
	868	.B BEGIN
	869	action is executed, rules with the given start
	870	condition will be active and
	871	rules with other start conditions will be inactive.
	872	If the start condition is
	873	.I inclusive,
	874	then rules with no start conditions at all will also be active.
	875	If it is
	876	.I exclusive,
	877	then
	878	.I only
	879	rules qualified with the start condition will be active.
	880	A set of rules contingent on the same exclusive start condition
	881	describe a scanner which is independent of any of the other rules in the
	882	.I flex
	883	input. Because of this,
	884	exclusive start conditions make it easy to specify "mini-scanners"
	885	which scan portions of the input that are syntactically different
	886	from the rest (e.g., comments).
	887	.LP
	888	If the distinction between inclusive and exclusive start conditions
	889	is still a little vague, here's a simple example illustrating the
	890	connection between the two. The set of rules:
	891	.nf
	892
	893	%s example
	894	%%
	895	<example>foo /* do something */
	896
	897	.fi
	898	is equivalent to
	899	.nf
	900
	901	%x example
	902	%%
	903	<INITIAL,example>foo /* do something */
	904
	905	.fi
	906	.LP
	907	The default rule (to
	908	.B ECHO
	909	any unmatched character) remains active in start conditions.
	910	.LP
	911	.B BEGIN(0)
	912	returns to the original state where only the rules with
	913	no start conditions are active. This state can also be
	914	referred to as the start-condition "INITIAL", so
	915	.B BEGIN(INITIAL)
	916	is equivalent to
	917	.B BEGIN(0).
	918	(The parentheses around the start condition name are not required but
	919	are considered good style.)
	920	.LP
	921	.B BEGIN
	922	actions can also be given as indented code at the beginning
	923	of the rules section. For example, the following will cause
	924	the scanner to enter the "SPECIAL" start condition whenever
	925	.I yylex()
	926	is called and the global variable
	927	.I enter_special
	928	is true:
	929	.nf
	930
	931	int enter_special;
	932
	933	%x SPECIAL
	934	%%
	935	if ( enter_special )
	936	BEGIN(SPECIAL);
	937
	938	<SPECIAL>blahblahblah
	939	...more rules follow...
	940
	941	.fi
	942	.LP
	943	To illustrate the uses of start conditions,
	944	here is a scanner which provides two different interpretations
	945	of a string like "123.456". By default it will treat it as
	946	as three tokens, the integer "123", a dot ('.'), and the integer "456".
	947	But if the string is preceded earlier in the line by the string
	948	"expect-floats"
	949	it will treat it as a single token, the floating-point number
	950	123.456:
	951	.nf
	952
	953	%{
	954	#include <math.h>
	955	%}
	956	%s expect
	957
	958	%%
	959	expect-floats BEGIN(expect);
	960
	961	<expect>[0-9]+"."[0-9]+ {
	962	printf( "found a float, = %f\\n",
	963	atof( yytext ) );
	964	}
	965	<expect>\\n {
	966	/* that's the end of the line, so
	967	* we need another "expect-number"
	968	* before we'll recognize any more
	969	* numbers
	970	*/
	971	BEGIN(INITIAL);
	972	}
	973
	974	[0-9]+ {
	975	printf( "found an integer, = %d\\n",
	976	atoi( yytext ) );
	977	}
	978
	979	"." printf( "found a dot\\n" );
	980
	981	.fi
	982	Here is a scanner which recognizes (and discards) C comments while
	983	maintaining a count of the current input line.
	984	.nf
	985
	986	%x comment
	987	%%
	988	int line_num = 1;
	989
	990	"/*" BEGIN(comment);
	991
	992	<comment>[^\\n] /* eat anything that's not a '' /
	993	<comment>""+[^/\\n]* /* eat up ''s not followed by '/'s /
	994	<comment>\\n ++line_num;
	995	<comment>"*"+"/" BEGIN(INITIAL);
	996
	997	.fi
	998	Note that start-conditions names are really integer values and
	999	can be stored as such. Thus, the above could be extended in the
	1000	following fashion:
	1001	.nf
	1002
	1003	%x comment foo
	1004	%%
	1005	int line_num = 1;
	1006	int comment_caller;
	1007
	1008	"/*" {
	1009	comment_caller = INITIAL;
	1010	BEGIN(comment);
	1011	}
	1012
	1013	...
	1014
	1015	<foo>"/*" {
	1016	comment_caller = foo;
	1017	BEGIN(comment);
	1018	}
	1019
	1020	<comment>[^\\n] /* eat anything that's not a '' /
	1021	<comment>""+[^/\\n]* /* eat up ''s not followed by '/'s /
	1022	<comment>\\n ++line_num;
	1023	<comment>"*"+"/" BEGIN(comment_caller);
	1024
	1025	.fi
	1026	One can then implement a "stack" of start conditions using an
	1027	array of integers. (It is likely that such stacks will become
	1028	a full-fledged
	1029	.I flex
	1030	feature in the future.) Note, though, that
	1031	start conditions do not have their own name-space; %s's and %x's
	1032	declare names in the same fashion as #define's.
	1033	.SH MULTIPLE INPUT BUFFERS
	1034	Some scanners (such as those which support "include" files)
	1035	require reading from several input streams. As
	1036	.I flex
	1037	scanners do a large amount of buffering, one cannot control
	1038	where the next input will be read from by simply writing a
	1039	.B YY_INPUT
	1040	which is sensitive to the scanning context.
	1041	.B YY_INPUT
	1042	is only called when the scanner reaches the end of its buffer, which
	1043	may be a long time after scanning a statement such as an "include"
	1044	which requires switching the input source.
	1045	.LP
	1046	To negotiate these sorts of problems,
	1047	.I flex
	1048	provides a mechanism for creating and switching between multiple
	1049	input buffers. An input buffer is created by using:
	1050	.nf
	1051
	1052	YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
	1053
	1054	.fi
	1055	which takes a
	1056	.I FILE
	1057	pointer and a size and creates a buffer associated with the given
	1058	file and large enough to hold
	1059	.I size
	1060	characters (when in doubt, use
	1061	.B YY_BUF_SIZE
	1062	for the size). It returns a
	1063	.B YY_BUFFER_STATE
	1064	handle, which may then be passed to other routines:
	1065	.nf
	1066
	1067	void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
	1068
	1069	.fi
	1070	switches the scanner's input buffer so subsequent tokens will
	1071	come from
	1072	.I new_buffer.
	1073	Note that
	1074	.B yy_switch_to_buffer()
	1075	may be used by yywrap() to sets things up for continued scanning, instead
	1076	of opening a new file and pointing
	1077	.I yyin
	1078	at it.
	1079	.nf
	1080
	1081	void yy_delete_buffer( YY_BUFFER_STATE buffer )
	1082
	1083	.fi
	1084	is used to reclaim the storage associated with a buffer.
	1085	.LP
	1086	.B yy_new_buffer()
	1087	is an alias for
	1088	.B yy_create_buffer(),
	1089	provided for compatibility with the C++ use of
	1090	.I new
	1091	and
	1092	.I delete
	1093	for creating and destroying dynamic objects.
	1094	.LP
	1095	Finally, the
	1096	.B YY_CURRENT_BUFFER
	1097	macro returns a
	1098	.B YY_BUFFER_STATE
	1099	handle to the current buffer.
	1100	.LP
	1101	Here is an example of using these features for writing a scanner
	1102	which expands include files (the
	1103	.B <<EOF>>
	1104	feature is discussed below):
	1105	.nf
	1106
	1107	/* the "incl" state is used for picking up the name
	1108	* of an include file
	1109	*/
	1110	%x incl
	1111
	1112	%{
	1113	#define MAX_INCLUDE_DEPTH 10
	1114	YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
	1115	int include_stack_ptr = 0;
	1116	%}
	1117
	1118	%%
	1119	include BEGIN(incl);
	1120
	1121	[a-z]+ ECHO;
	1122	[^a-z\\n]*\\n? ECHO;
	1123
	1124	<incl>[ \\t]* /* eat the whitespace */
	1125	<incl>[^ \\t\\n]+ { /* got the include file name */
	1126	if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
	1127	{
	1128	fprintf( stderr, "Includes nested too deeply" );
	1129	exit( 1 );
	1130	}
	1131
	1132	include_stack[include_stack_ptr++] =
	1133	YY_CURRENT_BUFFER;
	1134
	1135	yyin = fopen( yytext, "r" );
	1136
	1137	if ( ! yyin )
	1138	error( ... );
	1139
	1140	yy_switch_to_buffer(
	1141	yy_create_buffer( yyin, YY_BUF_SIZE ) );
	1142
	1143	BEGIN(INITIAL);
	1144	}
	1145
	1146	<<EOF>> {
	1147	if ( --include_stack_ptr < 0 )
	1148	{
	1149	yyterminate();
	1150	}
	1151
	1152	else
	1153	yy_switch_to_buffer(
	1154	include_stack[include_stack_ptr] );
	1155	}
	1156
	1157	.fi
	1158	.SH END-OF-FILE RULES
	1159	The special rule "<<EOF>>" indicates
	1160	actions which are to be taken when an end-of-file is
	1161	encountered and yywrap() returns non-zero (i.e., indicates
	1162	no further files to process). The action must finish
	1163	by doing one of four things:
	1164	.IP -
	1165	the special
	1166	.B YY_NEW_FILE
	1167	action, if
	1168	.I yyin
	1169	has been pointed at a new file to process;
	1170	.IP -
	1171	a
	1172	.I return
	1173	statement;
	1174	.IP -
	1175	the special
	1176	.B yyterminate()
	1177	action;
	1178	.IP -
	1179	or, switching to a new buffer using
	1180	.B yy_switch_to_buffer()
	1181	as shown in the example above.
	1182	.LP
	1183	<<EOF>> rules may not be used with other
	1184	patterns; they may only be qualified with a list of start
	1185	conditions. If an unqualified <<EOF>> rule is given, it
	1186	applies to
	1187	.I all
	1188	start conditions which do not already have <<EOF>> actions. To
	1189	specify an <<EOF>> rule for only the initial start condition, use
	1190	.nf
	1191
	1192	<INITIAL><<EOF>>
	1193
	1194	.fi
	1195	.LP
	1196	These rules are useful for catching things like unclosed comments.
	1197	An example:
	1198	.nf
	1199
	1200	%x quote
	1201	%%
	1202
	1203	...other rules for dealing with quotes...
	1204
	1205	<quote><<EOF>> {
	1206	error( "unterminated quote" );
	1207	yyterminate();
	1208	}
	1209	<<EOF>> {
	1210	if ( *++filelist )
	1211	{
	1212	yyin = fopen( *filelist, "r" );
	1213	YY_NEW_FILE;
	1214	}
	1215	else
	1216	yyterminate();
	1217	}
	1218
	1219	.fi
	1220	.SH MISCELLANEOUS MACROS
	1221	The macro
	1222	.B YY_USER_ACTION
	1223	can be redefined to provide an action
	1224	which is always executed prior to the matched rule's action. For example,
	1225	it could be #define'd to call a routine to convert yytext to lower-case.
	1226	.LP
	1227	The macro
	1228	.B YY_USER_INIT
	1229	may be redefined to provide an action which is always executed before
	1230	the first scan (and before the scanner's internal initializations are done).
	1231	For example, it could be used to call a routine to read
	1232	in a data table or open a logging file.
	1233	.LP
	1234	In the generated scanner, the actions are all gathered in one large
	1235	switch statement and separated using
	1236	.B YY_BREAK,
	1237	which may be redefined. By default, it is simply a "break", to separate
	1238	each rule's action from the following rule's.
	1239	Redefining
	1240	.B YY_BREAK
	1241	allows, for example, C++ users to
	1242	#define YY_BREAK to do nothing (while being very careful that every
	1243	rule ends with a "break" or a "return"!) to avoid suffering from
	1244	unreachable statement warnings where because a rule's action ends with
	1245	"return", the
	1246	.B YY_BREAK
	1247	is inaccessible.
	1248	.SH INTERFACING WITH YACC
	1249	One of the main uses of
	1250	.I flex
	1251	is as a companion to the
	1252	.I yacc
	1253	parser-generator.
	1254	.I yacc
	1255	parsers expect to call a routine named
	1256	.B yylex()
	1257	to find the next input token. The routine is supposed to
	1258	return the type of the next token as well as putting any associated
	1259	value in the global
	1260	.B yylval.
	1261	To use
	1262	.I flex
	1263	with
	1264	.I yacc,
	1265	one specifies the
	1266	.B -d
	1267	option to
	1268	.I yacc
	1269	to instruct it to generate the file
	1270	.B y.tab.h
	1271	containing definitions of all the
	1272	.B %tokens
	1273	appearing in the
	1274	.I yacc
	1275	input. This file is then included in the
	1276	.I flex
	1277	scanner. For example, if one of the tokens is "TOK_NUMBER",
	1278	part of the scanner might look like:
	1279	.nf
	1280
	1281	%{
	1282	#include "y.tab.h"
	1283	%}
	1284
	1285	%%
	1286
	1287	[0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
	1288
	1289	.fi
	1290	.SH TRANSLATION TABLE
	1291	In the name of POSIX compliance,
	1292	.I flex
	1293	supports a
	1294	.I translation table
	1295	for mapping input characters into groups.
	1296	The table is specified in the first section, and its format looks like:
	1297	.nf
	1298
	1299	%t
	1300	1 abcd
	1301	2 ABCDEFGHIJKLMNOPQRSTUVWXYZ
	1302	52 0123456789
	1303	6 \\t\\ \\n
	1304	%t
	1305
	1306	.fi
	1307	This example specifies that the characters 'a', 'b', 'c', and 'd'
	1308	are to all be lumped into group #1, upper-case letters
	1309	in group #2, digits in group #52, tabs, blanks, and newlines into
	1310	group #6, and
	1311	.I
	1312	no other characters will appear in the patterns.
	1313	The group numbers are actually disregarded by
	1314	.I flex;
	1315	.B %t
	1316	serves, though, to lump characters together. Given the above
	1317	table, for example, the pattern "a(AA)5" is equivalent to "d(ZQ)0".
	1318	They both say, "match any character in group #1, followed by
	1319	zero-or-more pairs of characters
	1320	from group #2, followed by a character from group #52." Thus
	1321	.B %t
	1322	provides a crude way for introducing equivalence classes into
	1323	the scanner specification.
	1324	.LP
	1325	Note that the
	1326	.B -i
	1327	option (see below) coupled with the equivalence classes which
	1328	.I flex
	1329	automatically generates take care of virtually all the instances
	1330	when one might consider using
	1331	.B %t.
	1332	But what the hell, it's there if you want it.
	1333	.SH OPTIONS
	1334	.I flex
	1335	has the following options:
	1336	.TP
	1337	.B -b
	1338	Generate backtracking information to
	1339	.I lex.backtrack.
	1340	This is a list of scanner states which require backtracking
	1341	and the input characters on which they do so. By adding rules one
	1342	can remove backtracking states. If all backtracking states
	1343	are eliminated and
	1344	.B -f
	1345	or
	1346	.B -F
	1347	is used, the generated scanner will run faster (see the
	1348	.B -p
	1349	flag). Only users who wish to squeeze every last cycle out of their
	1350	scanners need worry about this option. (See the section on PERFORMANCE
	1351	CONSIDERATIONS below.)
	1352	.TP
	1353	.B -c
	1354	is a do-nothing, deprecated option included for POSIX compliance.
	1355	.IP
	1356	.B NOTE:
	1357	in previous releases of
	1358	.I flex
	1359	.B -c
	1360	specified table-compression options. This functionality is
	1361	now given by the
	1362	.B -C
	1363	flag. To ease the the impact of this change, when
	1364	.I flex
	1365	encounters
	1366	.B -c,
	1367	it currently issues a warning message and assumes that
	1368	.B -C
	1369	was desired instead. In the future this "promotion" of
	1370	.B -c
	1371	to
	1372	.B -C
	1373	will go away in the name of full POSIX compliance (unless
	1374	the POSIX meaning is removed first).
	1375	.TP
	1376	.B -d
	1377	makes the generated scanner run in
	1378	.I debug
	1379	mode. Whenever a pattern is recognized and the global
	1380	.B yy_flex_debug
	1381	is non-zero (which is the default),
	1382	the scanner will write to
	1383	.I stderr
	1384	a line of the form:
	1385	.nf
	1386
	1387	--accepting rule at line 53 ("the matched text")
	1388
	1389	.fi
	1390	The line number refers to the location of the rule in the file
	1391	defining the scanner (i.e., the file that was fed to flex). Messages
	1392	are also generated when the scanner backtracks, accepts the
	1393	default rule, reaches the end of its input buffer (or encounters
	1394	a NUL; at this point, the two look the same as far as the scanner's concerned),
	1395	or reaches an end-of-file.
	1396	.TP
	1397	.B -f
	1398	specifies (take your pick)
	1399	.I full table
	1400	or
	1401	.I fast scanner.
	1402	No table compression is done. The result is large but fast.
	1403	This option is equivalent to
	1404	.B -Cf
	1405	(see below).
	1406	.TP
	1407	.B -i
	1408	instructs
	1409	.I flex
	1410	to generate a
	1411	.I case-insensitive
	1412	scanner. The case of letters given in the
	1413	.I flex
	1414	input patterns will
	1415	be ignored, and tokens in the input will be matched regardless of case. The
	1416	matched text given in
	1417	.I yytext
	1418	will have the preserved case (i.e., it will not be folded).
	1419	.TP
	1420	.B -n
	1421	is another do-nothing, deprecated option included only for
	1422	POSIX compliance.
	1423	.TP
	1424	.B -p
	1425	generates a performance report to stderr. The report
	1426	consists of comments regarding features of the
	1427	.I flex
	1428	input file which will cause a loss of performance in the resulting scanner.
	1429	Note that the use of
	1430	.I REJECT
	1431	and variable trailing context (see the BUGS section in flex(1))
	1432	entails a substantial performance penalty; use of
	1433	.I yymore(),
	1434	the
	1435	.B ^
	1436	operator,
	1437	and the
	1438	.B -I
	1439	flag entail minor performance penalties.
	1440	.TP
	1441	.B -s
	1442	causes the
	1443	.I default rule
	1444	(that unmatched scanner input is echoed to
	1445	.I stdout)
	1446	to be suppressed. If the scanner encounters input that does not
	1447	match any of its rules, it aborts with an error. This option is
	1448	useful for finding holes in a scanner's rule set.
	1449	.TP
	1450	.B -t
	1451	instructs
	1452	.I flex
	1453	to write the scanner it generates to standard output instead
	1454	of
	1455	.B lex.yy.c.
	1456	.TP
	1457	.B -v
	1458	specifies that
	1459	.I flex
	1460	should write to
	1461	.I stderr
	1462	a summary of statistics regarding the scanner it generates.
	1463	Most of the statistics are meaningless to the casual
	1464	.I flex
	1465	user, but the
	1466	first line identifies the version of
	1467	.I flex,
	1468	which is useful for figuring
	1469	out where you stand with respect to patches and new releases,
	1470	and the next two lines give the date when the scanner was created
	1471	and a summary of the flags which were in effect.
	1472	.TP
	1473	.B -F
	1474	specifies that the
	1475	.I fast
	1476	scanner table representation should be used. This representation is
	1477	about as fast as the full table representation
	1478	.RB ( \-f ),
	1479	and for some sets of patterns will be considerably smaller (and for
	1480	others, larger). In general, if the pattern set contains both "keywords"
	1481	and a catch-all, "identifier" rule, such as in the set:
	1482	.nf
	1483
	1484	"case" return TOK_CASE;
	1485	"switch" return TOK_SWITCH;
	1486	...
	1487	"default" return TOK_DEFAULT;
	1488	[a-z]+ return TOK_ID;
	1489
	1490	.fi
	1491	then you're better off using the full table representation. If only
	1492	the "identifier" rule is present and you then use a hash table or some such
	1493	to detect the keywords, you're better off using
	1494	.BR \-F .
	1495	.IP
	1496	This option is equivalent to
	1497	.B -CF
	1498	(see below).
	1499	.TP
	1500	.B -I
	1501	instructs
	1502	.I flex
	1503	to generate an
	1504	.I interactive
	1505	scanner. Normally, scanners generated by
	1506	.I flex
	1507	always look ahead one
	1508	character before deciding that a rule has been matched. At the cost of
	1509	some scanning overhead,
	1510	.I flex
	1511	will generate a scanner which only looks ahead
	1512	when needed. Such scanners are called
	1513	.I interactive
	1514	because if you want to write a scanner for an interactive system such as a
	1515	command shell, you will probably want the user's input to be terminated
	1516	with a newline, and without
	1517	.B -I
	1518	the user will have to type a character in addition to the newline in order
	1519	to have the newline recognized. This leads to dreadful interactive
	1520	performance.
	1521	.IP
	1522	If all this seems to confusing, here's the general rule: if a human will
	1523	be typing in input to your scanner, use
	1524	.B -I,
	1525	otherwise don't; if you don't care about squeezing the utmost performance
	1526	from your scanner and you
	1527	don't want to make any assumptions about the input to your scanner,
	1528	use
	1529	.B -I.
	1530	.IP
	1531	Note,
	1532	.B -I
	1533	cannot be used in conjunction with
	1534	.I full
	1535	or
	1536	.I fast tables,
	1537	i.e., the
	1538	.B -f, -F, -Cf,
	1539	or
	1540	.B -CF
	1541	flags.
	1542	.TP
	1543	.B -L
	1544	instructs
	1545	.I flex
	1546	not to generate
	1547	.B #line
	1548	directives. Without this option,
	1549	.I flex
	1550	peppers the generated scanner
	1551	with #line directives so error messages in the actions will be correctly
	1552	located with respect to the original
	1553	.I flex
	1554	input file, and not to
	1555	the fairly meaningless line numbers of
	1556	.B lex.yy.c.
	1557	(Unfortunately
	1558	.I flex
	1559	does not presently generate the necessary directives
	1560	to "retarget" the line numbers for those parts of
	1561	.B lex.yy.c
	1562	which it generated. So if there is an error in the generated code,
	1563	a meaningless line number is reported.)
	1564	.TP
	1565	.B -T
	1566	makes
	1567	.I flex
	1568	run in
	1569	.I trace
	1570	mode. It will generate a lot of messages to
	1571	.I stdout
	1572	concerning
	1573	the form of the input and the resultant non-deterministic and deterministic
	1574	finite automata. This option is mostly for use in maintaining
	1575	.I flex.
	1576	.TP
	1577	.B -8
	1578	instructs
	1579	.I flex
	1580	to generate an 8-bit scanner, i.e., one which can recognize 8-bit
	1581	characters. On some sites,
	1582	.I flex
	1583	is installed with this option as the default. On others, the default
	1584	is 7-bit characters. To see which is the case, check the verbose
	1585	.B (-v)
	1586	output for "equivalence classes created". If the denominator of
	1587	the number shown is 128, then by default
	1588	.I flex
	1589	is generating 7-bit characters. If it is 256, then the default is
	1590	8-bit characters and the
	1591	.B -8
	1592	flag is not required (but may be a good idea to keep the scanner
	1593	specification portable). Feeding a 7-bit scanner 8-bit characters
	1594	will result in infinite loops, bus errors, or other such fireworks,
	1595	so when in doubt, use the flag. Note that if equivalence classes
	1596	are used, 8-bit scanners take only slightly more table space than
	1597	7-bit scanners (128 bytes, to be exact); if equivalence classes are
	1598	not used, however, then the tables may grow up to twice their
	1599	7-bit size.
	1600	.TP
	1601	.B -C[efmF]
	1602	controls the degree of table compression.
	1603	.IP
	1604	.B -Ce
	1605	directs
	1606	.I flex
	1607	to construct
	1608	.I equivalence classes,
	1609	i.e., sets of characters
	1610	which have identical lexical properties (for example, if the only
	1611	appearance of digits in the
	1612	.I flex
	1613	input is in the character class
	1614	"[0-9]" then the digits '0', '1', ..., '9' will all be put
	1615	in the same equivalence class). Equivalence classes usually give
	1616	dramatic reductions in the final table/object file sizes (typically
	1617	a factor of 2-5) and are pretty cheap performance-wise (one array
	1618	look-up per character scanned).
	1619	.IP
	1620	.B -Cf
	1621	specifies that the
	1622	.I full
	1623	scanner tables should be generated -
	1624	.I flex
	1625	should not compress the
	1626	tables by taking advantages of similar transition functions for
	1627	different states.
	1628	.IP
	1629	.B -CF
	1630	specifies that the alternate fast scanner representation (described
	1631	above under the
	1632	.B -F
	1633	flag)
	1634	should be used.
	1635	.IP
	1636	.B -Cm
	1637	directs
	1638	.I flex
	1639	to construct
	1640	.I meta-equivalence classes,
	1641	which are sets of equivalence classes (or characters, if equivalence
	1642	classes are not being used) that are commonly used together. Meta-equivalence
	1643	classes are often a big win when using compressed tables, but they
	1644	have a moderate performance impact (one or two "if" tests and one
	1645	array look-up per character scanned).
	1646	.IP
	1647	A lone
	1648	.B -C
	1649	specifies that the scanner tables should be compressed but neither
	1650	equivalence classes nor meta-equivalence classes should be used.
	1651	.IP
	1652	The options
	1653	.B -Cf
	1654	or
	1655	.B -CF
	1656	and
	1657	.B -Cm
	1658	do not make sense together - there is no opportunity for meta-equivalence
	1659	classes if the table is not being compressed. Otherwise the options
	1660	may be freely mixed.
	1661	.IP
	1662	The default setting is
	1663	.B -Cem,
	1664	which specifies that
	1665	.I flex
	1666	should generate equivalence classes
	1667	and meta-equivalence classes. This setting provides the highest
	1668	degree of table compression. You can trade off
	1669	faster-executing scanners at the cost of larger tables with
	1670	the following generally being true:
	1671	.nf
	1672
	1673	slowest & smallest
	1674	-Cem
	1675	-Cm
	1676	-Ce
	1677	-C
	1678	-C{f,F}e
	1679	-C{f,F}
	1680	fastest & largest
	1681
	1682	.fi
	1683	Note that scanners with the smallest tables are usually generated and
	1684	compiled the quickest, so
	1685	during development you will usually want to use the default, maximal
	1686	compression.
	1687	.IP
	1688	.B -Cfe
	1689	is often a good compromise between speed and size for production
	1690	scanners.
	1691	.IP
	1692	.B -C
	1693	options are not cumulative; whenever the flag is encountered, the
	1694	previous -C settings are forgotten.
	1695	.TP
	1696	.B -Sskeleton_file
	1697	overrides the default skeleton file from which
	1698	.I flex
	1699	constructs its scanners. You'll never need this option unless you are doing
	1700	.I flex
	1701	maintenance or development.
	1702	.SH PERFORMANCE CONSIDERATIONS
	1703	The main design goal of
	1704	.I flex
	1705	is that it generate high-performance scanners. It has been optimized
	1706	for dealing well with large sets of rules. Aside from the effects
	1707	of table compression on scanner speed outlined above,
	1708	there are a number of options/actions which degrade performance. These
	1709	are, from most expensive to least:
	1710	.nf
	1711
	1712	REJECT
	1713
	1714	pattern sets that require backtracking
	1715	arbitrary trailing context
	1716
	1717	'^' beginning-of-line operator
	1718	yymore()
	1719
	1720	.fi
	1721	with the first three all being quite expensive and the last two
	1722	being quite cheap.
	1723	.LP
	1724	.B REJECT
	1725	should be avoided at all costs when performance is important.
	1726	It is a particularly expensive option.
	1727	.LP
	1728	Getting rid of backtracking is messy and often may be an enormous
	1729	amount of work for a complicated scanner. In principal, one begins
	1730	by using the
	1731	.B -b
	1732	flag to generate a
	1733	.I lex.backtrack
	1734	file. For example, on the input
	1735	.nf
	1736
	1737	%%
	1738	foo return TOK_KEYWORD;
	1739	foobar return TOK_KEYWORD;
	1740
	1741	.fi
	1742	the file looks like:
	1743	.nf
	1744
	1745	State #6 is non-accepting -
	1746	associated rule line numbers:
	1747	2 3
	1748	out-transitions: [ o ]
	1749	jam-transitions: EOF [ \\001-n p-\\177 ]
	1750
	1751	State #8 is non-accepting -
	1752	associated rule line numbers:
	1753	3
	1754	out-transitions: [ a ]
	1755	jam-transitions: EOF [ \\001-` b-\\177 ]
	1756
	1757	State #9 is non-accepting -
	1758	associated rule line numbers:
	1759	3
	1760	out-transitions: [ r ]
	1761	jam-transitions: EOF [ \\001-q s-\\177 ]
	1762
	1763	Compressed tables always backtrack.
	1764
	1765	.fi
	1766	The first few lines tell us that there's a scanner state in
	1767	which it can make a transition on an 'o' but not on any other
	1768	character, and that in that state the currently scanned text does not match
	1769	any rule. The state occurs when trying to match the rules found
	1770	at lines 2 and 3 in the input file.
	1771	If the scanner is in that state and then reads
	1772	something other than an 'o', it will have to backtrack to find
	1773	a rule which is matched. With
	1774	a bit of headscratching one can see that this must be the
	1775	state it's in when it has seen "fo". When this has happened,
	1776	if anything other than another 'o' is seen, the scanner will
	1777	have to back up to simply match the 'f' (by the default rule).
	1778	.LP
	1779	The comment regarding State #8 indicates there's a problem
	1780	when "foob" has been scanned. Indeed, on any character other
	1781	than a 'b', the scanner will have to back up to accept "foo".
	1782	Similarly, the comment for State #9 concerns when "fooba" has
	1783	been scanned.
	1784	.LP
	1785	The final comment reminds us that there's no point going to
	1786	all the trouble of removing backtracking from the rules unless
	1787	we're using
	1788	.B -f
	1789	or
	1790	.B -F,
	1791	since there's no performance gain doing so with compressed scanners.
	1792	.LP
	1793	The way to remove the backtracking is to add "error" rules:
	1794	.nf
	1795
	1796	%%
	1797	foo return TOK_KEYWORD;
	1798	foobar return TOK_KEYWORD;
	1799
	1800	fooba \|
	1801	foob \|
	1802	fo {
	1803	/* false alarm, not really a keyword */
	1804	return TOK_ID;
	1805	}
	1806
	1807	.fi
	1808	.LP
	1809	Eliminating backtracking among a list of keywords can also be
	1810	done using a "catch-all" rule:
	1811	.nf
	1812
	1813	%%
	1814	foo return TOK_KEYWORD;
	1815	foobar return TOK_KEYWORD;
	1816
	1817	[a-z]+ return TOK_ID;
	1818
	1819	.fi
	1820	This is usually the best solution when appropriate.
	1821	.LP
	1822	Backtracking messages tend to cascade.
	1823	With a complicated set of rules it's not uncommon to get hundreds
	1824	of messages. If one can decipher them, though, it often
	1825	only takes a dozen or so rules to eliminate the backtracking (though
	1826	it's easy to make a mistake and have an error rule accidentally match
	1827	a valid token. A possible future
	1828	.I flex
	1829	feature will be to automatically add rules to eliminate backtracking).
	1830	.LP
	1831	.I Variable
	1832	trailing context (where both the leading and trailing parts do not have
	1833	a fixed length) entails almost the same performance loss as
	1834	.I REJECT
	1835	(i.e., substantial). So when possible a rule like:
	1836	.nf
	1837
	1838	%%
	1839	mouse\|rat/(cat\|dog) run();
	1840
	1841	.fi
	1842	is better written:
	1843	.nf
	1844
	1845	%%
	1846	mouse/cat\|dog run();
	1847	rat/cat\|dog run();
	1848
	1849	.fi
	1850	or as
	1851	.nf
	1852
	1853	%%
	1854	mouse\|rat/cat run();
	1855	mouse\|rat/dog run();
	1856
	1857	.fi
	1858	Note that here the special '\|' action does
	1859	.I not
	1860	provide any savings, and can even make things worse (see
	1861	.B BUGS
	1862	in flex(1)).
	1863	.LP
	1864	Another area where the user can increase a scanner's performance
	1865	(and one that's easier to implement) arises from the fact that
	1866	the longer the tokens matched, the faster the scanner will run.
	1867	This is because with long tokens the processing of most input
	1868	characters takes place in the (short) inner scanning loop, and
	1869	does not often have to go through the additional work of setting up
	1870	the scanning environment (e.g.,
	1871	.B yytext)
	1872	for the action. Recall the scanner for C comments:
	1873	.nf
	1874
	1875	%x comment
	1876	%%
	1877	int line_num = 1;
	1878
	1879	"/*" BEGIN(comment);
	1880
	1881	<comment>[^\\n]
	1882	<comment>""+[^/\\n]*
	1883	<comment>\\n ++line_num;
	1884	<comment>"*"+"/" BEGIN(INITIAL);
	1885
	1886	.fi
	1887	This could be sped up by writing it as:
	1888	.nf
	1889
	1890	%x comment
	1891	%%
	1892	int line_num = 1;
	1893
	1894	"/*" BEGIN(comment);
	1895
	1896	<comment>[^\\n]
	1897	<comment>[^\\n]\\n ++line_num;
	1898	<comment>""+[^/\\n]*
	1899	<comment>""+[^/\\n]*\\n ++line_num;
	1900	<comment>"*"+"/" BEGIN(INITIAL);
	1901
	1902	.fi
	1903	Now instead of each newline requiring the processing of another
	1904	action, recognizing the newlines is "distributed" over the other rules
	1905	to keep the matched text as long as possible. Note that
	1906	.I adding
	1907	rules does
	1908	.I not
	1909	slow down the scanner! The speed of the scanner is independent
	1910	of the number of rules or (modulo the considerations given at the
	1911	beginning of this section) how complicated the rules are with
	1912	regard to operators such as '*' and '\|'.
	1913	.LP
	1914	A final example in speeding up a scanner: suppose you want to scan
	1915	through a file containing identifiers and keywords, one per line
	1916	and with no other extraneous characters, and recognize all the
	1917	keywords. A natural first approach is:
	1918	.nf
	1919
	1920	%%
	1921	asm \|
	1922	auto \|
	1923	break \|
	1924	... etc ...
	1925	volatile \|
	1926	while /* it's a keyword */
	1927
	1928	.\|\\n /* it's not a keyword */
	1929
	1930	.fi
	1931	To eliminate the back-tracking, introduce a catch-all rule:
	1932	.nf
	1933
	1934	%%
	1935	asm \|
	1936	auto \|
	1937	break \|
	1938	... etc ...
	1939	volatile \|
	1940	while /* it's a keyword */
	1941
	1942	[a-z]+ \|
	1943	.\|\\n /* it's not a keyword */
	1944
	1945	.fi
	1946	Now, if it's guaranteed that there's exactly one word per line,
	1947	then we can reduce the total number of matches by a half by
	1948	merging in the recognition of newlines with that of the other
	1949	tokens:
	1950	.nf
	1951
	1952	%%
	1953	asm\\n \|
	1954	auto\\n \|
	1955	break\\n \|
	1956	... etc ...
	1957	volatile\\n \|
	1958	while\\n /* it's a keyword */
	1959
	1960	[a-z]+\\n \|
	1961	.\|\\n /* it's not a keyword */
	1962
	1963	.fi
	1964	One has to be careful here, as we have now reintroduced backtracking
	1965	into the scanner. In particular, while
	1966	.I we
	1967	know that there will never be any characters in the input stream
	1968	other than letters or newlines,
	1969	.I flex
	1970	can't figure this out, and it will plan for possibly needing backtracking
	1971	when it has scanned a token like "auto" and then the next character
	1972	is something other than a newline or a letter. Previously it would
	1973	then just match the "auto" rule and be done, but now it has no "auto"
	1974	rule, only a "auto\\n" rule. To eliminate the possibility of backtracking,
	1975	we could either duplicate all rules but without final newlines, or,
	1976	since we never expect to encounter such an input and therefore don't
	1977	how it's classified, we can introduce one more catch-all rule, this
	1978	one which doesn't include a newline:
	1979	.nf
	1980
	1981	%%
	1982	asm\\n \|
	1983	auto\\n \|
	1984	break\\n \|
	1985	... etc ...
	1986	volatile\\n \|
	1987	while\\n /* it's a keyword */
	1988
	1989	[a-z]+\\n \|
	1990	[a-z]+ \|
	1991	.\|\\n /* it's not a keyword */
	1992
	1993	.fi
	1994	Compiled with
	1995	.B -Cf,
	1996	this is about as fast as one can get a
	1997	.I flex
	1998	scanner to go for this particular problem.
	1999	.LP
	2000	A final note:
	2001	.I flex
	2002	is slow when matching NUL's, particularly when a token contains
	2003	multiple NUL's.
	2004	It's best to write rules which match
	2005	.I short
	2006	amounts of text if it's anticipated that the text will often include NUL's.
	2007	.SH INCOMPATIBILITIES WITH LEX AND POSIX
	2008	.I flex
	2009	is a rewrite of the Unix
	2010	.I lex
	2011	tool (the two implementations do not share any code, though),
	2012	with some extensions and incompatibilities, both of which
	2013	are of concern to those who wish to write scanners acceptable
	2014	to either implementation. At present, the POSIX
	2015	.I lex
	2016	draft is
	2017	very close to the original
	2018	.I lex
	2019	implementation, so some of these
	2020	incompatibilities are also in conflict with the POSIX draft. But
	2021	the intent is that except as noted below,
	2022	.I flex
	2023	as it presently stands will
	2024	ultimately be POSIX conformant (i.e., that those areas of conflict with
	2025	the POSIX draft will be resolved in
	2026	.I flex's
	2027	favor). Please bear in
	2028	mind that all the comments which follow are with regard to the POSIX
	2029	.I draft
	2030	standard of Summer 1989, and not the final document (or subsequent
	2031	drafts); they are included so
	2032	.I flex
	2033	users can be aware of the standardization issues and those areas where
	2034	.I flex
	2035	may in the near future undergo changes incompatible with
	2036	its current definition.
	2037	.LP
	2038	.I flex
	2039	is fully compatible with
	2040	.I lex
	2041	with the following exceptions:
	2042	.IP -
	2043	The undocumented
	2044	.I lex
	2045	scanner internal variable
	2046	.B yylineno
	2047	is not supported. It is difficult to support this option efficiently,
	2048	since it requires examining every character scanned and reexamining
	2049	the characters when the scanner backs up.
	2050	Things get more complicated when the end of buffer or file is reached or a
	2051	NUL is scanned (since the scan must then be restarted with the proper line
	2052	number count), or the user uses the yyless(), unput(), or REJECT actions,
	2053	or the multiple input buffer functions.
	2054	.IP
	2055	The fix is to add rules which, upon seeing a newline, increment
	2056	yylineno. This is usually an easy process, though it can be a drag if some
	2057	of the patterns can match multiple newlines along with other characters.
	2058	.IP
	2059	yylineno is not part of the POSIX draft.
	2060	.IP -
	2061	The
	2062	.B input()
	2063	routine is not redefinable, though it may be called to read characters
	2064	following whatever has been matched by a rule. If
	2065	.B input()
	2066	encounters an end-of-file the normal
	2067	.B yywrap()
	2068	processing is done. A ``real'' end-of-file is returned by
	2069	.B input()
	2070	as
	2071	.I EOF.
	2072	.IP
	2073	Input is instead controlled by redefining the
	2074	.B YY_INPUT
	2075	macro.
	2076	.IP
	2077	The
	2078	.I flex
	2079	restriction that
	2080	.B input()
	2081	cannot be redefined is in accordance with the POSIX draft, but
	2082	.B YY_INPUT
	2083	has not yet been accepted into the draft (and probably won't; it looks
	2084	like the draft will simply not specify any way of controlling the
	2085	scanner's input other than by making an initial assignment to
	2086	.I yyin).
	2087	.IP -
	2088	.I flex
	2089	scanners do not use stdio for input. Because of this, when writing an
	2090	interactive scanner one must explicitly call fflush() on the
	2091	stream associated with the terminal after writing out a prompt.
	2092	With
	2093	.I lex
	2094	such writes are automatically flushed since
	2095	.I lex
	2096	scanners use
	2097	.B getchar()
	2098	for their input. Also, when writing interactive scanners with
	2099	.I flex,
	2100	the
	2101	.B -I
	2102	flag must be used.
	2103	.IP -
	2104	.I flex
	2105	scanners are not as reentrant as
	2106	.I lex
	2107	scanners. In particular, if you have an interactive scanner and
	2108	an interrupt handler which long-jumps out of the scanner, and
	2109	the scanner is subsequently called again, you may get the following
	2110	message:
	2111	.nf
	2112
	2113	fatal flex scanner internal error--end of buffer missed
	2114
	2115	.fi
	2116	To reenter the scanner, first use
	2117	.nf
	2118
	2119	yyrestart( yyin );
	2120
	2121	.fi
	2122	.IP -
	2123	.B output()
	2124	is not supported.
	2125	Output from the
	2126	.B ECHO
	2127	macro is done to the file-pointer
	2128	.I yyout
	2129	(default
	2130	.I stdout).
	2131	.IP
	2132	The POSIX draft mentions that an
	2133	.B output()
	2134	routine exists but currently gives no details as to what it does.
	2135	.IP -
	2136	.I lex
	2137	does not support exclusive start conditions (%x), though they
	2138	are in the current POSIX draft.
	2139	.IP -
	2140	When definitions are expanded,
	2141	.I flex
	2142	encloses them in parentheses.
	2143	With lex, the following:
	2144	.nf
	2145
	2146	NAME [A-Z][A-Z0-9]*
	2147	%%
	2148	foo{NAME}? printf( "Found it\\n" );
	2149	%%
	2150
	2151	.fi
	2152	will not match the string "foo" because when the macro
	2153	is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
	2154	and the precedence is such that the '?' is associated with
	2155	"[A-Z0-9]*". With
	2156	.I flex,
	2157	the rule will be expanded to
	2158	"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
	2159	Note that because of this, the
	2160	.B ^, $, <s>, /,
	2161	and
	2162	.B <<EOF>>
	2163	operators cannot be used in a
	2164	.I flex
	2165	definition.
	2166	.IP
	2167	The POSIX draft interpretation is the same as
	2168	.I flex's.
	2169	.IP -
	2170	To specify a character class which matches anything but a left bracket (']'),
	2171	in
	2172	.I lex
	2173	one can use "[^]]" but with
	2174	.I flex
	2175	one must use "[^\\]]". The latter works with
	2176	.I lex,
	2177	too.
	2178	.IP -
	2179	The
	2180	.I lex
	2181	.B %r
	2182	(generate a Ratfor scanner) option is not supported. It is not part
	2183	of the POSIX draft.
	2184	.IP -
	2185	If you are providing your own yywrap() routine, you must include a
	2186	"#undef yywrap" in the definitions section (section 1). Note that
	2187	the "#undef" will have to be enclosed in %{}'s.
	2188	.IP
	2189	The POSIX draft
	2190	specifies that yywrap() is a function and this is very unlikely to change; so
	2191	.I flex users are warned
	2192	that
	2193	.B yywrap()
	2194	is likely to be changed to a function in the near future.
	2195	.IP -
	2196	After a call to
	2197	.B unput(),
	2198	.I yytext
	2199	and
	2200	.I yyleng
	2201	are undefined until the next token is matched. This is not the case with
	2202	.I lex
	2203	or the present POSIX draft.
	2204	.IP -
	2205	The precedence of the
	2206	.B {}
	2207	(numeric range) operator is different.
	2208	.I lex
	2209	interprets "abc{1,3}" as "match one, two, or
	2210	three occurrences of 'abc'", whereas
	2211	.I flex
	2212	interprets it as "match 'ab'
	2213	followed by one, two, or three occurrences of 'c'". The latter is
	2214	in agreement with the current POSIX draft.
	2215	.IP -
	2216	The precedence of the
	2217	.B ^
	2218	operator is different.
	2219	.I lex
	2220	interprets "^foo\|bar" as "match either 'foo' at the beginning of a line,
	2221	or 'bar' anywhere", whereas
	2222	.I flex
	2223	interprets it as "match either 'foo' or 'bar' if they come at the beginning
	2224	of a line". The latter is in agreement with the current POSIX draft.
	2225	.IP -
	2226	To refer to yytext outside of the scanner source file,
	2227	the correct definition with
	2228	.I flex
	2229	is "extern char *yytext" rather than "extern char yytext[]".
	2230	This is contrary to the current POSIX draft but a point on which
	2231	.I flex
	2232	will not be changing, as the array representation entails a
	2233	serious performance penalty. It is hoped that the POSIX draft will
	2234	be emended to support the
	2235	.I flex
	2236	variety of declaration (as this is a fairly painless change to
	2237	require of
	2238	.I lex
	2239	users).
	2240	.IP -
	2241	.I yyin
	2242	is
	2243	.I initialized
	2244	by
	2245	.I lex
	2246	to be
	2247	.I stdin;
	2248	.I flex,
	2249	on the other hand,
	2250	initializes
	2251	.I yyin
	2252	to NULL
	2253	and then
	2254	.I assigns
	2255	it to
	2256	.I stdin
	2257	the first time the scanner is called, providing
	2258	.I yyin
	2259	has not already been assigned to a non-NULL value. The difference is
	2260	subtle, but the net effect is that with
	2261	.I flex
	2262	scanners,
	2263	.I yyin
	2264	does not have a valid value until the scanner has been called.
	2265	.IP -
	2266	The special table-size declarations such as
	2267	.B %a
	2268	supported by
	2269	.I lex
	2270	are not required by
	2271	.I flex
	2272	scanners;
	2273	.I flex
	2274	ignores them.
	2275	.IP -
	2276	The name
	2277	.B FLEX_SCANNER
	2278	is #define'd so scanners may be written for use with either
	2279	.I flex
	2280	or
	2281	.I lex.
	2282	.LP
	2283	The following
	2284	.I flex
	2285	features are not included in
	2286	.I lex
	2287	or the POSIX draft standard:
	2288	.nf
	2289
	2290	yyterminate()
	2291	<<EOF>>
	2292	YY_DECL
	2293	#line directives
	2294	%{}'s around actions
	2295	yyrestart()
	2296	comments beginning with '#' (deprecated)
	2297	multiple actions on a line
	2298
	2299	.fi
	2300	This last feature refers to the fact that with
	2301	.I flex
	2302	you can put multiple actions on the same line, separated with
	2303	semi-colons, while with
	2304	.I lex,
	2305	the following
	2306	.nf
	2307
	2308	foo handle_foo(); ++num_foos_seen;
	2309
	2310	.fi
	2311	is (rather surprisingly) truncated to
	2312	.nf
	2313
	2314	foo handle_foo();
	2315
	2316	.fi
	2317	.I flex
	2318	does not truncate the action. Actions that are not enclosed in
	2319	braces are simply terminated at the end of the line.
	2320	.SH DIAGNOSTICS
	2321	.I reject_used_but_not_detected undefined
	2322	or
	2323	.I yymore_used_but_not_detected undefined -
	2324	These errors can occur at compile time. They indicate that the
	2325	scanner uses
	2326	.B REJECT
	2327	or
	2328	.B yymore()
	2329	but that
	2330	.I flex
	2331	failed to notice the fact, meaning that
	2332	.I flex
	2333	scanned the first two sections looking for occurrences of these actions
	2334	and failed to find any, but somehow you snuck some in (via a #include
	2335	file, for example). Make an explicit reference to the action in your
	2336	.I flex
	2337	input file. (Note that previously
	2338	.I flex
	2339	supported a
	2340	.B %used/%unused
	2341	mechanism for dealing with this problem; this feature is still supported
	2342	but now deprecated, and will go away soon unless the author hears from
	2343	people who can argue compellingly that they need it.)
	2344	.LP
	2345	.I flex scanner jammed -
	2346	a scanner compiled with
	2347	.B -s
	2348	has encountered an input string which wasn't matched by
	2349	any of its rules.
	2350	.LP
	2351	.I flex input buffer overflowed -
	2352	a scanner rule matched a string long enough to overflow the
	2353	scanner's internal input buffer (16K bytes by default - controlled by
	2354	.B YY_BUF_SIZE
	2355	in "flex.skel". Note that to redefine this macro, you must first
	2356	.B #undefine
	2357	it).
	2358	.LP
	2359	.I scanner requires -8 flag -
	2360	Your scanner specification includes recognizing 8-bit characters and
	2361	you did not specify the -8 flag (and your site has not installed flex
	2362	with -8 as the default).
	2363	.LP
	2364	.I
	2365	fatal flex scanner internal error--end of buffer missed -
	2366	This can occur in an scanner which is reentered after a long-jump
	2367	has jumped out (or over) the scanner's activation frame. Before
	2368	reentering the scanner, use:
	2369	.nf
	2370
	2371	yyrestart( yyin );
	2372
	2373	.fi
	2374	.LP
	2375	.I too many %t classes! -
	2376	You managed to put every single character into its own %t class.
	2377	.I flex
	2378	requires that at least one of the classes share characters.
	2379	.SH DEFICIENCIES / BUGS
	2380	See flex(1).
	2381	.SH "SEE ALSO"
	2382	.LP
	2383	flex(1), lex(1), yacc(1), sed(1), awk(9).
	2384	.LP
	2385	M. E. Lesk and E. Schmidt,
	2386	.I LEX - Lexical Analyzer Generator
	2387	.SH AUTHOR
	2388	Vern Paxson, with the help of many ideas and much inspiration from
	2389	Van Jacobson. Original version by Jef Poskanzer. The fast table
	2390	representation is a partial implementation of a design done by Van
	2391	Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
	2392	.LP
	2393	Thanks to the many
	2394	.I flex
	2395	beta-testers, feedbackers, and contributors, especially Casey
	2396	Leedom, benson@odi.com, Keith Bostic,
	2397	Frederic Brehm, Nick Christopher, Jason Coughlin,
	2398	Scott David Daniels, Leo Eskin,
	2399	Chris Faylor, Eric Goldman, Eric
	2400	Hughes, Jeffrey R. Jones, Kevin B. Kenny, Ronald Lamprecht,
	2401	Greg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Marc Nozell, Esmond Pitt,
	2402	Jef Poskanzer, Jim Roskind,
	2403	Dave Tallman, Frank Whaley, Ken Yap, and those whose names
	2404	have slipped my marginal mail-archiving skills but whose contributions
	2405	are appreciated all the same.
	2406	.LP
	2407	Thanks to Keith Bostic, John Gilmore, Craig Leres, Bob
	2408	Mulcahy, Rich Salz, and Richard Stallman for help with various distribution
	2409	headaches.
	2410	.LP
	2411	Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
	2412	to Benson Margulies and Fred
	2413	Burke for C++ support; to Ove Ewerlid for the basics of support for
	2414	NUL's; and to Eric Hughes for the basics of support for multiple buffers.
	2415	.LP
	2416	Work is being done on extending
	2417	.I flex
	2418	to generate scanners in which the
	2419	state machine is directly represented in C code rather than tables.
	2420	These scanners may well be substantially faster than those generated
	2421	using -f or -F. If you are working in this area and are interested
	2422	in comparing notes and seeing whether redundant work can be avoided,
	2423	contact Ove Ewerlid (ewerlid@mizar.DoCS.UU.SE).
	2424	.LP
	2425	This work was primarily done when I was at the Real Time Systems Group
	2426	at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there
	2427	for the support I received.
	2428	.LP
	2429	Send comments to:
	2430	.nf
	2431
	2432	Vern Paxson
	2433	Computer Science Department
	2434	4126 Upson Hall
	2435	Cornell University
	2436	Ithaca, NY 14853-7501
	2437
	2438	vern@cs.cornell.edu
	2439	decvax!cornell!vern
	2440
	2441	.fi
	2442	.\" ref. to awk(9) man page corrected -- ASW 2005-01-15

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format