[9] | 1 | .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
|
---|
| 2 | .\" Copyright (c) 1992, 1993, 1994
|
---|
| 3 | .\" The Regents of the University of California. All rights reserved.
|
---|
| 4 | .\"
|
---|
| 5 | .\" This code is derived from software contributed to Berkeley by
|
---|
| 6 | .\" Henry Spencer.
|
---|
| 7 | .\"
|
---|
| 8 | .\" Redistribution and use in source and binary forms, with or without
|
---|
| 9 | .\" modification, are permitted provided that the following conditions
|
---|
| 10 | .\" are met:
|
---|
| 11 | .\" 1. Redistributions of source code must retain the above copyright
|
---|
| 12 | .\" notice, this list of conditions and the following disclaimer.
|
---|
| 13 | .\" 2. Redistributions in binary form must reproduce the above copyright
|
---|
| 14 | .\" notice, this list of conditions and the following disclaimer in the
|
---|
| 15 | .\" documentation and/or other materials provided with the distribution.
|
---|
| 16 | .\" 3. All advertising materials mentioning features or use of this software
|
---|
| 17 | .\" must display the following acknowledgement:
|
---|
| 18 | .\" This product includes software developed by the University of
|
---|
| 19 | .\" California, Berkeley and its contributors.
|
---|
| 20 | .\" 4. Neither the name of the University nor the names of its contributors
|
---|
| 21 | .\" may be used to endorse or promote products derived from this software
|
---|
| 22 | .\" without specific prior written permission.
|
---|
| 23 | .\"
|
---|
| 24 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
---|
| 25 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
---|
| 26 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
---|
| 27 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
---|
| 28 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
---|
| 29 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
---|
| 30 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
---|
| 31 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
---|
| 32 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
---|
| 33 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
---|
| 34 | .\" SUCH DAMAGE.
|
---|
| 35 | .\"
|
---|
| 36 | .\" @(#)regex.3 8.4 (Berkeley) 3/20/94
|
---|
| 37 | .\"
|
---|
| 38 | .TH REGEX 3 "March 20, 1994"
|
---|
| 39 | .de ZR
|
---|
| 40 | .\" one other place knows this name: the SEE ALSO section
|
---|
| 41 | .BR re_format (7) \\$1
|
---|
| 42 | ..
|
---|
| 43 | .SH NAME
|
---|
| 44 | regex, regcomp, regexec, regerror, regfree \- regular-expression library
|
---|
| 45 | .SH SYNOPSIS
|
---|
| 46 | .ft B
|
---|
| 47 | .\".na
|
---|
| 48 | #include <sys/types.h>
|
---|
| 49 | .br
|
---|
| 50 | #include <regex.h>
|
---|
| 51 | .sp
|
---|
| 52 | .in +.5i
|
---|
| 53 | .ti -.5i
|
---|
| 54 | int regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP, int \fIcflags\fP);
|
---|
| 55 | .ti -.5i
|
---|
| 56 | int regexec(const regex_t *\fIpreg\fP, const char *\fIstring\fP,
|
---|
| 57 | size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);
|
---|
| 58 | .ti -.5i
|
---|
| 59 | size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP,
|
---|
| 60 | char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);
|
---|
| 61 | .ti -.5i
|
---|
| 62 | void regfree(regex_t *\fIpreg\fP);
|
---|
| 63 | .in -.5i
|
---|
| 64 | .ft R
|
---|
| 65 | .SH DESCRIPTION
|
---|
| 66 | These routines implement POSIX 1003.2 regular expressions (``RE''s);
|
---|
| 67 | see
|
---|
| 68 | .ZR .
|
---|
| 69 | .B Regcomp
|
---|
| 70 | compiles an RE written as a string into an internal form,
|
---|
| 71 | .B regexec
|
---|
| 72 | matches that internal form against a string and reports results,
|
---|
| 73 | .B regerror
|
---|
| 74 | transforms error codes from either into human-readable messages,
|
---|
| 75 | and
|
---|
| 76 | .B regfree
|
---|
| 77 | frees any dynamically-allocated storage used by the internal form
|
---|
| 78 | of an RE.
|
---|
| 79 | .PP
|
---|
| 80 | The header
|
---|
| 81 | .I <regex.h>
|
---|
| 82 | declares two structure types,
|
---|
| 83 | .B regex_t
|
---|
| 84 | and
|
---|
| 85 | .BR regmatch_t ,
|
---|
| 86 | the former for compiled internal forms and the latter for match reporting.
|
---|
| 87 | It also declares the four functions,
|
---|
| 88 | a type
|
---|
| 89 | .BR regoff_t ,
|
---|
| 90 | and a number of constants with names starting with ``REG_''.
|
---|
| 91 | .PP
|
---|
| 92 | .B Regcomp
|
---|
| 93 | compiles the regular expression contained in the
|
---|
| 94 | .I pattern
|
---|
| 95 | string,
|
---|
| 96 | subject to the flags in
|
---|
| 97 | .IR cflags ,
|
---|
| 98 | and places the results in the
|
---|
| 99 | .B regex_t
|
---|
| 100 | structure pointed to by
|
---|
| 101 | .IR preg .
|
---|
| 102 | .I Cflags
|
---|
| 103 | is the bitwise OR of zero or more of the following flags:
|
---|
| 104 | .IP REG_EXTENDED \w'REG_EXTENDED'u+2n
|
---|
| 105 | Compile modern (``extended'') REs,
|
---|
| 106 | rather than the obsolete (``basic'') REs that
|
---|
| 107 | are the default.
|
---|
| 108 | .IP REG_BASIC
|
---|
| 109 | This is a synonym for 0,
|
---|
| 110 | provided as a counterpart to REG_EXTENDED to improve readability.
|
---|
| 111 | .IP REG_NOSPEC
|
---|
| 112 | Compile with recognition of all special characters turned off.
|
---|
| 113 | All characters are thus considered ordinary,
|
---|
| 114 | so the ``RE'' is a literal string.
|
---|
| 115 | This is an extension,
|
---|
| 116 | compatible with but not specified by POSIX 1003.2,
|
---|
| 117 | and should be used with
|
---|
| 118 | caution in software intended to be portable to other systems.
|
---|
| 119 | REG_EXTENDED and REG_NOSPEC may not be used
|
---|
| 120 | in the same call to
|
---|
| 121 | .IR regcomp .
|
---|
| 122 | .IP REG_ICASE
|
---|
| 123 | Compile for matching that ignores upper/lower case distinctions.
|
---|
| 124 | See
|
---|
| 125 | .ZR .
|
---|
| 126 | .IP REG_NOSUB
|
---|
| 127 | Compile for matching that need only report success or failure,
|
---|
| 128 | not what was matched.
|
---|
| 129 | .IP REG_NEWLINE
|
---|
| 130 | Compile for newline-sensitive matching.
|
---|
| 131 | By default, newline is a completely ordinary character with no special
|
---|
| 132 | meaning in either REs or strings.
|
---|
| 133 | With this flag,
|
---|
| 134 | `[^' bracket expressions and `.' never match newline,
|
---|
| 135 | a `^' anchor matches the null string after any newline in the string
|
---|
| 136 | in addition to its normal function,
|
---|
| 137 | and the `$' anchor matches the null string before any newline in the
|
---|
| 138 | string in addition to its normal function.
|
---|
| 139 | .IP REG_PEND
|
---|
| 140 | The regular expression ends,
|
---|
| 141 | not at the first NUL,
|
---|
| 142 | but just before the character pointed to by the
|
---|
| 143 | .B re_endp
|
---|
| 144 | member of the structure pointed to by
|
---|
| 145 | .IR preg .
|
---|
| 146 | The
|
---|
| 147 | .B re_endp
|
---|
| 148 | member is of type
|
---|
| 149 | .BR "const\ char\ *" .
|
---|
| 150 | This flag permits inclusion of NULs in the RE;
|
---|
| 151 | they are considered ordinary characters.
|
---|
| 152 | This is an extension,
|
---|
| 153 | compatible with but not specified by POSIX 1003.2,
|
---|
| 154 | and should be used with
|
---|
| 155 | caution in software intended to be portable to other systems.
|
---|
| 156 | .PP
|
---|
| 157 | When successful,
|
---|
| 158 | .B regcomp
|
---|
| 159 | returns 0 and fills in the structure pointed to by
|
---|
| 160 | .IR preg .
|
---|
| 161 | One member of that structure
|
---|
| 162 | (other than
|
---|
| 163 | .BR re_endp )
|
---|
| 164 | is publicized:
|
---|
| 165 | .BR re_nsub ,
|
---|
| 166 | of type
|
---|
| 167 | .BR size_t ,
|
---|
| 168 | contains the number of parenthesized subexpressions within the RE
|
---|
| 169 | (except that the value of this member is undefined if the
|
---|
| 170 | REG_NOSUB flag was used).
|
---|
| 171 | If
|
---|
| 172 | .B regcomp
|
---|
| 173 | fails, it returns a non-zero error code;
|
---|
| 174 | see DIAGNOSTICS.
|
---|
| 175 | .PP
|
---|
| 176 | .B Regexec
|
---|
| 177 | matches the compiled RE pointed to by
|
---|
| 178 | .I preg
|
---|
| 179 | against the
|
---|
| 180 | .IR string ,
|
---|
| 181 | subject to the flags in
|
---|
| 182 | .IR eflags ,
|
---|
| 183 | and reports results using
|
---|
| 184 | .IR nmatch ,
|
---|
| 185 | .IR pmatch ,
|
---|
| 186 | and the returned value.
|
---|
| 187 | The RE must have been compiled by a previous invocation of
|
---|
| 188 | .BR regcomp .
|
---|
| 189 | The compiled form is not altered during execution of
|
---|
| 190 | .BR regexec ,
|
---|
| 191 | so a single compiled RE can be used simultaneously by multiple threads.
|
---|
| 192 | .PP
|
---|
| 193 | By default,
|
---|
| 194 | the NUL-terminated string pointed to by
|
---|
| 195 | .I string
|
---|
| 196 | is considered to be the text of an entire line, minus any terminating
|
---|
| 197 | newline.
|
---|
| 198 | The
|
---|
| 199 | .I eflags
|
---|
| 200 | argument is the bitwise OR of zero or more of the following flags:
|
---|
| 201 | .IP REG_NOTBOL \w'REG_STARTEND'u+2n
|
---|
| 202 | The first character of
|
---|
| 203 | the string
|
---|
| 204 | is not the beginning of a line, so the `^' anchor should not match before it.
|
---|
| 205 | This does not affect the behavior of newlines under REG_NEWLINE.
|
---|
| 206 | .IP REG_NOTEOL
|
---|
| 207 | The NUL terminating
|
---|
| 208 | the string
|
---|
| 209 | does not end a line, so the `$' anchor should not match before it.
|
---|
| 210 | This does not affect the behavior of newlines under REG_NEWLINE.
|
---|
| 211 | .IP REG_STARTEND
|
---|
| 212 | The string is considered to start at
|
---|
| 213 | \fIstring\fR\ + \fIpmatch\fR[0].\fBrm_so\fR
|
---|
| 214 | and to have a terminating NUL located at
|
---|
| 215 | \fIstring\fR\ + \fIpmatch\fR[0].\fBrm_eo\fR
|
---|
| 216 | (there need not actually be a NUL at that location),
|
---|
| 217 | regardless of the value of
|
---|
| 218 | .IR nmatch .
|
---|
| 219 | See below for the definition of
|
---|
| 220 | .IR pmatch
|
---|
| 221 | and
|
---|
| 222 | .IR nmatch .
|
---|
| 223 | This is an extension,
|
---|
| 224 | compatible with but not specified by POSIX 1003.2,
|
---|
| 225 | and should be used with
|
---|
| 226 | caution in software intended to be portable to other systems.
|
---|
| 227 | Note that a non-zero \fBrm_so\fR does not imply REG_NOTBOL;
|
---|
| 228 | REG_STARTEND affects only the location of the string,
|
---|
| 229 | not how it is matched.
|
---|
| 230 | .PP
|
---|
| 231 | See
|
---|
| 232 | .ZR
|
---|
| 233 | for a discussion of what is matched in situations where an RE or a
|
---|
| 234 | portion thereof could match any of several substrings of
|
---|
| 235 | .IR string .
|
---|
| 236 | .PP
|
---|
| 237 | Normally,
|
---|
| 238 | .B regexec
|
---|
| 239 | returns 0 for success and the non-zero code REG_NOMATCH for failure.
|
---|
| 240 | Other non-zero error codes may be returned in exceptional situations;
|
---|
| 241 | see DIAGNOSTICS.
|
---|
| 242 | .PP
|
---|
| 243 | If REG_NOSUB was specified in the compilation of the RE,
|
---|
| 244 | or if
|
---|
| 245 | .I nmatch
|
---|
| 246 | is 0,
|
---|
| 247 | .B regexec
|
---|
| 248 | ignores the
|
---|
| 249 | .I pmatch
|
---|
| 250 | argument (but see below for the case where REG_STARTEND is specified).
|
---|
| 251 | Otherwise,
|
---|
| 252 | .I pmatch
|
---|
| 253 | points to an array of
|
---|
| 254 | .I nmatch
|
---|
| 255 | structures of type
|
---|
| 256 | .BR regmatch_t .
|
---|
| 257 | Such a structure has at least the members
|
---|
| 258 | .B rm_so
|
---|
| 259 | and
|
---|
| 260 | .BR rm_eo ,
|
---|
| 261 | both of type
|
---|
| 262 | .B regoff_t
|
---|
| 263 | (a signed arithmetic type at least as large as an
|
---|
| 264 | .B off_t
|
---|
| 265 | and a
|
---|
| 266 | .BR ssize_t ),
|
---|
| 267 | containing respectively the offset of the first character of a substring
|
---|
| 268 | and the offset of the first character after the end of the substring.
|
---|
| 269 | Offsets are measured from the beginning of the
|
---|
| 270 | .I string
|
---|
| 271 | argument given to
|
---|
| 272 | .BR regexec .
|
---|
| 273 | An empty substring is denoted by equal offsets,
|
---|
| 274 | both indicating the character following the empty substring.
|
---|
| 275 | .PP
|
---|
| 276 | The 0th member of the
|
---|
| 277 | .I pmatch
|
---|
| 278 | array is filled in to indicate what substring of
|
---|
| 279 | .I string
|
---|
| 280 | was matched by the entire RE.
|
---|
| 281 | Remaining members report what substring was matched by parenthesized
|
---|
| 282 | subexpressions within the RE;
|
---|
| 283 | member
|
---|
| 284 | .I i
|
---|
| 285 | reports subexpression
|
---|
| 286 | .IR i ,
|
---|
| 287 | with subexpressions counted (starting at 1) by the order of their opening
|
---|
| 288 | parentheses in the RE, left to right.
|
---|
| 289 | Unused entries in the array\(emcorresponding either to subexpressions that
|
---|
| 290 | did not participate in the match at all, or to subexpressions that do not
|
---|
| 291 | exist in the RE (that is, \fIi\fR\ > \fIpreg\fR\->\fBre_nsub\fR)\(emhave both
|
---|
| 292 | .B rm_so
|
---|
| 293 | and
|
---|
| 294 | .B rm_eo
|
---|
| 295 | set to \-1.
|
---|
| 296 | If a subexpression participated in the match several times,
|
---|
| 297 | the reported substring is the last one it matched.
|
---|
| 298 | (Note, as an example in particular, that when the RE `(b*)+' matches `bbb',
|
---|
| 299 | the parenthesized subexpression matches each of the three `b's and then
|
---|
| 300 | an infinite number of empty strings following the last `b',
|
---|
| 301 | so the reported substring is one of the empties.)
|
---|
| 302 | .PP
|
---|
| 303 | If REG_STARTEND is specified,
|
---|
| 304 | .I pmatch
|
---|
| 305 | must point to at least one
|
---|
| 306 | .B regmatch_t
|
---|
| 307 | (even if
|
---|
| 308 | .I nmatch
|
---|
| 309 | is 0 or REG_NOSUB was specified),
|
---|
| 310 | to hold the input offsets for REG_STARTEND.
|
---|
| 311 | Use for output is still entirely controlled by
|
---|
| 312 | .IR nmatch ;
|
---|
| 313 | if
|
---|
| 314 | .I nmatch
|
---|
| 315 | is 0 or REG_NOSUB was specified,
|
---|
| 316 | the value of
|
---|
| 317 | .IR pmatch [0]
|
---|
| 318 | will not be changed by a successful
|
---|
| 319 | .BR regexec .
|
---|
| 320 | .PP
|
---|
| 321 | .B Regerror
|
---|
| 322 | maps a non-zero
|
---|
| 323 | .I errcode
|
---|
| 324 | from either
|
---|
| 325 | .B regcomp
|
---|
| 326 | or
|
---|
| 327 | .B regexec
|
---|
| 328 | to a human-readable, printable message.
|
---|
| 329 | If
|
---|
| 330 | .I preg
|
---|
| 331 | is non-NULL,
|
---|
| 332 | the error code should have arisen from use of
|
---|
| 333 | the
|
---|
| 334 | .B regex_t
|
---|
| 335 | pointed to by
|
---|
| 336 | .IR preg ,
|
---|
| 337 | and if the error code came from
|
---|
| 338 | .BR regcomp ,
|
---|
| 339 | it should have been the result from the most recent
|
---|
| 340 | .B regcomp
|
---|
| 341 | using that
|
---|
| 342 | .BR regex_t .
|
---|
| 343 | .RI ( Regerror
|
---|
| 344 | may be able to supply a more detailed message using information
|
---|
| 345 | from the
|
---|
| 346 | .BR regex_t .)
|
---|
| 347 | .B Regerror
|
---|
| 348 | places the NUL-terminated message into the buffer pointed to by
|
---|
| 349 | .IR errbuf ,
|
---|
| 350 | limiting the length (including the NUL) to at most
|
---|
| 351 | .I errbuf_size
|
---|
| 352 | bytes.
|
---|
| 353 | If the whole message won't fit,
|
---|
| 354 | as much of it as will fit before the terminating NUL is supplied.
|
---|
| 355 | In any case,
|
---|
| 356 | the returned value is the size of buffer needed to hold the whole
|
---|
| 357 | message (including terminating NUL).
|
---|
| 358 | If
|
---|
| 359 | .I errbuf_size
|
---|
| 360 | is 0,
|
---|
| 361 | .I errbuf
|
---|
| 362 | is ignored but the return value is still correct.
|
---|
| 363 | .PP
|
---|
| 364 | If the
|
---|
| 365 | .I errcode
|
---|
| 366 | given to
|
---|
| 367 | .B regerror
|
---|
| 368 | is first ORed with REG_ITOA,
|
---|
| 369 | the ``message'' that results is the printable name of the error code,
|
---|
| 370 | e.g. ``REG_NOMATCH'',
|
---|
| 371 | rather than an explanation thereof.
|
---|
| 372 | If
|
---|
| 373 | .I errcode
|
---|
| 374 | is REG_ATOI,
|
---|
| 375 | then
|
---|
| 376 | .I preg
|
---|
| 377 | shall be non-NULL and the
|
---|
| 378 | .B re_endp
|
---|
| 379 | member of the structure it points to
|
---|
| 380 | must point to the printable name of an error code;
|
---|
| 381 | in this case, the result in
|
---|
| 382 | .I errbuf
|
---|
| 383 | is the decimal digits of
|
---|
| 384 | the numeric value of the error code
|
---|
| 385 | (0 if the name is not recognized).
|
---|
| 386 | REG_ITOA and REG_ATOI are intended primarily as debugging facilities;
|
---|
| 387 | they are extensions,
|
---|
| 388 | compatible with but not specified by POSIX 1003.2,
|
---|
| 389 | and should be used with
|
---|
| 390 | caution in software intended to be portable to other systems.
|
---|
| 391 | Be warned also that they are considered experimental and changes are possible.
|
---|
| 392 | .PP
|
---|
| 393 | .B Regfree
|
---|
| 394 | frees any dynamically-allocated storage associated with the compiled RE
|
---|
| 395 | pointed to by
|
---|
| 396 | .IR preg .
|
---|
| 397 | The remaining
|
---|
| 398 | .B regex_t
|
---|
| 399 | is no longer a valid compiled RE
|
---|
| 400 | and the effect of supplying it to
|
---|
| 401 | .B regexec
|
---|
| 402 | or
|
---|
| 403 | .B regerror
|
---|
| 404 | is undefined.
|
---|
| 405 | .PP
|
---|
| 406 | None of these functions references global variables except for tables
|
---|
| 407 | of constants;
|
---|
| 408 | all are safe for use from multiple threads if the arguments are safe.
|
---|
| 409 | .SH IMPLEMENTATION CHOICES
|
---|
| 410 | There are a number of decisions that 1003.2 leaves up to the implementor,
|
---|
| 411 | either by explicitly saying ``undefined'' or by virtue of them being
|
---|
| 412 | forbidden by the RE grammar.
|
---|
| 413 | This implementation treats them as follows.
|
---|
| 414 | .PP
|
---|
| 415 | See
|
---|
| 416 | .ZR
|
---|
| 417 | for a discussion of the definition of case-independent matching.
|
---|
| 418 | .PP
|
---|
| 419 | There is no particular limit on the length of REs,
|
---|
| 420 | except insofar as memory is limited.
|
---|
| 421 | Memory usage is approximately linear in RE size, and largely insensitive
|
---|
| 422 | to RE complexity, except for bounded repetitions.
|
---|
| 423 | See BUGS for one short RE using them
|
---|
| 424 | that will run almost any system out of memory.
|
---|
| 425 | .PP
|
---|
| 426 | A backslashed character other than one specifically given a magic meaning
|
---|
| 427 | by 1003.2 (such magic meanings occur only in obsolete [``basic''] REs)
|
---|
| 428 | is taken as an ordinary character.
|
---|
| 429 | .PP
|
---|
| 430 | Any unmatched [ is a REG_EBRACK error.
|
---|
| 431 | .PP
|
---|
| 432 | Equivalence classes cannot begin or end bracket-expression ranges.
|
---|
| 433 | The endpoint of one range cannot begin another.
|
---|
| 434 | .PP
|
---|
| 435 | RE_DUP_MAX, the limit on repetition counts in bounded repetitions, is 255.
|
---|
| 436 | .PP
|
---|
| 437 | A repetition operator (?, *, +, or bounds) cannot follow another
|
---|
| 438 | repetition operator.
|
---|
| 439 | A repetition operator cannot begin an expression or subexpression
|
---|
| 440 | or follow `^' or `|'.
|
---|
| 441 | .PP
|
---|
| 442 | `|' cannot appear first or last in a (sub)expression or after another `|',
|
---|
| 443 | i.e. an operand of `|' cannot be an empty subexpression.
|
---|
| 444 | An empty parenthesized subexpression, `()', is legal and matches an
|
---|
| 445 | empty (sub)string.
|
---|
| 446 | An empty string is not a legal RE.
|
---|
| 447 | .PP
|
---|
| 448 | A `{' followed by a digit is considered the beginning of bounds for a
|
---|
| 449 | bounded repetition, which must then follow the syntax for bounds.
|
---|
| 450 | A `{' \fInot\fR followed by a digit is considered an ordinary character.
|
---|
| 451 | .PP
|
---|
| 452 | `^' and `$' beginning and ending subexpressions in obsolete (``basic'')
|
---|
| 453 | REs are anchors, not ordinary characters.
|
---|
| 454 | .SH SEE ALSO
|
---|
| 455 | .BR grep (1),
|
---|
| 456 | .BR re_format (7).
|
---|
| 457 | .PP
|
---|
| 458 | POSIX 1003.2, sections 2.8 (Regular Expression Notation)
|
---|
| 459 | and
|
---|
| 460 | B.5 (C Binding for Regular Expression Matching).
|
---|
| 461 | .SH DIAGNOSTICS
|
---|
| 462 | Non-zero error codes from
|
---|
| 463 | .B regcomp
|
---|
| 464 | and
|
---|
| 465 | .B regexec
|
---|
| 466 | include the following:
|
---|
| 467 | .PP
|
---|
| 468 | .nf
|
---|
| 469 | .ta \w'REG_ECOLLATE'u+3n
|
---|
| 470 | REG_NOMATCH regexec() failed to match
|
---|
| 471 | REG_BADPAT invalid regular expression
|
---|
| 472 | REG_ECOLLATE invalid collating element
|
---|
| 473 | REG_ECTYPE invalid character class
|
---|
| 474 | REG_EESCAPE \e applied to unescapable character
|
---|
| 475 | REG_ESUBREG invalid backreference number
|
---|
| 476 | REG_EBRACK brackets [ ] not balanced
|
---|
| 477 | REG_EPAREN parentheses ( ) not balanced
|
---|
| 478 | REG_EBRACE braces { } not balanced
|
---|
| 479 | REG_BADBR invalid repetition count(s) in { }
|
---|
| 480 | REG_ERANGE invalid character range in [ ]
|
---|
| 481 | REG_ESPACE ran out of memory
|
---|
| 482 | REG_BADRPT ?, *, or + operand invalid
|
---|
| 483 | REG_EMPTY empty (sub)expression
|
---|
| 484 | REG_ASSERT ``can't happen''\(emyou found a bug
|
---|
| 485 | REG_INVARG invalid argument, e.g. negative-length string
|
---|
| 486 | .fi
|
---|
| 487 | .SH HISTORY
|
---|
| 488 | Originally written by Henry Spencer.
|
---|
| 489 | Altered for inclusion in the 4.4BSD distribution.
|
---|
| 490 | .SH BUGS
|
---|
| 491 | This is an alpha release with known defects.
|
---|
| 492 | Please report problems.
|
---|
| 493 | .PP
|
---|
| 494 | There is one known functionality bug.
|
---|
| 495 | The implementation of internationalization is incomplete:
|
---|
| 496 | the locale is always assumed to be the default one of 1003.2,
|
---|
| 497 | and only the collating elements etc. of that locale are available.
|
---|
| 498 | .PP
|
---|
| 499 | The back-reference code is subtle and doubts linger about its correctness
|
---|
| 500 | in complex cases.
|
---|
| 501 | .PP
|
---|
| 502 | .B Regexec
|
---|
| 503 | performance is poor.
|
---|
| 504 | This will improve with later releases.
|
---|
| 505 | .I Nmatch
|
---|
| 506 | exceeding 0 is expensive;
|
---|
| 507 | .I nmatch
|
---|
| 508 | exceeding 1 is worse.
|
---|
| 509 | .B Regexec
|
---|
| 510 | is largely insensitive to RE complexity \fIexcept\fR that back
|
---|
| 511 | references are massively expensive.
|
---|
| 512 | RE length does matter; in particular, there is a strong speed bonus
|
---|
| 513 | for keeping RE length under about 30 characters,
|
---|
| 514 | with most special characters counting roughly double.
|
---|
| 515 | .PP
|
---|
| 516 | .B Regcomp
|
---|
| 517 | implements bounded repetitions by macro expansion,
|
---|
| 518 | which is costly in time and space if counts are large
|
---|
| 519 | or bounded repetitions are nested.
|
---|
| 520 | An RE like, say,
|
---|
| 521 | `((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
|
---|
| 522 | will (eventually) run almost any existing machine out of swap space.
|
---|
| 523 | .PP
|
---|
| 524 | There are suspected problems with response to obscure error conditions.
|
---|
| 525 | Notably,
|
---|
| 526 | certain kinds of internal overflow,
|
---|
| 527 | produced only by truly enormous REs or by multiply nested bounded repetitions,
|
---|
| 528 | are probably not handled well.
|
---|
| 529 | .PP
|
---|
| 530 | Due to a mistake in 1003.2, things like `a)b' are legal REs because `)' is
|
---|
| 531 | a special character only in the presence of a previous unmatched `('.
|
---|
| 532 | This can't be fixed until the spec is fixed.
|
---|
| 533 | .PP
|
---|
| 534 | The standard's definition of back references is vague.
|
---|
| 535 | For example, does
|
---|
| 536 | `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
|
---|
| 537 | Until the standard is clarified,
|
---|
| 538 | behavior in such cases should not be relied on.
|
---|
| 539 | .PP
|
---|
| 540 | The implementation of word-boundary matching is a bit of a kludge,
|
---|
| 541 | and bugs may lurk in combinations of word-boundary matching and anchoring.
|
---|