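regcomp(3C)
NAME
regcomp(), regerror(), regexec(), regfree() - regular expression matching
routines
SYNOPSIS
     #include <regex.h>
     int regcomp(regex_t *preg, const char *pattern, int cflags);
     int regexec(const regex_t *preg, const char *string, size_t nmatch,
                 regmatch_t pmatch[], int eflags);
     void regfree(regex_t *preg);
     size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                     size_t errbuf_size);
DESCRIPTION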
These functions interpret regular expressions as described in
regexp(5). They support both basic and extended regular expressions.
The structures regex_t and regmatch_t are defined in the header
<regex.h>.
The regex_t structure contains at least the following member (use of
other members results in non-portable code):
     size_t re_nsub      Number of parenthesized subexpressions.
The regmatch_t structure contains at least the following members:
     regoff_t rm_so      Byte offset from start of string to start of
                         substring.
     regoff_t rm_eo      Byte offset from start of string to the first
                         character after the end of the substring.
regcomp() compiles the regular expression specified by the pattern
argument and places the results in the structure pointed to by preg.
The cflags argument is the bit-wise logical OR of zero or more of the
following flags (defined in <regex.h>):
     REG_EXTENDED   Use extended regular expressions.
     REG_NEWLINE    If REG_NEWLINE is not set in cflags, a newline
                    character in pattern or string is treated as an
                    ordinary character. If REG_NEWLINE is set,
                    newlines are treated as ordinary characters
                    except as follows:
                    1.  A newline in string is not matched by a
                        period outside of a bracket expression or
                        by any form of a nonmatching list.
                    2.  A circumflex in pattern, when used to
                        specify expression anchoring, matches the
                        zero-length string immediately after a
                        newline in string, regardless of the
                        setting of REG_NOTBOL.
                    3.  A dollar-sign in pattern, when used to
                        specify expression anchoring, matches the
                        zero-length string immediately before a
                        newline in string, regardless of the
                        setting of REG_NOTEOL.
     REG_ICASE      Ignore case in match.
                    If a character in pattern is defined in the
                    current locale as having one or more
                    opposite-case counterparts, both the character
                    and any counterparts match the pattern
                    character. This applies to all portions of the
                    pattern, including a string of characters
                    specified to be matched via a back-reference
                    expression.
                    Within bracket expressions, collation ranges,
                    character classes, and equivalence classes are
                    effectively expanded into equivalent lists of
                    collation elements and characters.
                    Opposite-case counterparts are then generated
                    for each collation element or character to form
                    the complete matching list or non-matching list
                    for the bracket expression. Opposite-case
                    counterparts for a multi-character collating
                    element include all possible combinations of
                    opposite-case counterparts for each individual
                    character comprising the collating element.
                    These are then combined to form new valid
                    multi-character collating elements. For
example, the opposite-case counterpoints for
could be and
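As a rough illustration of how REG_NEWLINE and REG_ICASE change matching, the following sketch wraps regcomp() and regexec() in a small helper. The name matches_with() and its return convention are assumptions made for this example and are not part of the interface described here.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of <regex.h>): compile pattern as a
   basic regular expression with the given cflags, run it against
   text, and return 1 on a match, 0 on no match (or a matching
   error), or -1 if the pattern could not be compiled. */
int matches_with(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB is added because only success or failure is needed. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, (size_t)0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With the string "one\ntwo", the pattern ^two is expected to match only when REG_NEWLINE is set, while one.two is expected to match only when it is not set, because the period then no longer matches the newline. Similarly, hello should match "HeLLo" only with REG_ICASE.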
The default regular expression type for pattern is Basic Regular
Expression. The application can specify Extended Regular Expressions
by using the REG_EXTENDED cflags value.
If the regcomp() function succeeds, it returns zero; otherwise it
returns a non-zero value indicating the error.
If regcomp() succeeds, and if the REG_NOSUB flag was not set in cflags,
regcomp() sets re_nsub to the number of parenthesized subexpressions
(delimited by \( and \) in basic regular expressions or ( and ) in
extended regular expressions) found in pattern.
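The difference between the two delimiter styles can be seen by reading re_nsub after a successful compile. This is a minimal sketch; count_subexpressions() is a hypothetical name chosen for the example, and cflags is assumed not to include REG_NOSUB.

```c
#include <regex.h>

/* Hypothetical helper: return the number of parenthesized
   subexpressions that regcomp() reports for pattern, or -1 if the
   pattern does not compile.  cflags must not contain REG_NOSUB,
   since re_nsub is only documented to be set without it. */
long count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    n = (long)re.re_nsub;
    regfree(&re);
    return n;
}
```

Under these assumptions, "\\(a\\)\\(b\\)" compiled with cflags 0 and "(a)(b)" compiled with REG_EXTENDED should both report two subexpressions, whereas "(a)(b)" compiled as a basic regular expression should report none, because unescaped parentheses are ordinary characters there.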
regexec() matches the null-terminated string specified by string against
the compiled regular expression preg initialized by a previous call to
regcomp(). If it finds a match, regexec() returns zero; otherwise it
returns non-zero indicating either no match or an error. The eflags
argument is the bit-wise logical OR of zero or more of the following
flags:
     REG_NOTBOL     The first character of the string pointed to by
                    string is not the beginning of the line.
                    Therefore, the circumflex character, when taken
                    as a special character, does not match the
                    beginning of string.
     REG_NOTEOL     The last character of the string pointed to by
                    string is not the end of the line. Therefore,
                    the dollar sign, when taken as a special
                    character, does not match the end of string.
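The effect of the two eflags values can be sketched as follows. The helper anchored_match() is an assumption made for illustration; it compiles a basic regular expression and passes the caller's eflags straight through to regexec().

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if text matches the basic regular
   expression pattern when regexec() is called with eflags, 0 if it
   does not, and -1 if the pattern cannot be compiled. */
int anchored_match(const char *pattern, const char *text, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, (size_t)0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

For the string "abc", the pattern ^abc should stop matching once REG_NOTBOL is passed, and abc$ should stop matching once REG_NOTEOL is passed. An unanchored pattern such as abc is unaffected by either flag.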
If nmatch is not zero, and REG_NOSUB was not set in the cflags argument
to regcomp(), then regexec() fills in the pmatch array with byte offsets
to the substrings of string that correspond to the parenthesized
subexpressions of pattern: pmatch[i].rm_so is the byte offset of the
beginning and pmatch[i].rm_eo is the byte offset one byte past the end
of substring i. (Subexpression i begins at the ith matched left
parenthesis, counting from 1.) Offsets in pmatch[0] identify the
substring that corresponds to the entire regular expression. Unused
elements of pmatch are set to −1. If there are more than nmatch
subexpressions in pattern (pattern itself counts as a subexpression),
regexec() still does the match, but only records the first nmatch
substrings.
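The offsets in pmatch can be turned back into substrings by copying the bytes between rm_so and rm_eo. The sketch below assumes an extended regular expression and a fixed array of 10 entries; copy_submatch() and the MAXSUB limit are choices made for this example only.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: match text against the extended regular
   expression pattern and copy the substring for subexpression idx
   (0 means the whole match) into out as a null-terminated string.
   Returns the substring length, or -1 if the pattern does not
   compile, nothing matches, the subexpression is unused, idx is out
   of range, or out is too small. */
int copy_submatch(const char *pattern, const char *text,
                  size_t idx, char *out, size_t outsize)
{
    enum { MAXSUB = 10 };          /* arbitrary limit for the sketch */
    regex_t re;
    regmatch_t pm[MAXSUB];
    int len = -1;

    if (idx >= MAXSUB || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, (size_t)MAXSUB, pm, 0) == 0 &&
        pm[idx].rm_so != -1) {
        size_t n = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);
        if (n < outsize) {
            memcpy(out, text + pm[idx].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }
    regfree(&re);
    return len;
}
```

For example, matching "mail bob@example.org now" against ([a-z]+)@([a-z.]+) should yield "bob@example.org" for index 0, "bob" for index 1 and "example.org" for index 2, while index 3 is unused and is reported as −1.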
When matching a regular expression, any given parenthesized
subexpression of pattern might participate in the match of several
different substrings of string, or it might not match any substring,
even though the pattern as a whole did match. The following explains
which substrings are reported in pmatch when matching regular
expressions:
     1.  If subexpression i in a regular expression is not contained
         within another subexpression, and it participated in the
         match several times, the byte offsets in pmatch[i] delimit
         the last such match.
     2.  If subexpression i is not contained within another
         subexpression, and it did not participate in an otherwise
         successful match (for example, because a repetition
         operator such as * matched it zero times, or because a
         different branch of an alternation was taken), then the
         byte offsets in pmatch[i] are −1.
     3.  If subexpression i is contained in subexpression j, and a
         match of subexpression j is reported in pmatch[j], the
         match or no-match reported in pmatch[i] is the last one
         that occurred within the substring in pmatch[j].
     4.  If subexpression i is contained in subexpression j, and
         the offsets in pmatch[j] are −1, the offsets in pmatch[i]
         will also be −1.
     5.  If subexpression i matched a zero-length string, both
         offsets in pmatch[i] refer to the character immediately
         following the zero-length substring.
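Rules 1, 2 and 5 above can be observed directly by inspecting the offsets reported for a single subexpression. This is a sketch under the assumption of extended regular expressions; submatch_offsets() is a hypothetical helper, not part of this interface.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match text against the extended regular
   expression pattern and store the offsets reported for
   subexpression idx in *so and *eo.  Returns 0 if the pattern as a
   whole matched and -1 otherwise (including compile failure or an
   idx beyond the fixed array used here). */
int submatch_offsets(const char *pattern, const char *text,
                     size_t idx, long *so, long *eo)
{
    enum { MAXSUB = 10 };          /* arbitrary limit for the sketch */
    regex_t re;
    regmatch_t pm[MAXSUB];
    int rc;

    if (idx >= MAXSUB || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, (size_t)MAXSUB, pm, 0);
    if (rc == 0) {
        *so = (long)pm[idx].rm_so;
        *eo = (long)pm[idx].rm_eo;
    }
    regfree(&re);
    return rc == 0 ? 0 : -1;
}
```

Matching "ab" against (a|b)+ should report offsets 1 and 2 for subexpression 1, the last of its two matches (rule 1). Matching "b" against (a)|(b) should report −1 for subexpression 1, which did not participate (rule 2). Matching "xz" against x(y*)z should report 1 and 1 for the empty subexpression (rule 5).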
If REG_NOSUB was set in cflags in the call to regcomp(), and nmatch is
not zero in the call to regexec(), the content of the pmatch array is
unspecified.
regfree() frees any memory allocated by regcomp() associated with preg.
If the preg argument to regexec() or regfree() is not a compiled
regular expression returned by regcomp(), the result is undefined. A
preg can no longer be treated as a compiled regular expression after it
is given to regfree().
regerror() provides a mapping from error codes returned by regcomp()
and regexec() to printable strings. regerror() generates a string
corresponding to the value of the errcode parameter, which was the last
non-zero value returned by regcomp() or regexec() with the given value
of preg. The errcode parameter can take on any of the error values
defined in <regex.h>. If errbuf_size is not zero, regerror() copies an
appropriate error message into the buffer specified by errbuf. If the
error message (including the terminating null) cannot fit in the
buffer, it is truncated to errbuf_size − 1 bytes and null terminated.
If errbuf_size is zero, the errbuf parameter is ignored, but the return
value is as defined below.
regerror() returns the size of the buffer (including terminating null)
that is required to hold the entire error message.
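Because regerror() reports the required size even when errbuf_size is zero, a caller can size the buffer exactly instead of risking truncation. The following is a minimal sketch of that two-call pattern; compile_error_message() is a hypothetical helper, and returning a malloc()ed string is a design choice made only for the example.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: try to compile pattern.  If compilation
   fails, return a malloc()ed copy of the complete error message
   (the caller must free() it).  Return NULL if the pattern compiles
   or if memory for the message cannot be allocated. */
char *compile_error_message(const char *pattern, int cflags)
{
    regex_t re;
    char *msg;
    size_t need;
    int rc = regcomp(&re, pattern, cflags);

    if (rc == 0) {
        regfree(&re);              /* compiled fine, nothing to report */
        return NULL;
    }
    /* First call: errbuf_size is zero, so errbuf is ignored and only
       the required size (including the terminating null) is returned. */
    need = regerror(rc, &re, NULL, (size_t)0);
    msg = malloc(need);
    if (msg != NULL)
        (void)regerror(rc, &re, msg, need);   /* second call fills it */
    return msg;
}
```

A pattern with an unbalanced bracket, such as "a[b", is expected to produce a non-empty message, while a valid pattern such as "abc" yields NULL.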
EXTERNAL INFLUENCES
Locale
The LC_COLLATE category determines the collating sequence used in
compiling and executing regular expressions.
The LC_CTYPE category determines the interpretation of text as single
and/or multi-byte characters, the characters matched by character-class
expressions in regular expressions, and the opposite-case counterpart
for each character.
International Code Set Support
Single- and multi-byte character code sets are supported. However, if
the LC_COLLATE and LC_CTYPE variables specify locale categories that
are not based upon the same underlying codeset, the results are
undefined.
RETURN VALUE
regcomp() returns zero for success and non-zero for an invalid
expression or other failure. regexec() returns zero if it finds a match
and non-zero for no match or other failure.
ERRORS
If regcomp() or regexec() detects one of the error conditions listed
below, it returns the corresponding non-zero error code. The error
codes are defined in the header <regex.h>.
     REG_BADBR      The contents within the pair \{ (backslash left
                    brace) and \} (backslash right brace) are
                    unusable: not a number, number too large, more
                    than two numbers, or first number larger than
                    second.
     REG_BADPAT     An invalid regular expression.
     REG_BADRPT     The ? (question mark), * (asterisk), or + (plus
                    sign) symbols are not preceded by a valid
                    regular expression.
     REG_EBRACE     The use of a pair of \{ (backslash left brace)
                    and \} (backslash right brace) or { } (braces)
                    is unbalanced.
     REG_EBRACK     The use of [ ] (brackets) is unbalanced.
     REG_EBOL       Using the ^ (caret) anchor and not beginning of
                    line.
     REG_ECHAR      There is an invalid multibyte character.
     REG_ECOLLATE   There is an unusable collating element
                    referenced.
     REG_ECTYPE     There is an unusable character class type
                    referenced.
     REG_EEOL       Using the $ (dollar) anchor and not end of line.
     REG_EESCAPE    There is a trailing \ in the pattern.
     REG_EPAREN     The use of a pair of \( (backslash left
                    parenthesis) and \) (backslash right
                    parenthesis) or ( ) is unbalanced.
     REG_ERANGE     There is an unusable endpoint in the range
                    expression.
     REG_ESPACE     There is insufficient memory space.
     REG_ESUBREG    The number in \digit is invalid or in error.
     REG_NOMATCH    The regexec() function failed to match.
EXAMPLES
     #include <regex.h>
     #include <stdio.h>

     /* Match string against the extended regular expression in
        pattern, treating errors as no match.  Return 1 for match,
        0 for no match.  Print an error message if an error occurs
        or no match is found. */
     int
     match(const char *string, const char *pattern)
     {
         int i;
         regex_t re;
         char buf[256];

         i = regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB);
         if (i != 0) {
             (void)regerror(i, &re, buf, sizeof buf);
             printf("%s\n", buf);
             return(0);              /* report error */
         }
         i = regexec(&re, string, (size_t) 0, NULL, 0);
         if (i != 0) {
             /* build the message before regfree() invalidates re */
             (void)regerror(i, &re, buf, sizeof buf);
             printf("%s\n", buf);
         }
         regfree(&re);
         return(i == 0);             /* 1 for match, 0 otherwise */
     }
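The following demonstrates how the REG_NOTBOL flag could be used with
regexec() to find all substrings in a line that match a pattern
supplied by a user. The offsets in pm are relative to the string passed
to each call, so the pointer p (a char * introduced here) tracks the
start of the unsearched text. For simplicity, little error checking is
done.
     (void) regcomp(&re, pattern, 0);
     p = buffer;                     /* start of unsearched text */
     /* look for first match at start of line */
     error = regexec(&re, p, 1, &pm, 0);
     while (error == 0) {            /* while matches found */
         /* substring found between p + pm.rm_so and p + pm.rm_eo */
         if (pm.rm_eo == 0 && *p == '\0')
             break;                  /* empty match at end of line */
         p += (pm.rm_eo > 0) ? pm.rm_eo : 1;  /* step past empty match */
         /* find next match on line */
         error = regexec(&re, p, 1, &pm, REG_NOTBOL);
     }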
AUTHOR
These routines were developed by OSF and HP.
SEE ALSO
regexp(5).
STANDARDS CONFORMANCE