regexp man page on HP-UX

regexp(5)							     regexp(5)

NAME
       regexp - regular expression and pattern matching notation definitions

DESCRIPTION
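       A regular expression is a mechanism supported by many utilities for
       locating and manipulating patterns in text.  Pattern matching notation
       is used by shells and other utilities for file name expansion.  This
       manual entry defines two forms of regular expressions, Basic Regular
       Expressions and Extended Regular Expressions, and the one form of
       Pattern Matching Notation.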

BASIC REGULAR EXPRESSIONS
       Basic regular expression (RE) notation and construction rules apply  to
       utilities  defined as using basic REs.  Any exceptions to the following
       rules are noted in the descriptions of the specific utilities that  use
       REs.

   REs Matching a Single Character
       The following REs match a single character or a single collating
       element:

       Ordinary Characters
              An ordinary character is an RE that matches itself.  An
              ordinary character is any character in the supported character
              set except newline and the regular expression special
              characters listed in Special Characters below.  An ordinary
              character preceded by a backslash (\) is treated as the
              ordinary character itself, except when the character is (, ),
              {, or }, or one of the digits 1 through 9 (see REs Matching
              Multiple Characters).  Matching is based on the bit pattern
              used for encoding the character, not on the graphic
              representation of the character.

       Special Characters
              A regular expression special character preceded by a backslash
              is a regular expression that matches the special character
              itself.  When not preceded by a backslash, such characters have
              special meaning in the specification of REs.  Regular
              expression special characters and the contexts in which they
              have special meaning are:

              . [ \          The period, left square bracket, and backslash
                             are special except when used in a bracket
                             expression (see RE Bracket Expression).

              *              The asterisk is special except when used in a
                             bracket expression, as the first character of a
                             regular expression, or as the first character
                             following the character pair \( (see REs
                             Matching Multiple Characters).

              ^              The circumflex is special when used as the first
                             character of an entire RE (see Expression
                             Anchoring) or as the first character of a
                             bracket expression.

              $              The dollar sign is special when used as the last
                             character of an entire RE (see Expression
                             Anchoring).

              delimiter      Any character used to bound (i.e., delimit) an
                             entire RE is special for that RE.

       A period (.), when used outside of a bracket expression, is an RE that
       matches any printable or nonprintable character except newline.
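
       The escaping and period rules can be tried out with a short Python
       sketch.  Using Python's standard re module is an assumption made
       purely for illustration: it is not the HP-UX implementation, and its
       syntax differs from basic REs in places, but for these
       single-character cases it behaves as described here.

```python
import re

# Assumption: Python's re module stands in for an RE engine here; it is
# not the HP-UX implementation described by this manual entry.

# A period outside a bracket expression matches any character except newline.
period_matches_letter = re.fullmatch(r"a.c", "abc") is not None
period_matches_newline = re.fullmatch(r"a.c", "a\nc") is not None

# A special character preceded by a backslash matches only itself.
escaped_matches_period = re.fullmatch(r"a\.c", "a.c") is not None
escaped_matches_letter = re.fullmatch(r"a\.c", "abc") is not None

print(period_matches_letter, period_matches_newline)    # True False
print(escaped_matches_period, escaped_matches_letter)   # True False
```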

   RE Bracket Expression
       A bracket expression enclosed in square brackets is an RE that  matches
       a  single  collating element contained in the nonempty set of collating
       elements represented by the bracket expression.

       The following rules apply to bracket expressions:

            A bracket expression is either a matching list expression or a
                           non-matching list expression, and consists of one
                           or more expressions in any order.  Expressions can
                           be: collating elements, collating symbols,
                           noncollating characters, equivalence classes,
                           range expressions, or character classes.  The
                           right bracket (]) loses its special meaning and
                           represents itself in a bracket expression if it
                           occurs first in the list (after an initial ^, if
                           any).  Otherwise, it terminates the bracket
                           expression (unless it is the ending right bracket
                           for a valid collating symbol, equivalence class,
                           or character class, or it is the collating element
                           within a collating symbol or equivalence class
                           expression).  The special characters

                                . * [ \

                           (period, asterisk, left bracket, and backslash)
                           lose their special meaning within a bracket
                           expression.

                           The character sequences:

                                [. [= [:

                           (left-bracket followed by a period, equal-sign, or
                           colon) are special inside a bracket expression and
                           are used to delimit collating symbols, equivalence
                           class expressions, and character class
                           expressions.  These symbols must be followed by a
                           valid expression and the matching terminating .],
                           =], or :].

            A matching list expression specifies a list that matches any one
                           of the characters represented in the list.  The
                           first character in the list cannot be the
                           circumflex.  For example, [abc] is an RE that
                           matches any of a, b, or c.

            A non-matching list expression begins with a circumflex (^) and
                           specifies a list that matches any character or
                           collating element except newline and the
                           characters represented in the list.  For example,
                           [^abc] is an RE that matches any character except
                           newline or a, b, or c.  The circumflex has this
                           special meaning when it occurs first in the list,
                           immediately following the left square bracket.

            A collating element is a sequence of one or more characters that
                           represents a single element in the collating
                           sequence as identified via the most current
                           setting of the locale variable LC_COLLATE (see
                           setlocale(3C)).

            A collating symbol is a collating element enclosed within
                           bracket-period ([. and .]) delimiters.
                           Multicharacter collating elements must be
                           represented as collating symbols to distinguish
                           them from single-character collating elements.
                           For example, if the string ch is a valid collating
                           element, then [[.ch.]] is treated as an element
                           matching the same string of characters, while [ch]
                           is treated as a simple list of the characters c
                           and h.  If the string within the bracket-period
                           delimiters is not a valid collating element in the
                           current collating sequence definition, the symbol
                           is treated as an invalid expression.

            A noncollating character is a character that is ignored for
                           collating purposes.  By definition, such
                           characters cannot participate in equivalence
                           classes or range expressions.

            An equivalence class expression represents the set of collating
                           elements belonging to an equivalence class.  It is
                           expressed by enclosing any one of the collating
                           elements in the equivalence class within
                           bracket-equal ([= and =]) delimiters.  For
                           example, if a and A belong to the same equivalence
                           class, then [[=a=]b] and [[=A=]b] are each
                           equivalent to [aAb].

            A range expression represents the set of collating elements that
                           fall between two elements in the current collation
                           sequence as defined via the most current setting
                           of the locale variable LC_COLLATE (see
                           setlocale(3C)).  It is expressed as the starting
                           point and the ending point separated by a hyphen
                           (-).

                           The starting range point and the ending range
                           point must be a collating element, collating
                           symbol, or equivalence class expression.  An
                           equivalence class expression used as an end point
                           of a range expression is interpreted such that all
                           collating elements within the equivalence class
                           are included in the range.  For example, if the
                           collating order is A, a, B, b, C, c, ch, Ch, D,
                           and d, and the characters A and a belong to the
                           same equivalence class, then the expression
                           [[=a=]-D] is treated as [AaBbCc[.ch.][.Ch.]D].

                           Both starting and ending range points must be
                           valid collating elements, collating symbols, or
                           equivalence class expressions, and the ending
                           range point must collate equal to or higher than
                           the starting range point; otherwise the expression
                           is invalid.  For example, with the above collating
                           order and assuming that E is a noncollating
                           character, both the expressions [[=A=]-E] and
                           [d-a] are invalid.

                           An ending range point can also be the starting
                           range point in a subsequent range expression.
                           Each such range expression is evaluated
                           separately.  For example, the bracket expression
                           [a-m-o] is treated as [a-mm-o].

                           The hyphen character is treated as itself if it
                           occurs first (after an initial ^, if any) or last
                           in the list, or as the rightmost symbol in a range
                           expression.  As examples, the expressions [-ac]
                           and [ac-] are equivalent and match any of the
                           characters a, c, or -; the expressions [^-ac] and
                           [^ac-] are equivalent and match any characters
                           except newline, a, c, or -; the expression [%--]
                           matches any of the characters in the defined
                           collating sequence between % and - inclusive; the
                           expression [--@] matches any of the characters in
                           the defined collating sequence between - and @
                           inclusive; and the expression [a--] is invalid,
                           assuming - precedes a in the collating sequence.

                           If a bracket expression must specify both - and ],
                           the ] must be placed first (after the ^, if any)
                           and the - last within the bracket expression.

            A character class expression represents the set of characters
                           belonging to a character class, as defined via the
                           most current setting of the locale variable
                           LC_CTYPE.  It is expressed as a character class
                           name enclosed within bracket-colon ([: and :])
                           delimiters.

                           Standard character class expressions supported in
                           all locales are:

                                [:alpha:]    letters

                                [:upper:]    upper-case letters

                                [:lower:]    lower-case letters

                                [:digit:]    decimal digits

                                [:xdigit:]   hexadecimal digits

                                [:alnum:]    letters or decimal digits

                                [:space:]    characters producing white-space
                                             in displayed text

                                [:print:]    printing characters

                                [:punct:]    punctuation characters

                                [:graph:]    characters with a visible
                                             representation

                                [:cntrl:]    control characters

                                [:blank:]    blank characters

                           For example, if the locale variable LC_CTYPE is
                           set to C, the expression [[:upper:]] is equivalent
                           to [A-Z].  Similarly, the expression [[:digit:]]
                           is the same as [0-9].
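
       The list and hyphen placement rules can be checked with a small
       Python sketch.  It assumes Python's standard re module as a stand-in
       and is not the HP-UX implementation.  Python does not support
       collating symbols, equivalence classes, or [: :] character classes,
       and its non-matching lists also match newline, so only the simpler
       rules are shown.  The helper name matched_chars is hypothetical.

```python
import re

def matched_chars(bracket_expr, candidates):
    """Return the candidate characters that the bracket expression matches."""
    pattern = re.compile(bracket_expr)
    return [ch for ch in candidates if pattern.fullmatch(ch)]

# Newline is left out of the candidates on purpose: Python's non-matching
# lists match it, unlike the notation described in this manual entry.
candidates = ["a", "b", "c", "d", "-", "]"]

matching = matched_chars(r"[abc]", candidates)       # matching list
non_matching = matched_chars(r"[^abc]", candidates)  # non-matching list
hyphen_first = matched_chars(r"[-ac]", candidates)   # hyphen first: literal
hyphen_last = matched_chars(r"[ac-]", candidates)    # hyphen last: literal
both = matched_chars(r"[]a-]", candidates)           # ] first and - last

print(matching)       # ['a', 'b', 'c']
print(non_matching)   # ['d', '-', ']']
print(hyphen_first)   # ['a', 'c', '-']
print(hyphen_last)    # ['a', 'c', '-']
print(both)           # ['a', '-', ']']
```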

   REs Matching Multiple Characters
       The  following  rules  may  be  used to construct REs matching multiple
       characters from REs matching a single character:

            RERE           The concatenation of REs is an RE that matches the
                           first encountered concatenation of the strings
                           matched by each component of the RE.  For example,
                           the RE bc matches the second and third characters
                           of the string abcdefabcdef.

            An RE matching a single character followed by an asterisk (*)
                           is an RE that matches zero or more occurrences of
                           the RE preceding the asterisk.  The first
                           encountered string that permits a match is chosen,
                           and the matched string will encompass the maximum
                           number of characters permitted by the RE.  For
                           example, in the string abbbcdeabbbbbbcde, both the
                           RE b*c and the RE bbb*c are matched by the
                           substring bbbc in the second through fifth
                           positions.  An asterisk as the first character of
                           an RE loses this special meaning and is treated as
                           itself.

            A subexpression can be defined within an RE
                           by enclosing it between the character pairs \( and
                           \).  Such a subexpression matches whatever it
                           would have matched without the \( and \).
                           Subexpressions can be arbitrarily nested.  An
                           asterisk immediately following the \( loses its
                           special meaning and is treated as itself.  An
                           asterisk immediately following the \) is treated
                           as an invalid character.

            The expression \n matches the same string of characters as was
                           matched by a subexpression enclosed between \( and
                           \) preceding the \n.  The character n must be a
                           digit from 1 through 9, specifying the n-th
                           subexpression (the one that begins with the n-th
                           \( and ends with the corresponding paired \)).
                           For example, the expression ^\(.*\)\1$ matches a
                           line consisting of two adjacent appearances of the
                           same string.

                           If the \n is followed by an asterisk, it matches
                           zero or more occurrences of the subexpression
                           referred to.  For example, the expression
                           \(ab\(cd\)ef\)Z\2*Z\1 matches the string
                           abcdefZcdcdZabcdef.

            An RE matching a single character followed by \{m\}, \{m,\},
                           or \{m,n\} is an RE that matches repeated
                           occurrences of the RE.  The values of m and n must
                           be decimal integers in the range 0 through 255,
                           with m specifying the exact or minimum number of
                           occurrences and n specifying the maximum number of
                           occurrences.  \{m\} matches exactly m occurrences
                           of the preceding RE, \{m,\} matches at least m
                           occurrences, and \{m,n\} matches any number of
                           occurrences between m and n, inclusive.

                           The first encountered string that matches the
                           expression is chosen; it will contain as many
                           occurrences of the RE as possible.  For example,
                           in the string abbbbbbbc the RE b\{3\} is matched
                           by characters two through four, the RE b\{3,\} is
                           matched by characters two through eight, and the
                           RE b\{3,5\}c is matched by characters four through
                           nine.
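
       The repetition, interval, and back-reference rules can be exercised
       with the Python sketch below.  It assumes Python's standard re module
       as a stand-in, which is not the HP-UX implementation: Python writes
       grouping and intervals without backslashes, so \(...\) becomes (...)
       and \{m,n\} becomes {m,n}.  The sample strings are chosen for
       illustration and the helper name positions is hypothetical.

```python
import re

def positions(pattern, text):
    """Return the 1-based (first, last) character positions of the first match."""
    m = re.search(pattern, text)
    return (m.start() + 1, m.end()) if m else None

# Asterisk: the first encountered match is chosen and made as long as possible.
star_a = positions(r"b*c", "abbbcdeabbbbbbcde")
star_b = positions(r"bbb*c", "abbbcdeabbbbbbcde")

# Intervals: exactly m, at least m, and between m and n occurrences.
exact = positions(r"b{3}", "abbbbbbbc")
at_least = positions(r"b{3,}", "abbbbbbbc")
between = positions(r"b{3,5}c", "abbbbbbbc")

# Back-references: \1 repeats whatever the first subexpression matched.
doubled = re.search(r"^(.*)\1$", "abcabc") is not None
not_doubled = re.search(r"^(.*)\1$", "abcabd") is not None
nested = re.fullmatch(r"(ab(cd)ef)Z\2*Z\1", "abcdefZcdcdZabcdef") is not None

print(star_a, star_b)               # (2, 5) (2, 5)
print(exact, at_least, between)     # (2, 4) (2, 8) (4, 9)
print(doubled, not_doubled, nested) # True False True
```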

   Expression Anchoring
       An  RE  can  be	limited	 to  matching strings that begin or end a line
       (i.e., anchored) according to the following rules:

            ·  A circumflex (^) as the first character of an RE anchors the
               expression to the beginning of a line; only strings starting at
               the first character of a line are matched by the RE.  For
               example, the RE ^ab matches the string ab in the line abcdef
               but not the same string in the line cdefab.

            ·  A dollar sign ($) as the last character of an RE anchors the
               expression to the end of a line; only strings ending at the
               last character of a line are matched by the RE.  For example,
               the RE ab$ matches the string ab in the line cdefab but not
               the same string in the line abcdef.

            ·  An RE anchored by both ^ and $ matches only strings that are
               lines.  For example, the RE ^abcdef$ matches only lines
               consisting of the string abcdef.
       The use of duplication characters (+,*) following anchors is illegal.
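
       A minimal Python sketch of the anchoring rules follows.  It assumes
       Python's standard re module as a stand-in for an RE engine, not the
       HP-UX implementation.  Each sample string is treated as one line
       without a trailing newline, and the helper name found is
       hypothetical.

```python
import re

def found(pattern, line):
    """Report whether the pattern matches anywhere in the line."""
    return re.search(pattern, line) is not None

# ^ anchors the expression to the beginning of the line.
begin_hit = found(r"^ab", "abcdef")
begin_miss = found(r"^ab", "cdefab")

# $ anchors the expression to the end of the line.
end_hit = found(r"ab$", "cdefab")
end_miss = found(r"ab$", "abcdef")

# Both anchors together match only complete lines.
whole_hit = found(r"^abcdef$", "abcdef")
whole_miss = found(r"^abcdef$", "abcdefg")

print(begin_hit, begin_miss)   # True False
print(end_hit, end_miss)       # True False
print(whole_hit, whole_miss)   # True False
```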

EXTENDED REGULAR EXPRESSIONS
       The  extended  regular expression (ERE) notation and construction rules
       apply to utilities defined as using extended REs.   Any	exceptions  to
       the following rules are noted in the descriptions of the specific util‐
       ities using EREs.

   EREs Matching a Single Character
       The following EREs match a single character or a single collating
       element:

       Ordinary Characters
              An ordinary character is an ERE that matches itself.  An
              ordinary character is any character in the supported character
              set except newline and the regular expression special
              characters listed in Special Characters below.  An ordinary
              character preceded by a backslash (\) is treated as the
              ordinary character itself.  Matching is based on the bit
              pattern used for encoding the character, not on the graphic
              representation of the character.

       Special Characters
              A regular expression special character preceded by a backslash
              is a regular expression that matches the special character
              itself.  When not preceded by a backslash, such characters have
              special meaning in the specification of EREs.  The extended
              regular expression special characters and the contexts in which
              they have their special meaning are:

            . [ \ ( ) * + ? $ |
                             The period, left square bracket, backslash, left
                             parenthesis, right parenthesis, asterisk, plus
                             sign, question mark, dollar sign, and vertical
                             bar are special except when used in a bracket
                             expression (see ERE Bracket Expression).

            ^                The circumflex is special except when used in a
                             bracket expression in a non-leading position.

            delimiter        Any character used to bound (i.e., delimit) an
                             entire ERE is special for that ERE.

       A period (.), when used outside of a bracket expression, is an ERE
       that matches any printable or nonprintable character except newline.

   ERE Bracket Expression
       The syntax and rules for ERE bracket expressions are the same as for RE
       bracket expressions found above.

   EREs Matching Multiple Characters
       The  following  rules  may  be used to construct EREs matching multiple
       characters from EREs matching a single character:

            EREERE         A concatenation of EREs matches the first
                           encountered concatenation of the strings matched
                           by each component of the ERE.  Such a
                           concatenation of EREs enclosed in parentheses
                           matches whatever the concatenation without the
                           parentheses matches.  For example, both the ERE bc
                           and the ERE (bc) match the second and third
                           characters of the string abcdefabcdef.  The
                           longest overall string is matched.

            The special character plus (+),
                           when following an ERE matching a single character,
                           or a concatenation of EREs enclosed in
                           parentheses, is an ERE that matches one or more
                           occurrences of the ERE preceding the plus sign.
                           The string matched will contain as many
                           occurrences as possible.  For example, the ERE
                           b+(bc) matches the fourth through seventh
                           characters in the string acabbbcde.

            The special character asterisk (*),
                           when following an ERE matching a single character,
                           or a concatenation of EREs enclosed in
                           parentheses, is an ERE that matches zero or more
                           occurrences of the ERE preceding the asterisk.
                           For example, the ERE b*c matches the first
                           character in the string cabbbcde.  If there is any
                           choice, the longest left-most string that permits
                           a match is chosen.  For example, the ERE b*cd
                           matches the third through seventh characters in
                           the string cabbbcdebbbbbbcdbc.

            The special character question mark (?),
                           when following an ERE matching a single character,
                           or a concatenation of EREs enclosed in
                           parentheses, is an ERE that matches zero or one
                           occurrence of the ERE preceding the question mark.
                           The string matched will contain as many
                           occurrences as possible.  For example, the ERE b?c
                           matches the second character in the string
                           acabbbcde.

            {m,n}          An interval expression that functions the same way
                           as the basic regular expression syntax \{m,n\}.
   Alternation
       Two EREs separated by the special character vertical bar (|) match a
       string that is matched by either ERE.  For example, the ERE a((bc)|d)
       matches the string abc and the string ad.  A vertical bar (|) may not
       appear as follows:

              It may not appear first or last in an ERE.

              It may not appear immediately following a vertical bar.

              It may not appear after a left parenthesis.

              It may not appear immediately preceding a right parenthesis.

   Precedence
       The order of precedence is as follows, from high to low:

            [ ]            square brackets

            * + ?          asterisk, plus sign, question mark

            ^ $            anchoring

                           concatenation

            |              alternation

       For example, the ERE abba|cde is interpreted as "match either abba or
       cde".  It does not mean "match abb followed by a or c, followed in
       turn by de" (because concatenation has a higher order of precedence
       than alternation).
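
       Alternation and its precedence relative to concatenation can be
       checked with a short Python sketch.  It assumes Python's standard re
       module as a stand-in and is not the HP-UX implementation.  Whole
       strings are tested with re.fullmatch so that Python's rule of
       preferring the first alternative over the longest does not affect
       the result.  The helper name whole is hypothetical.

```python
import re

def whole(pattern, text):
    """Report whether the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Alternation inside parentheses: a followed by either bc or d.
alt_abc = whole(r"a((bc)|d)", "abc")
alt_ad = whole(r"a((bc)|d)", "ad")
alt_abcd = whole(r"a((bc)|d)", "abcd")

# Concatenation binds tighter than alternation: abba|cde is (abba)|(cde).
prec_abba = whole(r"abba|cde", "abba")
prec_cde = whole(r"abba|cde", "cde")
prec_abbade = whole(r"abba|cde", "abbade")

# Parentheses are needed to get the other reading.
grouped_abbade = whole(r"abb(a|c)de", "abbade")
grouped_abbcde = whole(r"abb(a|c)de", "abbcde")

print(alt_abc, alt_ad, alt_abcd)          # True True False
print(prec_abba, prec_cde, prec_abbade)   # True True False
print(grouped_abbade, grouped_abbcde)     # True True
```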

   Expression Anchoring
       An  ERE	can  be	 limited  to matching strings that begin or end a line
       (i.e., anchored) according to the following rules:

            ·  A circumflex (^) matches the beginning of a line (anchors the
               expression to the beginning of a line).  For example, the ERE
               ^ab matches the string ab in the line abcdef but not the same
               string in the line cdefab.

            ·  A dollar sign ($) matches the end of a line (anchors the
               expression to the end of a line).  For example, the ERE ab$
               matches the string ab in the line cdefab but not the same
               string in the line abcdef.

            ·  An ERE anchored by both ^ and $ matches only strings that are
               lines.  For example, the ERE ^abcdef$ matches only lines
               consisting of the string abcdef.  Only empty lines match the
               ERE ^$.

       The use of duplication characters (+,*) following anchors is illegal.

PATTERN MATCHING NOTATION
       The  following rules apply to pattern matching notation except as noted
       in the descriptions of the specific utilities using pattern matching.

   Patterns Matching a Single Character
       The following patterns match a single character or a single collating
       element:

       Ordinary Characters
              An ordinary character is a pattern that matches itself.  An
              ordinary character is any character in the supported character
              set except newline and the pattern matching special characters
              listed in Special Characters below.  Matching is based on the
              bit pattern used for encoding the character, not on the graphic
              representation of the character.

       Special Characters
              A pattern matching special character preceded by a backslash
              (\) is a pattern that matches the special character itself.
              When not preceded by a backslash, such characters have special
              meaning in the specification of patterns.  The pattern matching
              special characters and the contexts in which they have their
              special meaning are:

            ? * [          The question mark, asterisk, and left square
                           bracket are special except when used in a bracket
                           expression (see Pattern Bracket Expression).

       A question mark (?), when used outside of a bracket expression, is a
       pattern that matches any printable or nonprintable character except
       newline.

   Pattern Bracket Expression
       The  syntax  and	 rules for pattern bracket expressions are the same as
       for RE bracket expressions found above with the following exceptions:

	      The exclamation point character (!) replaces the circumflex
	      character (^) in its role in a non-matching list in the regular
	      expression notation.

	      The backslash (\) is used as an escape character within bracket
	      expressions.
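
       The exclamation point rule can be illustrated with Python's standard
       fnmatch module.  This is an assumption made for illustration:
       fnmatch follows shell-style pattern conventions but is not the HP-UX
       fnmatch(3C), and it does not treat backslash as an escape inside
       bracket expressions, so that rule is not shown.

```python
from fnmatch import fnmatchcase

# A matching list works as in the RE notation.
listed_hit = fnmatchcase("a", "[abc]")

# In patterns, ! (not ^) introduces a non-matching list.
negated_hit = fnmatchcase("d", "[!abc]")
negated_miss = fnmatchcase("a", "[!abc]")

# In Python's fnmatch a leading ^ is an ordinary list member.
circumflex_literal = fnmatchcase("^", "[^abc]")

# ? matches exactly one character.
one_char = fnmatchcase("x", "?")
two_chars = fnmatchcase("xy", "?")

print(listed_hit, negated_hit, negated_miss)   # True True False
print(circumflex_literal)                      # True
print(one_char, two_chars)                     # True False
```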

   Patterns Matching Multiple Characters
       The following rules may be used to construct patterns matching multiple
       characters from patterns matching a single character:

	      *		     The asterisk (*) is a pattern that matches any
			     string, including the null string.

	      RERE	     The concatenation of patterns matching a single
			     character is a valid pattern that matches the
			     concatenation of the single characters or
			     collating elements matched by each of the
			     concatenated patterns.  For example, the pattern
			     a[bc] matches the strings ab and ac.

			     The concatenation of one or more patterns	match‐
			     ing a single character with one or more asterisks
			     is a  valid  pattern.   In	 such  patterns,  each
			     asterisk matches a string of zero or more charac‐
			     ters, up to the first character that matches  the
			     character following the asterisk in the pattern.

			     For example, the pattern a*d matches the strings
			     ad, abd, and abcd, but not the string abc.  When
			     an asterisk is the first or last character in a
			     pattern, it matches zero or more characters that
			     precede or follow the characters matched by the
			     remainder of the pattern.  For example, the
			     pattern a*d* matches the strings ad, abcd,
			     abcdef, aaaad, and adddd; the pattern *a*d
			     matches the strings ad, abcd, efabcd, aaaad, and
			     adddd.
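
       The asterisk and concatenation rules for patterns can be checked with
       Python's standard fnmatch module, assumed here as a stand-in for the
       pattern matching notation rather than the HP-UX fnmatch(3C).  The
       sample strings are chosen for illustration and the helper name
       matching is hypothetical.

```python
from fnmatch import fnmatchcase

def matching(pattern, strings):
    """Return the strings that the whole pattern matches."""
    return [s for s in strings if fnmatchcase(s, pattern)]

# Concatenation of single-character patterns.
concat = matching("a[bc]", ["ab", "ac", "ad"])

# An asterisk between characters matches any string, including the null string.
middle = matching("a*d", ["ad", "abd", "abcd", "abc"])

# A leading or trailing asterisk matches what precedes or follows the rest.
trailing = matching("a*d*", ["ad", "abcd", "abcdef", "aaaad", "adddd"])
leading = matching("*a*d", ["ad", "abcd", "efabcd", "aaaad", "adddd"])

print(concat)     # ['ab', 'ac']
print(middle)     # ['ad', 'abd', 'abcd']
print(trailing)   # ['ad', 'abcd', 'abcdef', 'aaaad', 'adddd']
print(leading)    # ['ad', 'abcd', 'efabcd', 'aaaad', 'adddd']
```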

   Rule Qualification for Patterns Used for Filename Expansion
       The  rules  described  above  for pattern matching are qualified by the
       following rules when the pattern matching notation is used for filename
       expansion by sh(1), csh(1), ksh(1), and make(1).

	      If a filename (including the component of a pathname that
	      follows the slash character (/)) begins with a period (.), the
	      period must be explicitly matched by using a period as the first
	      character of the pattern; it cannot be matched by either the
	      asterisk special character, the question mark special character,
	      or a bracket expression.  This rule does not apply to make(1).

	      The slash character in a pathname must be explicitly matched  by
	      using a slash in the pattern; it cannot be matched by either the
	      asterisk special character, the question mark special character,
	      or a bracket expression.	For make(1) only the part of the path‐
	      name following the last slash character can be matched by a spe‐
	      cial  character.	 That is, all special characters preceding the
	      last slash character lose their special meaning.

	      Specified patterns are matched against  existing	filenames  and
	      pathnames,  as appropriate.  If the pattern matches any existing
	      filenames or pathnames, the pattern is replaced with those file‐
	      names  and pathnames, sorted according to the collating sequence
	      in effect.  If the pattern does not match any existing filenames
	      or pathnames, the pattern string is left unchanged.

	      If the pattern begins with a tilde (~) character, all of the
	      ordinary characters preceding the first slash (or all characters
	      if there is no slash) are treated as a possible login name.  If
	      the login name is null (i.e., the pattern contains only the tilde
	      or the tilde is immediately followed by a slash), the tilde is
	      replaced by a pathname of the process's home directory, followed
	      by a slash.  Otherwise, the combination of tilde and login name
	      is replaced by a pathname of the home directory associated with
	      the login name, followed by a slash.  If the system cannot
	      identify the login name, the result is implementation-defined.
	      This rule does not apply to sh(1) or make(1).

	      If the pattern contains a $ character, variable substitution can
	      take place.  Environmental variables can be embedded within
	      patterns as:

		   $name

	      or:

		   ${name}

	      Braces  are used to guarantee that characters following name are
	      not interpreted as belonging to name.   Substitution  occurs  in
	      the  order specified only once; that is, the resulting string is
	      not examined again for new names that occurred  because  of  the
	      substitution.
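
       The filename expansion rules can be sketched in Python as shown
       below.  Everything in the sketch is an assumption for illustration:
       it uses Python's glob and os.path modules rather than a shell, sorts
       by character code rather than by the locale's collating sequence,
       and uses hypothetical names (expand, DEMO_NAME, DEMO_OUTER) and a
       temporary directory created only for the demonstration.

```python
import glob
import os
import tempfile

def expand(pattern, directory):
    """Expand pattern against names in directory; keep the pattern if nothing matches."""
    full = os.path.join(glob.escape(directory), pattern)
    names = sorted(os.path.basename(path) for path in glob.glob(full))
    return names if names else [pattern]

with tempfile.TemporaryDirectory() as demo_dir:
    for name in (".profile", "a.c", "b.c"):
        open(os.path.join(demo_dir, name), "w").close()
    visible = expand("*", demo_dir)      # a leading period is not matched by *
    hidden = expand(".*", demo_dir)      # it must be matched explicitly
    unmatched = expand("*.o", demo_dir)  # no match: the pattern is left unchanged

# Variable substitution: braces separate the name from following characters.
os.environ["DEMO_NAME"] = "src"
os.environ["DEMO_OUTER"] = "$DEMO_NAME"
os.environ.pop("DEMO_NAMEdir", None)
with_braces = os.path.expandvars("${DEMO_NAME}dir")
without_braces = os.path.expandvars("$DEMO_NAMEdir")  # names an unset variable
once_only = os.path.expandvars("$DEMO_OUTER")         # result is not examined again

print(visible, hidden, unmatched)
print(with_braces, without_braces, once_only)
```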

   Rule Qualification for Patterns Used in the case Command
       The  rules  described  above  for pattern matching are qualified by the
       following rule when the pattern matching notation is used in  the  case
       command of sh(1) and ksh(1).

	      Multiple alternative patterns in a single clause can be
	      specified by separating individual patterns with the vertical
	      bar character (|).  Strings matching any of the patterns
	      separated this way will cause the corresponding command list to
	      be selected.
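
       The selection rule for alternative patterns can be emulated with a
       small Python sketch.  This is only an emulation under stated
       assumptions, not the shell's case command: the function select and
       its clause list are hypothetical, patterns are matched with Python's
       fnmatch module, and alternatives are split naively on every vertical
       bar with no support for escaping.

```python
from fnmatch import fnmatchcase

def select(word, clauses):
    """Return the action of the first clause with a pattern that matches word."""
    for patterns, action in clauses:
        # Alternatives within one clause are separated by vertical bars.
        if any(fnmatchcase(word, p) for p in patterns.split("|")):
            return action
    return None

clauses = [
    ("*.c|*.h", "C source"),
    ("*.py", "Python source"),
    ("*", "other"),
]

c_file = select("main.c", clauses)
h_file = select("defs.h", clauses)
py_file = select("tool.py", clauses)
fallback = select("README", clauses)

print(c_file, h_file, py_file, fallback)
```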

SEE ALSO
       ksh(1), sh(1), fnmatch(3C), glob(3C), regcomp(3C), setlocale(3C),
       environ(5).

STANDARDS CONFORMANCE
								     regexp(5)