pcrepartial man page on MirBSD

PCREPARTIAL(3)							PCREPARTIAL(3)

NAME
       PCRE - Perl-compatible regular expressions

PARTIAL MATCHING IN PCRE

       In normal use of PCRE, if the subject string that is passed to
       pcre_exec() or pcre_dfa_exec() matches as far as it goes, but is too
       short to match the entire pattern, PCRE_ERROR_NOMATCH is returned.
       There are circumstances where it might be helpful to distinguish this
       case from other cases in which there is no match.

       Consider, for example, an application where a human is required to type
       in data for a field with specific formatting requirements. An example
       might be a date in the form ddmmmyy, defined by this pattern:

	 ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$

       If the application sees the user's keystrokes one by one, and can check
       that what has been typed so far is potentially valid, it is able to
       raise an error as soon as a mistake is made, possibly beeping and not
       reflecting the character that has been typed. This immediate feedback
       is likely to be a better user interface than a check that is delayed
       until the entire string has been entered.

       PCRE supports the concept of partial matching by means of the
       PCRE_PARTIAL option, which can be set when calling pcre_exec() or
       pcre_dfa_exec(). When this flag is set for pcre_exec(), the return code
       PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
       during the matching process the last part of the subject string matched
       part of the pattern. Unfortunately, for non-anchored matching, it is
       not possible to obtain the position of the start of the partial match.
       No captured data is set when PCRE_ERROR_PARTIAL is returned.
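       The three outcomes can be modelled without PCRE. The C sketch below is
       a hand-written matcher for the date pattern only; it does not call
       PCRE, and classify_date() and date_result are hypothetical names.
       DATE_FULL stands for a successful pcre_exec() call, DATE_PARTIAL for
       PCRE_ERROR_PARTIAL and DATE_NOMATCH for PCRE_ERROR_NOMATCH. Treating an
       empty subject as partial is a choice made for keystroke checking, not a
       statement about PCRE's behaviour.

```c
#include <ctype.h>
#include <string.h>

/* Outcomes of checking a (possibly incomplete) date. */
typedef enum { DATE_NOMATCH, DATE_PARTIAL, DATE_FULL } date_result;

static const char *const month_names[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

/* Match s[i..n) against (jan|...|dec)\d\d$ for every month and keep the
   best outcome. Running out of subject part-way through is partial. */
static date_result match_tail(const char *s, size_t i, size_t n)
{
    date_result best = DATE_NOMATCH;
    for (int m = 0; m < 12; m++) {
        size_t item = 0;          /* items 0-2: month letters, 3-4: digits */
        int failed = 0;
        for (size_t p = i; p < n; p++, item++) {
            int ok;
            if (item == 5) {      /* text after \d\d violates the $ anchor */
                failed = 1;
                break;
            }
            if (item < 3)
                ok = s[p] == month_names[m][item];
            else
                ok = isdigit((unsigned char)s[p]) != 0;
            if (!ok) {
                failed = 1;
                break;
            }
        }
        if (failed)
            continue;
        date_result r = (item == 5) ? DATE_FULL : DATE_PARTIAL;
        if (r > best)
            best = r;
    }
    return best;
}

/* Classify s against the anchored ddmmmyy date pattern. */
date_result classify_date(const char *s)
{
    size_t n = strlen(s);
    date_result best = DATE_NOMATCH;
    /* \d?\d allows a one-digit or a two-digit day: try both readings. */
    for (size_t d = 1; d <= 2; d++) {
        size_t k = 0;
        date_result r;
        while (k < d && k < n && isdigit((unsigned char)s[k]))
            k++;
        if (k == n && n < d)
            r = DATE_PARTIAL;     /* subject ended inside the day */
        else if (k == d)
            r = match_tail(s, d, n);
        else
            r = DATE_NOMATCH;     /* non-digit where a day digit is needed */
        if (r > best)
            best = r;
    }
    return best;
}
```

       An input field would call classify_date() on the text typed so far
       after every keystroke and refuse the key when the result is
       DATE_NOMATCH. In a real program the same decision would be taken from
       the return code of pcre_exec() called with PCRE_PARTIAL.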

       When PCRE_PARTIAL is set for pcre_dfa_exec(), the return code
       PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of
       the subject is reached, there have been no complete matches, but there
       is still at least one matching possibility. The portion of the string
       that provided the partial match is set as the first matching string.

       Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers
       the last literal byte in a pattern, and abandons matching immediately
       if such a byte is not present in the subject string. This optimization
       cannot be used for a subject string that might match only partially.
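       The conflict can be seen in a toy C model that treats the pattern as a
       plain literal string. This is an assumption made to keep the sketch
       short; PCRE's real check is internal to the library, and the function
       names below are hypothetical. The quick-reject test correctly says that
       "abc" cannot match "xxab" completely, yet "xxab" ends with "ab", which
       is exactly the kind of partial match PCRE_PARTIAL is meant to report.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model: the "pattern" is a plain literal string such as "abc". */

/* Quick-reject test in the spirit of PCRE's last-literal-byte check: a
   complete match is impossible if the last byte of the pattern is absent. */
bool may_match_completely(const char *pattern, const char *subject)
{
    size_t plen = strlen(pattern);
    if (plen == 0)
        return true;
    return strchr(subject, pattern[plen - 1]) != NULL;
}

/* True if the subject ends with a non-empty proper prefix of the pattern,
   i.e. the last part of the subject matches the first part of the pattern. */
bool ends_with_pattern_prefix(const char *pattern, const char *subject)
{
    size_t plen = strlen(pattern), slen = strlen(subject);
    if (plen == 0)
        return false;
    size_t k = slen < plen - 1 ? slen : plen - 1;   /* longest candidate */
    for (; k > 0; k--)
        if (memcmp(subject + slen - k, pattern, k) == 0)
            return true;
    return false;
}
```

       Applying the quick-reject test first would discard "xxab" before the
       partial match could be noticed, which is why the optimization has to
       be switched off when partial matching is requested.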

RESTRICTED PATTERNS FOR PCRE_PARTIAL

       Because of the way certain internal optimizations are implemented in
       the pcre_exec() function, the PCRE_PARTIAL option cannot be used with
       all patterns. These restrictions do not apply when pcre_dfa_exec() is
       used. For pcre_exec(), repeated single characters such as

	 a{2,4}

       and repeated single metasequences such as

	 \d+

       are not permitted if the maximum number of occurrences is greater than
       one. Optional items such as \d? (where the maximum is one) are
       permitted. Quantifiers with any values are permitted after
       parentheses, so the invalid examples above can be coded thus:

	 (a){2,4}
	 (\d)+

       These constructions run more slowly, but for the kinds of application
       that are envisaged for this facility, this is not felt to be a major
       restriction.
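       A rough C checker for this rule is sketched below. It is a toy, not
       PCRE's own test, and partial_ok() and parse_quantifier() are
       hypothetical names. It assumes that literal characters, "." and any
       backslash escape count as single items, skips bracketed character
       classes without judging them (the rule above does not mention them),
       and flags a single item followed by *, +, {n}, {n,} or {n,m} whose
       maximum exceeds one.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* If pat[i] starts a quantifier, return its length and store its maximum
   in *max (-1 means unbounded); otherwise return 0. */
static size_t parse_quantifier(const char *pat, size_t i, long *max)
{
    size_t j = i + 1;
    long lo = 0, hi = 0;

    switch (pat[i]) {
    case '*': case '+': *max = -1; return 1;
    case '?':           *max = 1;  return 1;
    case '{':           break;
    default:            return 0;
    }
    if (!isdigit((unsigned char)pat[j]))
        return 0;                               /* a literal '{' */
    while (isdigit((unsigned char)pat[j]))
        lo = lo * 10 + (pat[j++] - '0');
    if (pat[j] == '}') {                        /* {n} */
        *max = lo;
    } else if (pat[j] == ',' && pat[j + 1] == '}') {   /* {n,} */
        j++;
        *max = -1;
    } else if (pat[j] == ',' && isdigit((unsigned char)pat[j + 1])) {
        j++;                                    /* {n,m} */
        while (isdigit((unsigned char)pat[j]))
            hi = hi * 10 + (pat[j++] - '0');
        if (pat[j] != '}')
            return 0;
        *max = hi;
    } else {
        return 0;
    }
    return j + 1 - i;
}

/* Toy version of the rule: reject a single-character item (assumed here to
   be a literal, "." or a backslash escape) that may repeat more than once. */
bool partial_ok(const char *pat)
{
    size_t i = 0;
    while (pat[i] != '\0') {
        char c = pat[i];
        bool single = false;
        size_t len = 1;
        long max = 1;

        if (c == '\\' && pat[i + 1] != '\0') {
            single = true;                      /* e.g. \d or \w */
            len = 2;
        } else if (c == '[') {
            /* Skip a character class without judging it. */
            size_t j = i + 1;
            if (pat[j] == '^') j++;
            if (pat[j] == ']') j++;
            while (pat[j] != '\0' && pat[j] != ']')
                j += (pat[j] == '\\' && pat[j + 1] != '\0') ? 2 : 1;
            len = (pat[j] == ']') ? j + 1 - i : j - i;
        } else if (strchr("()|^$", c) == NULL) {
            single = true;                      /* literal character or "." */
        }
        i += len;

        size_t qlen = parse_quantifier(pat, i, &max);
        if (qlen > 0) {
            if (single && (max < 0 || max > 1))
                return false;
            i += qlen;
            if (pat[i] == '?' || pat[i] == '+')
                i++;                            /* lazy or possessive suffix */
        }
    }
    return true;
}
```

       The authoritative answer for a particular compiled pattern still comes
       from PCRE itself, as described in the next paragraph.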

       If PCRE_PARTIAL is set for a pattern that does not conform to the
       restrictions, pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL
       (-13). To find out in advance whether a compiled pattern can be used
       for partial matching, call pcre_fullinfo() with PCRE_INFO_OKPARTIAL as
       the item of information to retrieve.

EXAMPLE OF PARTIAL MATCHING USING PCRETEST

       If the escape sequence \P is present in a pcretest data line, the
       PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that
       uses the date example quoted above:

	   re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
	 data> 25jun04\P
	  0: 25jun04
	  1: jun
	 data> 25dec3\P
	 Partial match
	 data> 3ju\P
	 Partial match
	 data> 3juj\P
	 No match
	 data> j\P
	 No match

       The first data string is matched completely, so pcretest shows the
       matched substrings. The remaining four strings do not match the
       complete pattern, but the first two are partial matches. The same test,
       using pcre_dfa_exec() matching (by means of the \D escape sequence),
       produces the following output:

	   re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
	 data> 25jun04\P\D
	  0: 25jun04
	 data> 23dec3\P\D
	 Partial match: 23dec3
	 data> 3ju\P\D
	 Partial match: 3ju
	 data> 3juj\P\D
	 No match
	 data> j\P\D
	 No match

       Notice that in this case the portion of the string that was matched is
       made available.

MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()

       When a partial match has been found using pcre_dfa_exec(), it is
       possible to continue the match by providing additional subject data and
       calling pcre_dfa_exec() again with the same compiled regular
       expression, this time setting the PCRE_DFA_RESTART option. You must
       also pass the same working space as before, because this is where
       details of the previous partial match are stored. Here is an example in
       pcretest, where the \R escape sequence sets the PCRE_DFA_RESTART option
       (\P and \D are as above):

	   re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
	 data> 23ja\P\D
	 Partial match: 23ja
	 data> n05\R\D
	  0: n05

       The first call has "23ja" as the subject, and requests partial
       matching; the second call has "n05" as the subject for the continued
       (restarted) match. Notice that when the match is complete, only the
       last part is shown; PCRE does not retain the previously partially
       matched string. It is up to the calling program to do that if it needs
       to.
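       Since PCRE keeps the matching state in the working space but not the
       text, a segment matcher needs state that survives from one call to the
       next and a caller-side copy of the text. The C sketch below models the
       first part for the date pattern with a hand-written set of live
       threads; it does not use PCRE, and date_state, date_state_init() and
       date_feed() are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static const char *const months[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

/* Live threads for the anchored date pattern. This struct stands in for
   the working space that pcre_dfa_exec() keeps between calls. */
typedef struct {
    bool start;       /* nothing consumed yet */
    bool day1, day2;  /* one or two day digits consumed */
    unsigned m1, m2;  /* bit m set: month m has matched 1 or 2 letters */
    bool y0, y1, y2;  /* month complete; 0, 1 or 2 year digits consumed */
} date_state;

typedef enum { SEG_NOMATCH, SEG_PARTIAL, SEG_MATCH } seg_result;

void date_state_init(date_state *st)
{
    memset(st, 0, sizeof *st);
    st->start = true;
}

/* Feed one segment; whatever is still alive carries over to the next call. */
seg_result date_feed(date_state *st, const char *seg)
{
    for (const char *p = seg; *p != '\0'; p++) {
        int c = (unsigned char)*p;
        bool digit = isdigit(c) != 0;
        date_state nx = {0};

        nx.day1 = st->start && digit;
        nx.day2 = st->day1 && digit;
        for (int m = 0; m < 12; m++) {
            unsigned bit = 1u << m;
            if ((st->day1 || st->day2) && months[m][0] == c) nx.m1 |= bit;
            if ((st->m1 & bit) && months[m][1] == c) nx.m2 |= bit;
            if ((st->m2 & bit) && months[m][2] == c) nx.y0 = true;
        }
        nx.y1 = st->y0 && digit;
        nx.y2 = st->y1 && digit;
        *st = nx;
    }
    if (st->y2)
        return SEG_MATCH;    /* two year digits at the end of the segment */
    if (st->start || st->day1 || st->day2 || st->m1 || st->m2 || st->y0 || st->y1)
        return SEG_PARTIAL;
    return SEG_NOMATCH;
}
```

       Feeding "23ja" and then "n05" mirrors the pcretest run above: the first
       call reports a partial match and the second a complete one. To display
       "23jan05", the caller has to append each segment to its own buffer.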

       You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial
       matching over multiple segments. This facility can be used to pass very
       long subject strings to pcre_dfa_exec(). However, some care is needed
       for certain types of pattern.
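       The option bits for each call can be chosen mechanically. The C sketch
       below sets PCRE_DFA_RESTART on every segment except the first, the
       PCRE_NOTBOL and PCRE_NOTEOL options discussed in point 1 below, and,
       as a choice made here, PCRE_PARTIAL on every segment except the last,
       so that the final call gives a definite answer. It assumes the whole
       input is a single line, uses local stand-in constants whose values are
       not those of pcre.h, and segment_options() is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a real program would use the pcre.h macros
   PCRE_PARTIAL, PCRE_DFA_RESTART, PCRE_NOTBOL and PCRE_NOTEOL instead. */
enum {
    OPT_PARTIAL     = 1 << 0,
    OPT_DFA_RESTART = 1 << 1,
    OPT_NOTBOL      = 1 << 2,
    OPT_NOTEOL      = 1 << 3
};

/* Options for segment `index` (0-based) of `count` segments that together
   form one subject, assuming the subject is a single line. */
int segment_options(size_t index, size_t count)
{
    int opts = 0;
    if (index + 1 < count)
        opts |= OPT_PARTIAL | OPT_NOTEOL;      /* more data follows */
    if (index > 0)
        opts |= OPT_DFA_RESTART | OPT_NOTBOL;  /* continues earlier data */
    return opts;
}
```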

       1. If the pattern contains tests for the beginning or end of a line,
       you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as
       appropriate, when the subject string for any call does not contain the
       beginning or end of a line.

       2. If the pattern contains backward assertions (including \b or \B),
       you need to arrange for some overlap in the subject strings to allow
       for this. For example, you could pass the subject in chunks that are
       500 bytes long, but in a buffer of 700 bytes, with the starting offset
       set to 200 and the previous 200 bytes at the start of the buffer.

       3. Matching a subject string that is split into multiple segments does
       not always produce exactly the same result as matching over one single
       long string. The difference arises when there are multiple matching
       possibilities, because a partial match result is given only when there
       are no completed matches in a call to pcre_dfa_exec(). This means that
       as soon as the shortest match has been found, continuation to a new
       subject segment is no longer possible. Consider this pcretest example:

	   re> /dog(sbody)?/
	 data> do\P\D
	 Partial match: do
	 data> gsb\R\P\D
	  0: g
	 data> dogsbody\D
	  0: dogsbody
	  1: dog

       The pattern matches the words "dog" or "dogsbody". When the subject is
       presented in several parts ("do" and "gsb" being the first two) the
       match stops when "dog" has been found, and it is not possible to
       continue. On the other hand, if "dogsbody" is presented as a single
       string, both matches are found.

       Because of this phenomenon, it does not usually make sense to end a
       pattern that is going to be matched in this way with a variable repeat.

       4. Patterns that contain alternatives at the top level which do not all
       start with the same pattern item may not work as expected. For example,
       consider this pattern:

	 1234|3789

       If the first part of the subject is "ABC123", a partial match of the
       first alternative is found at offset 3. There is no partial match for
       the second alternative, because such a match does not start at the same
       point in the subject string. Attempting to continue with the string
       "789" does not yield a match because only those alternatives that match
       at one point in the subject are remembered. The problem arises because
       the start of the second alternative matches within the first
       alternative. There is no problem with anchored patterns or patterns
       such as:

	 1234|ABCD

       where no string can be a partial match for both alternatives.
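       For point 2 above, the buffer handling can be sketched in C as
       follows. The sizes are the ones suggested in the text (500-byte chunks,
       200 bytes of overlap, a 700-byte buffer); build_segment() is a
       hypothetical helper, and the value it stores in *start_offset is what
       would be passed as the starting offset to pcre_dfa_exec(). The first
       call has no previous data, so its offset is 0.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK   500   /* new subject bytes per call */
#define OVERLAP 200   /* bytes kept from the previous chunk for lookbehind */

/* Fill buf with up to OVERLAP bytes from the end of the previous chunk,
   followed by the new chunk (chunk_len <= CHUNK). Returns the number of
   bytes in buf and stores in *start_offset where the new data begins. */
size_t build_segment(char buf[OVERLAP + CHUNK],
                     const char *prev, size_t prev_len,
                     const char *chunk, size_t chunk_len,
                     size_t *start_offset)
{
    size_t keep = prev_len < OVERLAP ? prev_len : OVERLAP;
    if (keep > 0)
        memcpy(buf, prev + prev_len - keep, keep);
    memcpy(buf + keep, chunk, chunk_len);
    *start_offset = keep;
    return keep + chunk_len;
}
```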
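       For point 3 above, the following C toy model reproduces the "dogsbody"
       behaviour without PCRE. It assumes the match is anchored at the start
       of the first segment, so a single thread walks along "dogsbody" with
       accepting positions after "dog" and after "dogsbody". It reports only
       the longest match completed in a call (pcre_dfa_exec() can report
       several) and stops after a complete match, mirroring the rule that a
       partial result is given only when a call finds no complete match.
       dog_state and dog_feed() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

static const char word[] = "dogsbody";

/* Toy model of /dog(sbody)?/ anchored at the start of the first segment:
   one thread walks along "dogsbody"; positions 3 and 8 are accepting. */
typedef struct {
    size_t pos;    /* letters of "dogsbody" matched so far */
    bool alive;    /* can the thread still continue? */
} dog_state;

typedef enum { DOG_NOMATCH, DOG_PARTIAL, DOG_MATCH } dog_result;

/* Feed one segment. A partial result is given only if no complete match
   was found during this call. *longest receives the length of the longest
   match, counted from the start of the whole subject. */
dog_result dog_feed(dog_state *st, const char *seg, size_t *longest)
{
    bool matched = false;
    for (const char *p = seg; *p != '\0' && st->alive; p++) {
        if (st->pos < sizeof word - 1 && *p == word[st->pos]) {
            st->pos++;
            if (st->pos == 3 || st->pos == sizeof word - 1) {
                matched = true;
                *longest = st->pos;
            }
        } else {
            st->alive = false;   /* mismatch, or text after "dogsbody" */
        }
    }
    if (matched) {
        st->alive = false;       /* no continuation after a complete match */
        return DOG_MATCH;
    }
    return st->alive ? DOG_PARTIAL : DOG_NOMATCH;
}
```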
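       For point 4 above, the following C toy model shows why "789" cannot
       complete the match for 1234|3789. It handles only alternations of two
       literal strings, remembers the alternatives that are cut off by the end
       of the first segment at the earliest such start offset, and drops the
       rest, as the text describes. alt_state, alt_first() and alt_continue()
       are hypothetical names, and this is not how PCRE stores its state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NALT 2   /* this toy handles exactly two literal alternatives */

typedef enum { ALT_NOMATCH, ALT_PARTIAL, ALT_MATCH } alt_result;

/* What is remembered between calls: progress of each alternative, all
   measured from one single start offset in the first segment. */
typedef struct {
    size_t progress[NALT];
    bool alive[NALT];
} alt_state;

/* First segment: scan start offsets from the left. Stop at the first offset
   where an alternative matches completely or is cut off by the end of the
   subject, and remember only the alternatives alive at that offset. */
alt_result alt_first(const char *const alts[NALT], const char *subject,
                     alt_state *st)
{
    size_t len = strlen(subject);
    memset(st, 0, sizeof *st);
    for (size_t start = 0; start < len; start++) {
        bool any = false;
        for (int a = 0; a < NALT; a++) {
            size_t alen = strlen(alts[a]), k = 0;
            while (k < alen && start + k < len && subject[start + k] == alts[a][k])
                k++;
            if (k == alen)
                return ALT_MATCH;
            st->progress[a] = k;
            st->alive[a] = k > 0 && start + k == len;
            any = any || st->alive[a];
        }
        if (any)
            return ALT_PARTIAL;
    }
    return ALT_NOMATCH;
}

/* Later segment: only the remembered alternatives can continue. */
alt_result alt_continue(const char *const alts[NALT], alt_state *st,
                        const char *seg)
{
    bool any = false;
    for (int a = 0; a < NALT; a++) {
        if (!st->alive[a])
            continue;
        size_t alen = strlen(alts[a]);
        const char *p = seg;
        while (*p != '\0' && st->progress[a] < alen && *p == alts[a][st->progress[a]]) {
            p++;
            st->progress[a]++;
        }
        if (st->progress[a] == alen)
            return ALT_MATCH;
        if (*p == '\0')
            any = true;             /* still partial: ran out of data again */
        else
            st->alive[a] = false;   /* mismatch: this alternative is dead */
    }
    return any ? ALT_PARTIAL : ALT_NOMATCH;
}
```

       With the alternatives "1234" and "ABCD", the same two-step approach on
       "xxAB" and then "CD" does complete the match, because only one
       alternative can be partially matched at the end of the first segment.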

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 04 June 2007
       Copyright (c) 1997-2007 University of Cambridge.
