peg man page on Ubuntu

Man page or keyword search:  
man Server   6591 pages
apropos Keyword Search (all sections)
Output format
Ubuntu logo
[printable version]

grammar::peg(3tcl)	 Grammar operations and usage	    grammar::peg(3tcl)

______________________________________________________________________________

NAME
       grammar::peg - Create and manipulate parsing expression grammars

SYNOPSIS
       package require Tcl  8.4

       package require snit

       package require grammar::peg  ?0.1?

       ::grammar::peg pegName ?=|:=|<--|as|deserialize src?

       pegName destroy

       pegName clear

       pegName = srcPEG

       pegName --> dstPEG

       pegName serialize

       pegName deserialize serialization

       pegName is valid

       pegName start ?pe?

       pegName nonterminals

       pegName nonterminal add nt pe

       pegName nonterminal delete nt1 ?nt2 ...?

       pegName nonterminal exists nt

       pegName nonterminal rename nt ntnew

       pegName nonterminal mode nt ?mode?

       pegName nonterminal rule nt

       pegName unknown nonterminals

_________________________________________________________________

DESCRIPTION
       This package provides a container class for parsing expression grammars
       (Short: PEG).  It allows the incremental definition of the grammar, its
       manipulation  and querying of the definition.  The package neither pro‐
       vides complex operations on the grammar, nor has it the ability to exe‐
       cute  a	grammar	 definition  for  a  stream  of symbols.  Two packages
       related to this one are grammar::mengine and grammar::peg::interpreter.
       The first of them defines a general virtual machine for the matching of
       a character stream, and the second implements an interpreter for	 pars‐
       ing expression grammars on top of that virtual machine.

   TERMS & CONCEPTS
       PEGs  are similar to context-free grammars, but not equivalent; in some
       cases PEGs are strictly more powerful than context-free grammars (there
       exist PEGs for some non-context-free languages).	 The formal mathemati‐
       cal definition of parsing expressions and parsing  expression  grammars
       can be found in section PARSING EXPRESSION GRAMMARS.

       In  short,  we have terminal symbols, which are the most basic building
       blocks for sentences, and nonterminal symbols with  associated  parsing
       expressions,  defining  the grammatical structure of the sentences. The
       two sets of symbols are distinctive, and do not overlap. When  speaking
       about  symbols  the  word  "symbol" is often left out. The union of the
       sets of terminal and nonterminal symbols is called the set of symbols.

       Here the set of terminal symbols is not explicitly managed, but implic‐
       itly defined as the set of all characters. Note that this means that we
       inherit from Tcl the ability to handle all of Unicode.

       A pair of nonterminal and parsing expression is also called a grammati‐
       cal  rule,  or rule for short. In the context of a rule the nonterminal
       is often called the left-hand-side (LHS), and  the  parsing  expression
       the right-hand-side (RHS).

       The  start  expression  of a grammar is a parsing expression from which
       all the sentences contained in the language specified  by  the  grammar
       are  derived.   To  make	 the  understanding of this term easier let us
       assume for a moment that the RHS of each rule, and  the	start  expres‐
       sion, is either a sequence of symbols, or a series of alternate parsing
       expressions.  In the latter case the rule can  be  seen	as  a  set  of
       rules,  each  providing one alternative for the nonterminal.  A parsing
       expression A' is now a derivation of a parsing expression A if we  pick
       one of the nonterminals N in the expression, and one of the alternative
       rules R for N, and then replace the nonterminal in A with  the  RHS  of
       the  chosen  rule.  Here we can see why the terminal symbols are called
       such. They cannot be expanded any further, thus terminate  the  process
       of deriving new expressions.  An example

	   Rules
	     (1)  A <- a B c
	     (2a) B <- d B
	     (2b) B <- e

	   Some derivations, using starting expression A.

	     A -/1/-> a B c -/2a/-> a d B c -/2b/-> a d e c

       A  derived  expression  containing only terminal symbols is a sentence.
       The set of all sentences which can be derived from the start expression
       is the language of the grammar.

       Some definitions for nonterminals and expressions:

       [1]    A	 nonterminal A is called reachable if it is possible to derive
	      a parsing expression from the start expression which contains A.

       [2]    A nonterminal A is called useful if it is possible to  derive  a
	      sentence from it.

       [3]    A	 nonterminal A is called recursive if it is possible to derive
	      a parsing expression from it which contains A, again.

       [4]    The FIRST set of a nonterminal A contains all the symbols	 which
	      can  occur  of  as  the  leftmost symbol in a parsing expression
	      derived from A. If the FIRST set contains	 A  itself  then  that
	      nonterminal is called left-recursive.

       [5]    The  LAST	 set of a nonterminal A contains all the symbols which
	      can occur of as the rightmost symbol  in	a  parsing  expression
	      derived from A. If the LAST set contains A itself then that non‐
	      terminal is called right-recursive.

       [6]    The FOLLOW set of a nonterminal A contains all the symbols which
	      can occur after A in a parsing expression derived from the start
	      expression.

       [7]    A nonterminal (or parsing expression) is called nullable if  the
	      empty sentence can be derived from it.

       And based on the above definitions for grammars:

       [1]    A	 grammar G is recursive if and only if it contains a nontermi‐
	      nal A which is recursive. The terms left-	 and  right-recursive,
	      and useful are analogously defined.

       [2]    A	 grammar  is  minimal if it contains only reachable and useful
	      nonterminals.

       [3]    A grammar is wellformed if it is not left-recursive. Such	 gram‐
	      mars  are also complete, which means that they always succeed or
	      fail on all input sentences. For an incomplete  grammar  on  the
	      other  hand  input sentences exist for which an attempt to match
	      them against the grammar will not terminate.

       [4]    As we wish to allow ourselves to build a	grammar	 incrementally
	      in  a container object we will encounter stages where the RHS of
	      one or more rules reference symbols which are not yet  known  to
	      the  container.  Such  a grammar we call invalid.	 We cannot use
	      the term incomplete as this term is already taken, see the  last
	      item.

   CONTAINER CLASS API
       The package exports the API described here.

       ::grammar::peg pegName ?=|:=|<--|as|deserialize src?
	      The command creates a new container object for a parsing expres‐
	      sion grammar and returns the fully qualified name of the	object
	      command as its result. The API the returned command is following
	      is described in the section CONTAINER OBJECT API. It may be used
	      to  invoke  various  operations on the container and the grammar
	      within.

	      The new container, i.e. grammar will be empty if no src is spec‐
	      ified. Otherwise it will contain a copy of the grammar contained
	      in the src.  The src has to be a container object reference  for
	      all  operators  except  deserialize.   The  deserialize operator
	      requires src to be the serialization  of	a  parsing  expression
	      grammar instead.

	      An  empty	 grammar  has  no  nonterminal	symbols, and the start
	      expression is the empty expression, i.e. epsilon. It  is	valid,
	      but not useful.

   CONTAINER OBJECT API
       All  grammar  container	objects	 provide the following methods for the
       manipulation of their contents:

       pegName destroy
	      Destroys the grammar, including its storage space and associated
	      command.

       pegName clear
	      Clears  out  the definition of the grammar contained in pegName,
	      but does not destroy the object.

       pegName = srcPEG
	      Assigns the contents of the grammar contained in srcPEG to  peg‐
	      Name,  overwriting any existing definition.  This is the assign‐
	      ment operator for grammars. It copies the grammar	 contained  in
	      the  grammar  object  srcPEG over the grammar definition in peg‐
	      Name. The old contents of pegName are deleted by this operation.

	      This operation is in effect equivalent to

		  pegName deserialize [srcPEG serialize]

       pegName --> dstPEG
	      This is the reverse assignment operator for grammars. It	copies
	      the  automation contained in the object pegName over the grammar
	      definition in the object dstPEG.	The old contents of dstPEG are
	      deleted by this operation.

	      This operation is in effect equivalent to

		  dstPEG deserialize [pegName serialize]

       pegName serialize
	      This  method  serializes the grammar stored in pegName. In other
	      words it returns a tcl value completely describing that grammar.
	      This  allows,  for  example, the transfer of grammars over arbi‐
	      trary channels, persistence, etc.	 This method is also the basis
	      for both the copy constructor and the assignment operator.

	      The  result of this method has to be semantically identical over
	      all implementations of the grammar::peg interface. This is  what
	      will  enable  us	to copy grammars between different implementa‐
	      tions of the same interface.

	      The result is a list of four elements with the following	struc‐
	      ture:

	      [1]    The constant string grammar::peg.

	      [2]    A dictionary. Its keys are the names of all known nonter‐
		     minal symbols, and their associated values are the	 pars‐
		     ing expressions describing their sentennial structure.

	      [3]    A dictionary. Its keys are the names of all known nonter‐
		     minal symbols, and their associated  values  hints	 to  a
		     matcher  regarding	 the  semantic	values produced by the
		     symbol.

	      [4]    The last item is a parsing expression, the start  expres‐
		     sion of the grammar.

       Assuming the following PEG for simple mathematical expressions

	   Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'
	   Sign	      <- '+' / '-'
	   Number     <- Sign? Digit+
	   Expression <- '(' Expression ')' / (Factor (MulOp Factor)*)
	   MulOp      <- '*' / '/'
	   Factor     <- Term (AddOp Term)*
	   AddOp      <- '+'/'-'
	   Term	      <- Number

       a possible serialization is

	   grammar::peg \\
	   {Expression {/ {x ( Expression )} {x Factor {* {x MulOp Factor}}}} \\
	    Factor     {x Term {* {x AddOp Term}}} \\
	    Term       Number \\
	    MulOp      {/ * /} \\
	    AddOp      {/ + -} \\
	    Number     {x {? Sign} {+ Digit}} \\
	    Sign       {/ + -} \\
	    Digit      {/ 0 1 2 3 4 5 6 7 8 9} \\
	   } \\
	   {Expression value	 Factor	    value \\
	    Term       value	 MulOp	    value \\
	    AddOp      value	 Number	    value \\
	    Sign       value	 Digit	    value \\
	   }
	   Expression

       A possible one, because the order of the nonterminals in the dictionary
       is not relevant.

       pegName deserialize serialization
	      This is the complement to serialize.  It	replaces  the  grammar
	      definition  in pegName with the grammar described by the serial‐
	      ization value. The old contents of pegName are deleted  by  this
	      operation.

       pegName is valid
	      A	 predicate. It tests whether the PEG in pegName is valid.  See
	      section TERMS & CONCEPTS for  the	 definition  of	 this  grammar
	      property.	 The result is a boolean value. It will be set to true
	      if the PEG has the tested property, and false otherwise.

       pegName start ?pe?
	      This method defines the start  expression	 of  the  grammar.  It
	      replaces	the previously defined start expression with the pars‐
	      ing expression pe.  The method fails and throws an error	if  pe
	      does  not contain a valid parsing expression as specified in the
	      section PARSING EXPRESSIONS. In that  case  the  existing	 start
	      expression  is not changed.  The method returns the empty string
	      as its result.

	      If the method is called without an argument it will  return  the
	      currently defined start expression.

       pegName nonterminals
	      Returns the set of all nonterminal symbols known to the grammar.

       pegName nonterminal add nt pe
	      This  method  adds the nonterminal nt and its associated parsing
	      expression pe to the set of nonterminal symbols and rules of the
	      PEG  contained  in  the  object  pegName.	  The method fails and
	      throws an error if either the string nt is already  known	 as  a
	      symbol of the grammar, or if pe does not contain a valid parsing
	      expression as specified in the section PARSING  EXPRESSIONS.  In
	      that  case  the  current set of nonterminal symbols and rules is
	      not changed.  The method returns the empty string as its result.

       pegName nonterminal delete nt1 ?nt2 ...?
	      This method removes the named symbols nt1, nt2 from the  set  of
	      nonterminal  symbols of the PEG contained in the object pegName.
	      The method fails and throws an error if any of  the  strings  is
	      not  known as a nonterminal symbol. In that case the current set
	      of nonterminal symbols is not changed.  The method  returns  the
	      empty string as its result.

	      The  stored  grammar becomes invalid if the deleted nonterminals
	      are referenced by the RHS of still-known rules.

       pegName nonterminal exists nt
	      A predicate. It tests whether the nonterminal symbol nt is known
	      to  the  PEG in pegName.	The result is a boolean value. It will
	      be set to true if the symbol nt is known, and false otherwise.

       pegName nonterminal rename nt ntnew
	      This method renames the nonterminal symbol  nt  to  ntnew.   The
	      method  fails and throws an error if either nt is not known as a
	      nonterminal, or if ntnew is a known symbol.  The method  returns
	      the empty string as its result.

       pegName nonterminal mode nt ?mode?
	      This  mode returns or sets the semantic mode associated with the
	      nonterminal symbol nt. If no mode is specified the current  mode
	      of  the  nonterminal  is returned. Otherwise the current mode is
	      set to mode.  The method fails and throws an error if nt is  not
	      known  as a nonterminal.	The grammar interpreter implemented by
	      the package grammar::peg::interpreter recognizes	the  following
	      modes:

	      value  The  semantic  value  of  the nonterminal is the abstract
		     syntax tree created from the AST's of the RHS and a  node
		     for the nonterminal itself.

	      match  The  semantic value of the nonterminal is an the abstract
		     syntax tree consisting of single a node  for  the	string
		     matched  by  the  RHS.  The ASTs generated by the RHS are
		     discarded.

	      leaf   The semantic value of the nonterminal is an the  abstract
		     syntax tree consisting of single a node for the nontermi‐
		     nal itself. The ASTs generated by the RHS are discarded.

	      discard
		     The nonterminal has no semantic value. The ASTs generated
		     by the RHS are discarded (as well).

       pegName nonterminal rule nt
	      This  method  returns the parsing expression associated with the
	      nonterminal nt.  The method fails and throws an error if	nt  is
	      not known as a nonterminal.

       pegName unknown nonterminals
	      This method returns a list containing the names of all nontermi‐
	      nal symbols which are referenced on the  RHS  of	a  grammatical
	      rule,  but  have	no  rule  definining their structure. In other
	      words, a list of the nonterminal symbols which make the  grammar
	      invalid. The grammar is valid if this list is empty.

   PARSING EXPRESSIONS
       Various methods of PEG container objects expect a parsing expression as
       their argument, or will return such. This section specifies the	format
       such parsing expressions are in.

       [1]    The  string  epsilon is an atomic parsing expression. It matches
	      the empty string.

       [2]    The string alnum is an atomic parsing expression. It matches any
	      alphanumeric character.

       [3]    The string alpha is an atomic parsing expression. It matches any
	      alphabetical character.

       [4]    The string dot is an atomic parsing expression. It  matches  any
	      character.

       [5]    The  expression  [list  t x] is an atomic parsing expression. It
	      matches the terminal string x.

       [6]    The expression [list n A] is an atomic  parsing  expression.  It
	      matches the nonterminal A.

       [7]    For  parsing expressions e1, e2, ... the result of [list / e1 e2
	      ... ] is a parsing expression as	well.	This  is  the  ordered
	      choice, aka prioritized choice.

       [8]    For  parsing expressions e1, e2, ... the result of [list x e1 e2
	      ... ] is a parsing expression as well.  This is the sequence.

       [9]    For a parsing expression e the result of [list * e] is a parsing
	      expression as well.  This is the kleene closure, describing zero
	      or more repetitions.

       [10]   For a parsing expression e the result of [list + e] is a parsing
	      expression  as  well.   This  is	the  positive  kleene closure,
	      describing one or more repetitions.

       [11]   For a parsing expression e the result of [list & e] is a parsing
	      expression as well.  This is the and lookahead predicate.

       [12]   For a parsing expression e the result of [list ! e] is a parsing
	      expression as well.  This is the not lookahead predicate.

       [13]   For a parsing expression e the result of [list ? e] is a parsing
	      expression as well.  This is the optional input.

       Examples of parsing expressions where already shown, in the description
       of the method serialize.

PARSING EXPRESSION GRAMMARS
       For the mathematically inclined, a PEG is a 4-tuple (VN,VT,R,eS) where

       ·      VN is a set of nonterminal symbols,

       ·      VT is a set of terminal symbols,

       ·      R is a finite set of rules, where each rule is a pair  (A,e),  A
	      in VN, and e a parsing expression.

       ·      eS is a parsing expression, the start expression.

       Further constraints are

       ·      The intersection of VN and VT is empty.

       ·      For  all	A  in  VT exists exactly one pair (A,e) in R. In other
	      words, R is a  function  from  nonterminal  symbols  to  parsing
	      expressions.

       Parsing expression are inductively defined via

       ·      The empty string (epsilon) is a parsing expression.

       ·      A terminal symbol a is a parsing expression.

       ·      A nonterminal symbol A is a parsing expression.

       ·      e1e2  is	a parsing expression for parsing expressions e1 and 2.
	      This is called sequence.

       ·      e1/e2 is a parsing expression for parsing expressions e1 and  2.
	      This is called ordered choice.

       ·      e*  is  a	 parsing  expression for parsing expression e. This is
	      called zero-or-more repetitions, also known as kleene closure.

       ·      e+ is a parsing expression for parsing  expression  e.  This  is
	      called  one-or-more  repetitions,	 also known as positive kleene
	      closure.

       ·      !e is a parsing expression for parsing expression	 e1.  This  is
	      called a not lookahead predicate.

       ·      &e  is  a	 parsing expression for parsing expression e1. This is
	      called an and lookahead predicate.

       PEGs are used to define a grammatical structure for streams of  symbols
       over  VT.  They	are  a modern phrasing of older formalisms invented by
       Alexander Birham. These formalisms  were	 called	 TS  (TMG  recognition
       scheme),	 and  gTS  (generalized	 TS).  Later they were renamed to TPDL
       (Top-Down Parsing Languages) and gTPDL (generalized TPDL).

       They can be easily implemented by recursive descent parsers with	 back‐
       tracking. This makes them relatives of LL(k) Context-Free Grammars.

REFERENCES
       [1]    The   Packrat  Parsing  and  Parsing  Expression	Grammars  Page
	      [http://www.pdos.lcs.mit.edu/~baford/packrat/], by  Bryan	 Ford,
	      Massachusetts  Institute	of  Technology. This is the main entry
	      page to PEGs, and their realization through Packrat Parsers.

       [2]    Parsing	   Techniques	   -	  A	 Practical	 Guide
	      [http://www.cs.vu.nl/~dick/PTAPG.html],  an online book offering
	      a clear, accessible, and thorough discussion of  many  different
	      parsing  techniques  with	 their interrelations and applicabili‐
	      ties, including error recovery techniques.

       [3]    Compilers and Compiler  Generators  [http://scifac.ru.ac.za/com‐
	      pilers/], an online book using CoCo/R, a generator for recursive
	      descent parsers.

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will  undoubtedly  contain
       bugs  and  other	 problems.   Please  report such in the category gram‐
       mar_peg	  of	the	Tcllib	   SF	  Trackers     [http://source‐
       forge.net/tracker/?group_id=12883].   Please  also report any ideas for
       enhancements you may have for either package and/or documentation.

KEYWORDS
       LL(k), TDPL,  context-free  languages,  expression,  grammar,  parsing,
       parsing	expression,  parsing  expression grammar, push down automaton,
       recursive descent, state, top-down parsing languages, transducer

CATEGORY
       Grammars and finite automata

COPYRIGHT
       Copyright (c) 2005 Andreas Kupries <andreas_kupries@users.sourceforge.net>

grammar_peg			      0.1		    grammar::peg(3tcl)
[top]

List of man pages available for Ubuntu

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net