yacc(1)
NAME
yacc - Generates an LR(1) parsing program from input consisting of a
context-free grammar specification
SYNOPSIS
yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix] [-P pathname]
grammar
STANDARDS
Interfaces documented on this reference page conform to industry
standards as follows:
yacc: XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information about
industry standards and associated tags.
OPTIONS
-b prefix
    Uses prefix instead of y as the prefix for all output filenames
    (prefix.tab.c, prefix.tab.h, and prefix.output).
-d
    Produces the <y.tab.h> file, which contains the #define statements
    that associate the yacc-assigned token codes with your token names.
    This allows source files other than y.tab.c to access the token
    codes by including this header file.
-l
    Includes no #line constructs in y.tab.c. Use this only after the
    grammar and associated actions are fully debugged.
-N number
    [Tru64 UNIX] Provides yacc with extra storage for building its LALR
    tables, which may be necessary when compiling very large grammars.
    The number should be larger than 40,000 when you use this option.
-p symbol_prefix
    Allows multiple yacc parsers to be linked together. Uses
    symbol_prefix instead of yy to prefix global symbols.
-P pathname
    [Tru64 UNIX] Specifies an alternative parser (instead of
    /usr/ccs/lib/yaccpar). The pathname specifies the filename of the
    skeleton to be used in place of yaccpar.
-s
    [Tru64 UNIX] Breaks the yyparse() function into several smaller
    functions. Because its size is somewhat proportional to that of the
    grammar, it is possible for yyparse() to become too large to
    compile, optimize, or execute efficiently.
-t
    Compiles run-time debugging code. By default, this code is not
    included when y.tab.c is compiled. If YYDEBUG has a nonzero value,
    the C compiler (cc) includes the debugging code, whether or not the
    -t option was used. Without this code, yyparse() runs more quickly.
-v
    Produces the y.output file, which contains a readable description
    of the parsing tables and a report on conflicts generated by
    grammar ambiguities.
OPERANDS
grammar
    The pathname of a file containing input instructions. The format of
    this file is described in the DESCRIPTION section.
DESCRIPTION
The yacc command converts a context-free grammar specification into a
set of tables for a simple automaton that executes an LR(1) parsing
algorithm. The yacc grammar can be ambiguous; specified precedence
rules are used to break ambiguities.
You must compile the y.tab.c output file with a C language compiler to
produce the yyparse() function. This function must be loaded with a
yylex lexical analyzer function, as well as two routines that you must
provide, main() and an error-handling routine, yyerror(). The lex
command is useful for creating lexical analyzers usable by yacc.
The yacc program reads its skeleton parser from the file
/usr/ccs/lib/yaccpar. Use the environment variable YACCPAR to specify
another location for the yacc program to read from. If you use this
environment variable, the -P option is ignored, if specified.
The general format of the yacc input file is as follows:
    [definitions]
    %%
    rules
    [%%
    [user subroutines]]
where:
definitions
    Is the section where you define the variables to be used later in
    the grammar, such as in the rules section. It is also where files
    are included (#include) and processing conditions are defined. This
    section is optional.
rules
    Is the section that contains grammar rules for the parser. A yacc
    input file must have a rules section.
user subroutines
    Is the section that contains user-supplied subroutines that can be
    used by the actions in the rules section. This section is optional.
Comments, in C syntax, can appear anywhere in the user subroutines
section or the definitions section. In the rules section, comments can
appear wherever a symbol is allowed. Blank lines or lines consisting of
white space can be inserted anywhere in the file, and are ignored. The
NUL character must not be used in grammar rules or literals.
Definitions Section of Input File
The definitions section of a yacc input file contains entries that
perform the following functions:
    Includes standard I/O header file.
    Defines global variables.
    Defines the list rule as the place to start processing.
    Defines the tokens used by the parser.
    Defines the operators and their precedence.
Each line in the definitions section can be one of the following:
%{ and %}
    When placed on lines by themselves, these enclose C code to be
    passed into the global definitions of the output file. Such lines
    commonly include preprocessor directives and declarations of
    external variables and functions.
%token [<type>] token [number] ...
    Lists tokens or terminal symbols to be used in the rest of the
    input file. This line is needed for tokens that do not appear in
    other % definitions. If type is present, the C type for all tokens
    on this line is declared to be the type referenced by type. If a
    positive integer number follows a token, that value is assigned to
    the token.
%left token ...
    Indicates that each token is an operator, that all tokens in this
    definition have equal precedence, and that a succession of the
    operators listed in this definition is evaluated left to right.
%right token ...
    Indicates that each token is an operator, that all tokens in this
    definition have equal precedence, and that a succession of the
    operators listed in this definition is evaluated right to left.
%nonassoc token ...
    Indicates that each token is an operator, and that the operators
    listed in this definition cannot appear in succession; that is, the
    tokens cannot be used associatively.
%start symbol
    Indicates the highest-level production rule to be reduced; in other
    words, the rule where the parser can consider its work done and can
    terminate processing. If this definition is not included, the
    parser uses the first production rule. The symbol must be
    non-terminal (not a token).
%type <type> symbol ...
    Defines each symbol as data type type, to resolve ambiguities. If
    this construct is present, yacc performs type checking; otherwise,
    it assumes all symbols to be of type integer.
%union union-def
    Defines the yylval global variable as a union, where union-def is a
    standard C definition in the format:
        { type member ; [type member ; ...] }
    At least one member should be an int. Any valid C data type can
    be defined, including structures. When you run yacc with the -d
    option, the definition of yylval is placed in the <y.tab.h> file
    and can be referred to in a lex input file.
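As a rough picture of what a %union definition amounts to on the C side, the sketch below writes out by hand the kind of declaration that would end up describing yylval. The member names ival and name and the helper functions are hypothetical, chosen only for illustration; the exact text yacc places in <y.tab.h> may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Roughly what "%union { int ival; const char *name; }" describes.
 * The member names are hypothetical, not taken from this page. */
typedef union {
    int         ival;   /* value carried by numeric tokens */
    const char *name;   /* value carried by identifier tokens */
} YYSTYPE;

YYSTYPE yylval;         /* the global that yylex fills in */

/* Helpers a lexical analyzer might use before returning a token. */
void set_number(int n)
{
    yylval.ival = n;
}

void set_name(const char *s)
{
    assert(s != NULL);
    yylval.name = s;
}
```

Only one member of the union is meaningful at a time, which is why each token and non-terminal is associated with a single member name.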
Every token (terminal symbol) must be listed in one of the preceding
% definitions. Multiple tokens can be separated by white space or
commas. All the tokens in %left, %right, and %nonassoc definitions are
assigned a precedence, with tokens in later definitions having
precedence over those in earlier definitions.
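This page does not spell out how the precedence levels are applied. The sketch below models the conventional rule used by yacc-family tools when the parser could either shift the next token or reduce a rule: the higher precedence wins, and equal precedence is settled by associativity. All names here (resolve_conflict and the enum constants) are hypothetical and are not part of yacc.

```c
#include <assert.h>

/* Hypothetical names for illustration; yacc does not export these. */
enum assoc  { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/*
 * Decide between shifting the lookahead token and reducing a rule.
 * rule_prec: precedence level of the rule (conventionally that of its
 *            last token, or of the token named by %prec).
 * tok_prec:  precedence level of the lookahead token.
 * Levels are positive; later %left, %right, and %nonassoc lines get
 * higher levels.
 */
enum action resolve_conflict(int rule_prec, int tok_prec,
                             enum assoc tok_assoc)
{
    assert(rule_prec > 0 && tok_prec > 0);
    if (tok_prec > rule_prec)
        return ACT_SHIFT;       /* lookahead binds tighter */
    if (tok_prec < rule_prec)
        return ACT_REDUCE;      /* rule binds tighter */
    switch (tok_assoc) {        /* same level: use associativity */
    case ASSOC_LEFT:
        return ACT_REDUCE;      /* groups left to right */
    case ASSOC_RIGHT:
        return ACT_SHIFT;       /* groups right to left */
    default:
        return ACT_ERROR;       /* %nonassoc: not allowed in succession */
    }
}
```

For example, with '+' at level 1 and '*' at level 2, seeing '*' after expr '+' expr gives ACT_SHIFT, so the multiplication is grouped first.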
In addition to symbols, a token can be a literal character enclosed in
single quotes. (Multibyte characters are recognized by the lexical
analyzer and returned as tokens.) The following special characters can
be used, just as in C programs:
    \a      Alert
    \n      Newline
    \t      Tab
    \v      Vertical tab
    \r      Carriage Return
    \b      Backspace
    \f      Form Feed
    \\      Backslash
    \'      Single Quote
    \?      Question mark
    \nnn    One or more octal digits specifying the integer value of
            the character
Rules Section of Input File
The rules section of a yacc input file defines the rules that parse the
input stream. It consists of a series of production rules that the
parser tries to reduce. The format of each production rule is:
symbol : symbol-sequence [action] [| symbol-sequence [action] ...] ;
A symbol-sequence consists of zero or more symbols separated by white
space. The first symbol must be the first character of the line, but
newlines and other white space can appear anywhere else in the rule.
All terminal symbols must be declared in %token definitions.
Each symbol-sequence represents an alternative way of reducing the
rule. A symbol can appear recursively in its own rule. Always use
left recursion (where the recursive symbol appears before the
terminating case in symbol-sequence).
The following sequence indicates that the current sequence of symbols
is to be preferred over others, at the level of precedence assigned to
token in the definitions section of the input file:
%prec token
The specially defined token error matches any unrecognized sequence of
input. This token causes the parser to invoke the yyerror function. By
default, the parser tries to synchronize with the input and continue
processing it by reading and discarding all input up to the symbol
following error. (You can override this behavior through the yyerrok
action.) If no error token appears in the yacc input file, the parser
exits with an error message upon encountering unrecognized input.
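The default recovery described above, discarding input up to the symbol that follows error, can be pictured with the simplified sketch below. It works on an array of token numbers rather than a live yylex stream, and the function name skip_to_sync is hypothetical. A real generated parser also unwinds its state stack, which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Starting at tokens[pos], discard tokens until the synchronizing
 * token `sync` is found. Returns the index just past `sync`, or n if
 * the input ran out first.
 */
size_t skip_to_sync(const int *tokens, size_t n, size_t pos, int sync)
{
    assert(tokens != NULL && pos <= n);
    while (pos < n && tokens[pos] != sync)
        pos++;                      /* discard one token */
    return pos < n ? pos + 1 : n;   /* resume after the sync token */
}
```

For a rule such as list error '\n', the synchronizing token would be the newline, so parsing resumes at the start of the next input line.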
The parser always executes action after encountering the symbol that
precedes it. Thus, an action can appear in the middle of a symbol-
sequence, after each symbol-sequence, or after multiple instances of
symbol-sequence. In the last case, action is executed when the parser
matches any of the sequences.
The action consists of standard C code within braces and can also use
the following values, variables, and keywords:
yylval
    If the token returned by the yylex function is associated with a
    significant value, yylex should place the value in this global
    variable. By default, yylval is of type long. The definitions
    section can include a %union definition to associate yylval with
    other data types, including structures. If you run yacc with the
    -d option, the full yylval definition is passed into the <y.tab.h>
    file for access by lex.
yyerrok
    Causes the parser to start parsing tokens immediately after an
    erroneous sequence, instead of performing the default action of
    reading and discarding tokens up to a synchronization token. The
    yyerrok action should appear immediately after the error token.
$[<type>]n
    Refers to symbol n, a token index in the production, counting from
    the beginning of the production rule, where the first symbol after
    the colon is $1. The type variable is the name of one of the union
    members listed in the %union directive in the declaration section.
    The <type> syntax (non-standard) allows the value to be cast to a
    specific data type. Note that you will rarely need to use the type
    syntax.
$[<type>]$
    Refers to the value returned by the matched symbol-sequence and
    used for the matched symbol when reducing other rules. The action
    for a symbol-sequence generally assigns a value to $$. The type
    variable is the name of one of the union members listed in the
    %union directive in the declaration section. The <type> syntax
    (non-standard) allows the value to be cast to a specific data
    type. Note that you will rarely need to use the type syntax.
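To make $$ and $n more concrete, the sketch below imitates what happens to the semantic values when a rule such as expr : expr '+' expr { $$ = $1 + $3; } is reduced. The stack and function names are hypothetical; a generated parser uses its own internal names and also maintains a state stack, which is omitted here.

```c
#include <assert.h>

#define STACK_MAX 64        /* arbitrary capacity for this sketch */

long vstack[STACK_MAX];     /* semantic values of the symbols seen so far */
int  vtop = 0;              /* number of values currently on the stack */

void push_value(long v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

/* Reduce "expr '+' expr": the top three values are $1, $2, and $3. */
long reduce_add(void)
{
    long d1, d3, result;

    assert(vtop >= 3);
    d1 = vstack[vtop - 3];  /* $1: value of the left expr */
    d3 = vstack[vtop - 1];  /* $3: value of the right expr */
    result = d1 + d3;       /* $$ = $1 + $3 */
    vtop -= 3;              /* pop the three right-hand-side values */
    push_value(result);     /* $$ becomes the value of the new expr */
    return result;
}
```

After the reduction, the single value left on the stack is what an enclosing rule sees as its own $n for that expr.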
User Subroutines Section of Input File
The user subroutines section of the yacc input file contains
user-supplied functions. Because these functions are included in this
file, you
do not need to use the yacc library when processing this file. If you
supply a lexical analyzer (yylex) to the parser, it must be contained
in the user subroutines section.
The following functions, which are contained in the user subroutines
section, are invoked within the yyparse function generated by yacc.
yylex()
    The lexical analyzer called by yyparse to recognize each token of
    input. Usually this function is created by lex. yylex reads input,
    recognizes expressions within the input, and returns a token number
    representing the kind of token read. The function returns an int
    value. A return value of 0 (zero) means the end of input.
    If the parser and yylex do not agree on these token numbers,
    reliable communication between them cannot occur. For one-character
    literals, the token is simply the numeric value of the character in
    the current character set. The numbers for other tokens can be
    chosen by either yacc or the user. In either case, the #define
    construct of C is used to allow yylex() to return these numbers
    symbolically. The #define statements are put into the code file,
    and into the header file if that file is requested. The set of
    characters permitted by yacc in an identifier is larger than that
    permitted by C. Token names found to contain such characters will
    not be included in the #define declarations.
    If the token numbers are chosen by yacc, those tokens other than
    literals are assigned numbers greater than 256, although no
    order is implied. A token can be explicitly assigned a number by
    following its first appearance in the declaration section with a
    number. Names and literals not defined in this way retain their
    default definition. All assigned token numbers are unique and
    distinct from the token numbers used for literals. If duplicate
    token numbers cause conflicts in parser generation, yacc reports
    an error; otherwise, it is unspecified whether the token
    assignment is accepted or an error is reported.
    The end of the input is marked by a special token called the
    endmarker that has a token number that is zero or negative. All
    lexical analyzers return zero or negative as a token number upon
    reaching the end of their input. If the tokens up to, but not
    including, the endmarker form a structure that matches the start
    symbol, the parser accepts the input. If the endmarker is seen
    in any other context, it is considered an error.
yyerror(string)
    The function that the parser calls upon encountering an input
    error. The default function, defined in liby.a, simply prints
    string to the standard error. The user can redefine the function.
    The function's type is void.
yywrap()
    The wrap-up routine that returns a value of 1 when the end of
    input occurs.
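As an illustration of the token-number conventions above, here is a minimal hand-written yylex that reads from an in-memory string. The token names DIGIT and LETTER, their values 257 and 258, and the helper set_input are assumptions made for this sketch; in a real build the token numbers come from the #define statements that yacc generates.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed token numbers; normally supplied by the generated header. */
#define DIGIT  257
#define LETTER 258

long yylval;                        /* value of the most recent token */
static const char *cursor = "";     /* hypothetical in-memory input */

void set_input(const char *text)
{
    assert(text != NULL);
    cursor = text;
}

int yylex(void)
{
    unsigned char c;

    while (*cursor == ' ')          /* skip blanks */
        cursor++;
    if (*cursor == '\0')
        return 0;                   /* endmarker: end of input */
    c = (unsigned char)*cursor++;
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;               /* named token, number above 256 */
    }
    if (islower(c)) {
        yylval = c - 'a';
        return LETTER;
    }
    return c;                       /* literal: its character code */
}
```

Because literal tokens use their character codes, numbering named tokens above 256 keeps the two ranges from colliding.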
The liby.a library contains default main() and yyerror() functions.
(main() is the required main program that calls yyparse() to start the
program.) These routines look like the following, respectively:
#include <locale.h>
#include <stdio.h>
int yyparse(void);
int main(void)
{
    setlocale(LC_ALL, "");
    (void) yyparse();
    return (0);
}
int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return (0);
}
NOTES
The LANG and LC_* variables affect the execution of the yacc command as
stated. The main() function defined by yacc issues the following call:
setlocale(LC_ALL, "")
As a result, the program generated by yacc will also be affected by the
contents of these variables at run time.
The lex program can be compiled as a C program with -std0, -std, or
-std1 mode. It can also be compiled as a C++ program. If YY_NOPROTO is
defined on the compilation command line, function prototypes are not
generated.
EXIT STATUS
The following exit values are returned:
0   Successful completion.
>0  An error occurred.
EXAMPLES
This section describes the example programs for the lex and yacc
commands, which together create a simple desk calculator program that
performs addition, subtraction, multiplication, and division operations.
The calculator program also allows you to assign values to variables
(each designated by a single lowercase ASCII letter), and then use the
variables in calculations. The files that contain the program are as
follows:
calc.l
    The lex specification file that defines the lexical analysis rules.
calc.y
    The yacc grammar file that defines the parsing rules and calls
    the yylex() function created by lex to provide input.
The remaining text expects that the current directory is the directory
that contains the lex and yacc example program files.
Compiling the Example Program
Perform the following steps to create the example program using lex and
yacc:
1.  Process the yacc grammar file using the -d option. The -d option
    tells yacc to create a file that defines the tokens it uses in
    addition to creating the C language source code file.
        yacc -d calc.y
    The following files are created:
    y.tab.c   The C language source file that yacc created for the
              parser.
    y.tab.h   A header file containing #define statements for the
              tokens used by the parser.
2.  Process the lex specification file:
        lex calc.l
    The following file is created:
    lex.yy.c  The C language source file that lex created for the
              lexical analyzer.
3.  Compile and link the two C language source files:
        cc -o calc y.tab.c lex.yy.c
    The following files are created:
    y.tab.o   The object file for y.tab.c.
    lex.yy.o  The object file for lex.yy.c.
    calc      The executable program file.
    (The *.o files are created temporarily and then removed.)
You can then run the program directly by entering:
    calc
Then, enter numbers and operators in calculator fashion. After you
press <Return>, the program displays the result of the operation. If
you assign a value to a variable as follows, the cursor moves to the
next line:
    m=4 <Return>
    _
You can then use the variable in calculations and it will have the
value assigned to it:
    m+5 <Return>
    9
The Parser Source Code
The file calc.y has entries in all three of the sections of a yacc
grammar file--declarations, rules, and user subroutines. It contains
the following source code:
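%{
#include <stdio.h>
int regs[26];
int base;
int yylex(void);        /* created by lex */
int yyerror(char *s);   /* defined in the user subroutines section */
%}
%start list
%token DIGIT LETTER
%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS            /* supplies precedence for unary minus */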
%% /* beginning of rules section */
list : /*empty */
| list stat '\n'
| list error '\n'
{ yyerrok; }
;
stat : expr
{ printf("%d\n",$1); }
| LETTER '=' expr
{ regs[$1] = $3; }
;
expr : '(' expr ')'
{ $$ = $2; }
| expr '*' expr
{ $$ = $1 * $3; }
| expr '/' expr
{ $$ = $1 / $3; }
| expr '%' expr
{ $$ = $1 % $3; }
| expr '+' expr
{ $$ = $1 + $3; }
| expr '-' expr
{ $$ = $1 - $3; }
| expr '&' expr
{ $$ = $1 & $3; }
| expr '|' expr
{ $$ = $1 | $3; }
| '-' expr %prec UMINUS
{ $$ = -$2; }
| LETTER
{ $$ = regs[$1]; }
| number
;
number : DIGIT
{ $$ = $1; base = ($1==0) ? 8:10; }
| number DIGIT
{ $$ = base * $1 + $2; }
;
%%      /* beginning of user subroutines section */
int main(void) {
    return (yyparse());
}
int yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
    return (0);
}
int yywrap(void) {
    return (1);
}
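The number rules in the grammar above build a multi-digit value one DIGIT at a time, and a leading 0 switches the base to 8. The sketch below restates that logic as an ordinary C function so the arithmetic is easier to follow; the function name parse_number is hypothetical and is not part of the example program.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Mirrors the grammar actions:
 *   number : DIGIT         { $$ = $1; base = ($1==0) ? 8:10; }
 *          | number DIGIT  { $$ = base * $1 + $2; }
 */
long parse_number(const char *digits)
{
    long value;
    int base;

    assert(digits != NULL && isdigit((unsigned char)digits[0]));
    value = digits[0] - '0';            /* first DIGIT */
    base = (value == 0) ? 8 : 10;       /* leading 0 selects octal */
    for (digits++; isdigit((unsigned char)*digits); digits++)
        value = base * value + (*digits - '0');   /* number DIGIT */
    return value;
}
```

Like the grammar, the sketch does not reject the digits 8 and 9 in an octal number; it simply applies the same formula to them.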
The Lexical Analyzer Source Code
The file calc.l contains the lexical analyzer source code. It contains
the rules used to generate the tokens from the input stream. It also
contains include statements for standard input and output, as well as
for the <y.tab.h> file. The yacc program generates the <y.tab.h> file
from the yacc grammar file information, if you use the -d option with
the yacc command. The file <y.tab.h> contains definitions for the
tokens that the parser program uses.
Contents of calc.l:
%{
#include <stdio.h>
#include "y.tab.h"
int c;
#if !defined (YYSTYPE)
#define YYSTYPE long
#endif
extern YYSTYPE yylval;
%}
%%
" "         ;
[a-z]       {
                c = yytext[0];
                yylval = c - 'a';
                return(LETTER);
            }
[0-9]       {
                c = yytext[0];
                yylval = c - '0';
                return(DIGIT);
            }
[^a-z 0-9]  {
                c = yytext[0];
                return(c);
            }
ENVIRONMENT VARIABLES
The following environment variables affect the execution of yacc:
LANG
    Provides a default value for the internationalization variables
    that are unset or null. If LANG is unset or null, the corresponding
    value from the default locale is used. If any of the
    internationalization variables contain an invalid setting, the
    utility behaves as if none of the variables had been defined.
LC_ALL
    If set to a non-empty string value, overrides the values of all the
    other internationalization variables.
LC_CTYPE
    Determines the locale for the interpretation of sequences of bytes
    of text data as characters (for example, single-byte as opposed to
    multibyte characters in arguments and input files).
LC_MESSAGES
    Determines the locale for the format and contents of diagnostic
    messages written to standard error.
NLSPATH
    Determines the location of message catalogs for the processing
    of LC_MESSAGES.
FILES
y.output
    A readable description of parsing tables and a report on conflicts
    generated by grammar ambiguities
y.tab.c
    Output file
y.tab.h
    Definitions for token names
/usr/ccs/lib/yaccpar
    Default skeleton parser for C programs
liby.a
    The yacc library
yacc also uses three temporary files.
SEE ALSO
Commands: lex(1)
Standards: standards(5)
Programming Support Tools