@q Copyright 2012-2024, Alexander Shibakov@>
@q This file is part of SPLinT@>

@q SPLinT is free software: you can redistribute it and/or modify@>
@q it under the terms of the GNU General Public License as published by@>
@q the Free Software Foundation, either version 3 of the License, or@>
@q (at your option) any later version.@>

@q SPLinT is distributed in the hope that it will be useful,@>
@q but WITHOUT ANY WARRANTY; without even the implied warranty of@>
@q MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the@>
@q GNU General Public License for more details.@>

@q You should have received a copy of the GNU General Public License@>
@q along with SPLinT. If not, see .@>

% The scheme for extracting token equivalences below does not use the
% bootstrap parser, although using it would be easier.
% To use a different parser (the `prologue' parser, \.{dyytab.tex} in
% this case), some extra steps have to be inserted in
% \.{yybootstrap.sty}. First, the token equivalence table for the `main'
% parser (rather, for the `main' scanner) has to be loaded
% (\.{yybootstrap.sty} usually relies on the tokens that are
% `hard-coded' with the bootstrap parser). Second,
% \.{\\let\\yylexreturn\\yylexreturnregular} has to be executed so that
% the scanner is used. One advantage of using a different parser is the
% ability to intermix token definitions with grammar productions (the
% bootstrap mode macros in \.{\\yyunion} will simply ignore the extra
% definitions).
% Note also that the `grammar rule' parser cannot be used in this
% case, since the token definitions as they are used in this file fit
% the `prologue' parser syntax only (there are no semicolons at the
% end of the definitions). A more elaborate scheme using several
% parsers (similar to how the typesetting of rules is set up) can be
% used instead.
\input limbo.sty
\def\optimization{5}
\newread\testeof
\immediate\openin\testeof=\jobname.tok
\ifeof\testeof % make the local token equivalence table
  \let\nx\noexpand
  \edef\tokendeffile{\jobname.tok} % where to put the token equivalence table
  \def\bstrapparser{dyytab.tex}
  \def\bstraptokens{bo.tok}% use token equivalence table to set the values of non-string tokens
  % this has to be added if a non-bootstrap parser is used to
  % extract token information (see the comments above)
  \def\bootstraplexersetup{%
    \let\yylexreturn\yylexreturnregular
    \bootstrapmodetrue
  }
  \toks0{%
    \let\fin\finmod % this is necessary since the original modifies \output
                    % in a way that conflicts with the scheme in dcols.sty
    \input trt1.sty % \TeX\ `runtime': temporary register definitions
    \input yycommon.sty % general routines for stack and array access
    \input yymisc.sty % helper macros (stack manipulation, table processing, value stack pointers)
                      % parser initialization, optimization
    \input yyinput.sty % input functions
    \input yyparse.sty % parser machinery
    \input flex.sty % lexer functions
    \input yyfaststack.sty
    \input yystype.sty % scanner auxiliary types and functions
    \input yyunion.sty % parser data structures
    % the main parser
    \let\parsernamespace\empty
    % create token equivalence table (making, say, \tokenID the same as \csname token"identifier"\endcsname)
    \input yybootstrap.sty
    \input yytexlex.sty
    \expandafter\def %/* adjust the \.{\\yyinput} to recognize \.{\\yyendgame} */
    \expandafter\multicharswitch\expandafter
      {\multicharswitch\yyendgame{\yyinput\yyeof\yyeof\endparseinput\removefinalvb}}%
  }
\else
  \toks0{%
    \input yy.sty
    \modenormal
    \let\currentparsernamespace\parsernamespace
      \def\parsernamespace{[xxdisplay]}% for \pretty... commands to work
      \def\hostparsernamespace{[xxdisplay]}% for the \nameproc macro
      \input xtoks.sty
    \let\parsernamespace\currentparsernamespace % does not really matter
    % the \hostparsernamespace stays `[xxdisplay]' which should cause the
    % \nameproc macro to correct the typesetting of terminals accordingly
  }
\fi
\immediate\closein\testeof
\the\toks0

\input dcols.sty

\initauxstream

@**Parser file.
This is an enhanced parser for expressions. It takes advantage of the
`symbolic term name' mechanism and extends the basic expression syntax.
The top-level structure of the input file is an exact copy of the one
for the expression parser.

@s TeX_ TeX

@(xxpp.yy@>=
@G Switch to generic mode.
%{@> @ @=%}
@> @ @=
%union {@> @ @=}
%{@> @ @=%}
@> @ @=
%%
@> @ @=
%%
@g

@ The following is reproduced from the simple expression example. The
\prodstyle{\%token-table} option is not merely a debugging aid, as it
is in the case of `real' \bison\ parsers, and cannot be omitted. The
name table it sets up is used as a set of keys for various associative
arrays. Token declarations are parsed by a bootstrap parser during the
\TeX\ processing stage to establish equivalences between the names kept
in |yytname| and the macro names used internally by the parsers built
by \bison. The reason this is necessary is simple: either version of
the token name can be used in the grammar, while the `driver' program
(\.{mkeparser.c}) only has access to the names in |yytname|. In
general, this matters whenever the grammar uses a different set of
token names than the lexer, or when diagnostic messages are output. An
important case is the symbolic name switch: before the rules can be
listed to create the switch, the numerical values of the tokens must be
known. If the parser is only aware of the names listed in |yytname|
while the grammar being parsed uses the `internal' names, the listing
macros will fail.
The |yytname| array is used in a few functions inside the `driver' as
well, so omitting this option would make building the parser impossible.

@=
@G
%token-table
%debug
%start value
@g

@ To continue the token name discussion: this parser uses internal names
only, but the |yytname| array contains a string equivalent of
\prodstyle{IDENTIFIER}. Thus, bootstrapping is
necessary\footnote{This was done as a demonstration; changing the
definition of \prodstyle{IDENTIFIER} would easily remove this
requirement.}. The beginning of this file contains a simple scheme for
producing a token equivalence table. The typesetting of the tokens can
be adjusted using \.{\\prettywordpair} macros (see the included
\.{xtoks.sty} file for examples and the way \prodstyle{IDENTIFIER} is
typeset).

@=
@G
%token IDENTIFIER "identifier"
%token INTEGER
@g

@ Here is the whole grammar: simple arithmetic expressions with two
levels of precedence, additive and multiplicative. We have added
`divide' and `subtract' operations. The use of \prodstyle{IDENTIFIER}
instead of \.{"identifier"} below necessitates `harvesting' token
equivalences into \.{xxpression.tok} at the beginning of this file.
\showlastactiontrue
\input yynested.sty

@=
@G
value:
  expression[exp] {@> TeX_( "/yy0{/the/yy]exp[}" ); @=}
;

expression:
  term {@> TeX_( "/yy0{/the/yy]term[}" ); @=}
| expression[exp] add_op term {@> @ @=}
;

term:
  atom {@> TeX_( "/yy0{/the/yy]atom[}" ); @=}
| term mult_op atom {@> @ @=}
;

@t}\vb{\inline\flatten}{@>
mult_op:
  '*' {@> TeX_( "/yy0{/multiply}" ); @=}
| '/' {@> TeX_( "/yy0{/divide}" ); @=}
;

add_op:
  '+' {@> TeX_( "/yy0{}" ); @=}
| '-' {@> TeX_( "/yy0{-}" ); @=}
;
@t}\vb{\resetf}{@>

atom: @t}\vb{\inputboundary{\boundarylower}}{@>
  IDENTIFIER[id] {@> @ @=}
| INTEGER[int] {@> @ @=}
| '(' expression[exp] ')' {@> TeX_( "/yy0{/the/yy]exp[}" ); @=}
;
@t}\vb{\inputboundary{\boundaryupper}}{@>
@g

@ @=
@[TeX_( "/tempca/the/yy]exp[/relax" );@]@;
@[TeX_( "/tempcb/the/yy]term[/relax" );@]@;
@[TeX_( "/advance/tempca by /the/yy]add_op[/tempcb" );@]@;
@[TeX_( "/yy0{/the/tempca}" );@]@;

@ @=
@[TeX_( "/tempca/the/yy]term[/relax" );@]@;
@[TeX_( "/tempcb/the/yy]atom[/relax" );@]@;
@[TeX_( "/the/yy]mult_op[/tempca by /tempcb" );@]@;
@[TeX_( "/yy0{/the/tempca}" );@]@;

@ @=
@[TeX_( "/getsecond{/yy]id[}/to/toksa" );@]@;
@[TeX_( "/toksb/expandafter/expandafter/expandafter{/expandafter" );@]@;
@[TeX_( " /number/csname/the/toksa/endcsname}" );@]@;
@[TeX_( "/yy0{/the/toksb}" );@]@;

@ @=
@[TeX_( "/getfirst{/yy]int[}/to/toksa" );@]@;
@[TeX_( "/yy0{/the/toksa}" );@]@;

@ \Cee\ preamble. Our grammar performs no `real' actions, only \TeX\
output, so this section is empty.

@=

@ \Cee\ postamble. It is tricky to insert function definitions that use
\bison's internal types, as they have to be inserted in a place that is
aware of the internal definitions but before said definitions are used.

@=

@ Union of types. Empty as well.

@=

@**The lexer file.
The scanner for the grammar above is the same as the one for the simple
expression parser. Identifiers are interpreted as variable names: each
one expands to the value of the \TeX\ macro with the same name (the test
file, for instance, defines \.{\\varone}).
%\checktabletrue
@(xxpl.ll@>=
@G
@> @@=
%{@> @ @=%}
@> @ @=
%%
@> @ @=
%%
@g

@ @=
@G(fs1)
letter [_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]
id {letter}({letter}|[-0-9])*
int [0-9]+
@g

@ @=
#include
#include
void define_all_states( void ){}

@ @=
@G(fs1)
%option bison-bridge
%option noyywrap nounput noinput reentrant
%option noyy_top_state
%option debug
%option stack
%option outfile="xxpl.c"
@g

@ @=
@@;
@@;

@ White space skipping.
\traceparserstatestrue
\tracestackstrue
\tracerulestrue
\traceactionstrue
\tracelookaheadtrue
\traceparseresultstrue
\tracebadcharstrue
\yyflexdebugtrue
%
\traceparserstatesfalse
\tracestacksfalse
\tracerulesfalse
\traceactionsfalse
\tracelookaheadfalse
\traceparseresultsfalse
\tracebadcharsfalse
\yyflexdebugfalse

@=
@G(fs2)
[ \f\n\t\v] {@> @[TeX_( "/yylexnext" );@]@=}
@g

@ @=
@G(fs2)
{id} {@> @[TeX_( "/yylexreturnval{IDENTIFIER}" );@]@=}
{int} {@> @[TeX_( "/yylexreturnval{INTEGER}" );@]@=}
[-+*/()] {@> @[TeX_( "/yylexreturnchar" );@]@=}
. {@> @[@@]@=}
@g

@ @=
@[TeX_( "/iftracebadchars" );@]@;
@[TeX_( " /yycomplain{invalid character(s): /the/yytext}" );@]@;
@[TeX_( "/fi" );@]@;
@[TeX_( "/yyerrterminate" );@]@;

@**Generating symbols.
This is the routine that creates symbolic name assignments for the
grammar. The internal mechanics of creating such assignments are
implemented in \.{xymmap.sty}, which should be consulted if any
adjustments are needed.

@(xymbols.txx@>=
@G
\def\optimization{5} % this can be omitted
\input cwebmac.tex
\input limbo.sty
\input yy.sty
\modenormal
\input xymmap.sty
\end
@g

@**Test file.
The test file includes a handy list of debugging options that can be
activated to see the inner workings of the parser and scanner routines.
@(test.txx@>=
@G
\chardef\other=12 % needed for some macros to work
\input xxpression.sty
\iftrue
  \tracedfatrue
  \traceparserstatestrue
  \tracestackstrue
  \tracerulestrue
  \traceactionstrue
  \tracelookaheadtrue
  \traceparseresultstrue
  \tracebadcharstrue
  \yyflexdebugtrue
  \yyinputdebugtrue
  \traceactioncodetrue
\fi
\newread\ssw
\immediate\openin\ssw = xymbols.sns
\ifeof\ssw
\else
  \immediate\closein\ssw
  \input xymbols.sns
  \let\yysymswitch\symswitch
  \let\yysymcleanup\symswitchoff
\fi
\def\varone{10}
\def\expression{1 + 3 * ( 5 + 7 ) + varone - 10}
\basicparserinit\expandafter\yyparse \expression \yyeof\yyeof\endparseinput\endparse
{
  \newlinechar`^^J
  \immediate\write16{^^Jexpression: \expression^^Jthe value: \the\yyval^^J^^J}
}
\bye
@g

@q Include the list of index section markers; this is a hack to get around @>
@q the lack of control over the generation of \CWEB's index; the correct order @>
@q of index entries depends on the placement of this inclusion @>
@i alphas.hx

@**Index.\global\let\secrangedisplay\empty% do not show the current section range anymore
\global\topskip=9pt
\def\Tex{\TeX\ output}
\def\TeXx{\TeX\ output}