@q Copyright 2012-2024, Alexander Shibakov@>
@q This file is part of SPLinT@>

@q SPLinT is free software: you can redistribute it and/or modify@>
@q it under the terms of the GNU General Public License as published by@>
@q the Free Software Foundation, either version 3 of the License, or@>
@q (at your option) any later version.@>

@q SPLinT is distributed in the hope that it will be useful,@>
@q but WITHOUT ANY WARRANTY; without even the implied warranty of@>
@q MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the@>
@q GNU General Public License for more details.@>

@q You should have received a copy of the GNU General Public License@>
@q along with SPLinT.  If not, see <http://www.gnu.org/licenses/>.@>

% The scheme for extracting token equivalences below does not use the
% bootstrap parser, although doing so would be easier.
% To use a different parser (the `prologue' parser, \.{dyytab.tex} in
% this case), some extra steps have to be inserted in
% \.{yybootstrap.sty}. First, the token equivalence table for the `main'
% parser (rather, for the `main' scanner) has to be loaded
% (\.{yybootstrap.sty} usually relies on the tokens that are
% `hard-coded' into the bootstrap parser). Second, it is necessary to
% define \.{\\let\\yylexreturn\\yylexreturnregular} to use the scanner. One
% advantage of using a different parser is the ability to intermix token
% definitions with grammar productions (the bootstrap mode macros in
% \.{\\yyunion} will simply ignore the extra definitions).
% Note also that the `grammar rule' parser cannot be used in this
% case, since the token definitions, as they appear in this file, fit
% the `prologue' parser syntax only (there are no semicolons at the
% end of the definitions). A more elaborate scheme that uses several
% parsers (similar to the way the typesetting of rules is set up) could
% be used instead.
\input limbo.sty
\def\optimization{5}
\newread\testeof
\immediate\openin\testeof=\jobname.tok
\ifeof\testeof % make the local token equivalence table
    \let\nx\noexpand
    \edef\tokendeffile{\jobname.tok} % where to put the token equivalence table
    \def\bstrapparser{dyytab.tex}
    \def\bstraptokens{bo.tok}% use the token equivalence table to set the values of non-string tokens;
                             % this has to be added if a non-bootstrap parser is used to
                             % extract token information (see the comments above)
    \def\bootstraplexersetup{%
        \let\yylexreturn\yylexreturnregular
        \bootstrapmodetrue
    }
    \toks0{%
        \let\fin\finmod     % this is necessary since the original modifies \output
                            % in a way that conflicts with the scheme in dcols.sty
        \input trt1.sty     % \TeX\ `runtime': temporary register definitions
        \input yycommon.sty % general routines for stack and array access
        \input yymisc.sty   % helper macros (stack manipulation, table processing, value stack pointers)
                            % parser initialization, optimization
        \input yyinput.sty  % input functions
        \input yyparse.sty  % parser machinery
        \input flex.sty     % lexer functions
        \input yyfaststack.sty
        \input yystype.sty  % scanner auxiliary types and functions
        \input yyunion.sty  % parser data structures
        % the main parser
        \let\parsernamespace\empty
        % create token equivalence table (making, say, \tokenID the same as \csname token"identifier"\endcsname)
        \input yybootstrap.sty
        \input yytexlex.sty
        \expandafter\def    %/* adjust the \.{\\yyinput} to recognize \.{\\yyendgame} */
            \expandafter\multicharswitch\expandafter
            {\multicharswitch\yyendgame{\yyinput\yyeof\yyeof\endparseinput\removefinalvb}}%
    }
\else
    \toks0{%
        \input yy.sty
        \modenormal
        \let\currentparsernamespace\parsernamespace
            \def\parsernamespace{[xxdisplay]}% for the \pretty... commands to work
            \def\hostparsernamespace{[xxdisplay]}% for the \nameproc macro
            \input xtoks.sty
        \let\parsernamespace\currentparsernamespace % does not really matter
        % the \hostparsernamespace stays `[xxdisplay]' which should cause the
        % \nameproc macro to correct the typesetting of terminals accordingly
    }
\fi
\immediate\closein\testeof
\the\toks0
\input dcols.sty
\initauxstream
@**Parser file.
This is an enhanced parser for expressions. It takes
advantage of the `symbolic term name' mechanism and extends the basic
expression syntax.

The top-level structure of the input file is an exact copy of the one
for the expression parser.
@s TeX_ TeX

@(xxpp.yy@>=
@G Switch to generic mode.
%{@> @<Extended \.{expression} parser \Cee\ preamble@> @=%}
 @> @<Bison options@> @= 
%union {@> @<Union of parser types@> @=}
%{@> @<Extended \.{expression} parser \Cee\ postamble@> @=%}
 @> @<Token and types ...@> @= 
%%
 @> @<Parser productions@> @= 
%%
@g

@ The following is reproduced from the simple expression example.

The \prodstyle{\%token-table} option is not merely a debugging aid, as
it is in the case of `real' \bison\ parsers, and cannot be omitted. The
name table it sets up is used as a set of keys for various associative
arrays. Token declarations are parsed by a bootstrap parser during the
\TeX\ processing stage to establish equivalences between the names kept
in |yytname| and the macro names used internally by the parsers built by
\bison. The reason this is necessary is simple: either version of a
token name can be used in the grammar, while the `driver' program
(\.{mkeparser.c}) only has access to the names in |yytname|. In general,
this matters whenever the grammar uses a different set of token names
from the lexer, or when diagnostic messages are output. An important
case is the symbolic name switch: before the rules can be listed to
create the switch, the numerical values of the tokens must be known. If
the parser is only aware of the names listed in |yytname| and the
grammar being parsed uses the `internal' names, the listing macros will
fail. The array |yytname| is also used in a few functions inside the
`driver', so omitting this option would make building the parser
impossible.
@<Bison options@>=
@G
%token-table
%debug
%start value
@g

@ To continue the token name discussion, this parser uses internal
names only, but the |yytname| array contains a string equivalent of
\prodstyle{IDENTIFIER}. Thus, bootstrapping is necessary\footnote{This
was done as a demonstration; changing the definition of
\prodstyle{IDENTIFIER} would easily remove this requirement.}. The beginning
of this file contains a simple scheme for producing a token
equivalence table.
The typesetting of the tokens can be adjusted using \.{\\prettywordpair}
macros (see the included \.{xtoks.sty} file for examples and the way
\prodstyle{IDENTIFIER} is typeset).
@<Token and types declarations@>=
@G
%token IDENTIFIER "identifier"
%token INTEGER
@g

@ Here is the whole grammar: simple arithmetic expressions with two
levels of precedence (additive and multiplicative operations). We have
added `divide' and `subtract' operations. The use of
\prodstyle{IDENTIFIER} instead of \.{"identifier"} below necessitates
`harvesting' of token equivalences in \.{xxpression.tok} at the
beginning of this file.
\showlastactiontrue
\input yynested.sty
@<Parser productions@>=
@G
value:
  expression[exp]                 {@> TeX_( "/yy0{/the/yy]exp[}" ); @=}
;

expression:
  term                            {@> TeX_( "/yy0{/the/yy]term[}" ); @=}
| expression[exp] add_op term     {@> @<Add a term@> @=}
;

term:
  atom                            {@> TeX_( "/yy0{/the/yy]atom[}" ); @=}
| term mult_op atom               {@> @<Make a term@> @=}
;
@t}\vb{\inline\flatten}{@>
mult_op:
  '*'                             {@> TeX_( "/yy0{/multiply}" ); @=}
| '/'                             {@> TeX_( "/yy0{/divide}" ); @=}
;

add_op:
  '+'                             {@> TeX_( "/yy0{}" ); @=}
| '-'                             {@> TeX_( "/yy0{-}" ); @=}
;
@t}\vb{\resetf}{@>
atom:
@t}\vb{\inputboundary{\boundarylower}}{@>
  IDENTIFIER[id]                  {@> @<Assign variable value to an atom@> @=}
| INTEGER[int]                    {@> @<Assign value to an atom@> @=}
| '(' expression[exp] ')'         {@> TeX_( "/yy0{/the/yy]exp[}" ); @=}
;
@t}\vb{\inputboundary{\boundaryupper}}{@>
@g

@ @<Add a term@>=
  @[TeX_( "/tempca/the/yy]exp[/relax" );@]@;
  @[TeX_( "/tempcb/the/yy]term[/relax" );@]@;
  @[TeX_( "/advance/tempca by /the/yy]add_op[/tempcb" );@]@;
  @[TeX_( "/yy0{/the/tempca}" );@]@;

@ @<Make a term@>=
  @[TeX_( "/tempca/the/yy]term[/relax" );@]@;
  @[TeX_( "/tempcb/the/yy]atom[/relax" );@]@;
  @[TeX_( "/the/yy]mult_op[/tempca by /tempcb" );@]@;
  @[TeX_( "/yy0{/the/tempca}" );@]@;

@ @<Assign variable value to an atom@>=
  @[TeX_( "/getsecond{/yy]id[}/to/toksa" );@]@;
  @[TeX_( "/toksb/expandafter/expandafter/expandafter{/expandafter" );@]@;
  @[TeX_( "    /number/csname/the/toksa/endcsname}" );@]@;
  @[TeX_( "/yy0{/the/toksb}" );@]@;

@ @<Assign value to an atom@>=
  @[TeX_( "/getfirst{/yy]int[}/to/toksa" );@]@;
  @[TeX_( "/yy0{/the/toksa}" );@]@;
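@ Since the actions above do their arithmetic with \TeX\ registers, it may
help to check them against a conventional evaluator. What follows is a
minimal sketch under stated assumptions, not part of the generated
parser: a hypothetical recursive descent evaluator in \Cee\ that follows
the same grammar, with the left recursion of \prodstyle{expression} and
\prodstyle{term} turned into loops (which keeps subtraction and division
left associative). The names |evaluate|, |parse_expression|, and
|variables| are illustrative only, and the fixed variable table is an
assumption standing in for macros such as \.{\\varone}. Integer division
truncates toward zero in \Cee; \.{\\divide} in \TeX\ is assumed to behave
the same way. For the expression used in the test file it should yield
37. The module below is never used, so it is typeset but not written to
any output file.
@<Hypothetical reference evaluator@>=
```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical variable table; it stands in for macros such as the
   varone macro defined in the test file. */
struct variable { const char *name; long value; };
static const struct variable variables[] = { { "varone", 10 } };

/* Current input position and an error flag. */
struct cursor { const char *p; bool ok; };

static long parse_expression(struct cursor *c);

/* Skip the characters matched by the white space rule of the scanner. */
static void skip_space(struct cursor *c)
{
    while (*c->p != '\0' && strchr(" \f\n\t\v", *c->p) != NULL)
        c->p++;
}

/* atom: IDENTIFIER, INTEGER, or a parenthesized expression. */
static long parse_atom(struct cursor *c)
{
    skip_space(c);
    const char *start = c->p;
    if (*c->p == '_' || isalpha((unsigned char)*c->p)) {
        /* Identifier: a letter, then letters, digits or hyphens. */
        while (*c->p == '_' || *c->p == '-' || isalnum((unsigned char)*c->p))
            c->p++;
        size_t len = (size_t)(c->p - start);
        for (size_t i = 0; i < sizeof variables / sizeof variables[0]; i++)
            if (strlen(variables[i].name) == len && strncmp(variables[i].name, start, len) == 0)
                return variables[i].value;
        c->ok = false;                      /* undefined variable */
        return 0;
    }
    if (isdigit((unsigned char)*c->p)) {    /* integer: one or more digits */
        long v = 0;
        while (isdigit((unsigned char)*c->p))
            v = v * 10 + (*c->p++ - '0');
        return v;
    }
    if (*c->p == '(') {                     /* parenthesized expression */
        c->p++;
        long v = parse_expression(c);
        skip_space(c);
        if (*c->p == ')') c->p++; else c->ok = false;
        return v;
    }
    c->ok = false;                          /* unexpected character */
    return 0;
}

/* term: left recursion replaced by a loop, so 8/2/2 means (8/2)/2. */
static long parse_term(struct cursor *c)
{
    long v = parse_atom(c);
    for (skip_space(c); *c->p == '*' || *c->p == '/'; skip_space(c)) {
        char op = *c->p++;
        long rhs = parse_atom(c);
        if (op == '/' && rhs == 0) { c->ok = false; return 0; }  /* division by zero */
        v = (op == '*') ? v * rhs : v / rhs;                      /* truncates toward zero */
    }
    return v;
}

/* expression: same loop structure, so 8-2-1 means (8-2)-1. */
static long parse_expression(struct cursor *c)
{
    long v = parse_term(c);
    for (skip_space(c); *c->p == '+' || *c->p == '-'; skip_space(c)) {
        char op = *c->p++;
        long rhs = parse_term(c);
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Evaluate a complete input; the flag is set to false on any error. */
long evaluate(const char *text, bool *ok)
{
    assert(text != NULL && ok != NULL);
    struct cursor c = { text, true };
    long v = parse_expression(&c);
    skip_space(&c);
    if (*c.p != '\0')
        c.ok = false;                       /* trailing input */
    *ok = c.ok;
    return v;
}
```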

@ \Cee\ preamble. In this case, there are no `real' actions that our
grammar performs, only \TeX\ output, so this section is empty.

@<Extended \.{expression} parser \Cee\ preamble@>=

@ \Cee\ postamble. It is tricky to insert function definitions that use
\bison's internal types, as they must appear in a place where the
internal definitions are already visible, yet before those definitions
are used.

@<Extended \.{expression} parser \Cee\ postamble@>=

@ Union of types. Empty as well.

@<Union of parser types@>=

@**The lexer file. The scanner for the grammar above is the same as
the one for the ordinary expression parser. Identifiers are interpreted
as variable names that expand to the appropriate values.
%\checktabletrue
@(xxpl.ll@>=
@G
 @> @<Lexer definitions@>@= 
%{@> @<Lexer \Cee\ preamble@> @=%}
 @> @<Lexer options@> @= 
%%
 @> @<Regular expressions@> @= 
%%
@g

@ @<Lexer definitions@>=
@G(fs1)
letter    [_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]
id        {letter}({letter}|[-0-9])*
int       [0-9]+
@g

@ @<Lexer \Cee\ preamble@>=

#include <stdint.h>
#include <stdbool.h>

  void define_all_states( void ){}

@ @<Lexer options@>=
@G(fs1)
%option bison-bridge
%option noyywrap nounput noinput reentrant 
%option noyy_top_state
%option debug
%option stack
%option outfile="xxpl.c"
@g

@ @<Regular expressions@>=
  @<Scan white space@>@;
  @<Scan identifiers@>@;

@ White space skipping. 
\traceparserstatestrue
\tracestackstrue
\tracerulestrue
\traceactionstrue
\tracelookaheadtrue
\traceparseresultstrue
\tracebadcharstrue
\yyflexdebugtrue
%
\traceparserstatesfalse
\tracestacksfalse
\tracerulesfalse
\traceactionsfalse
\tracelookaheadfalse
\traceparseresultsfalse
\tracebadcharsfalse
\yyflexdebugfalse
@<Scan white space@>=
@G(fs2)
[ \f\n\t\v]                        {@> @[TeX_( "/yylexnext" );@]@=}
@g

@ @<Scan identifiers@>=
@G(fs2)
{id}                               {@> @[TeX_( "/yylexreturnval{IDENTIFIER}" );@]@=}
{int}                              {@> @[TeX_( "/yylexreturnval{INTEGER}" );@]@=}
[-+*/()]                           {@> @[TeX_( "/yylexreturnchar" );@]@=}
.                                  {@> @[@<React to a bad character@>@]@=}
@g
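@ Because the \.{id} pattern admits hyphens after the first letter and
\.{flex} prefers the longest match, an input such as \.{varone-10}
(without spaces) is scanned as a single \prodstyle{IDENTIFIER} rather
than as a subtraction. The hypothetical hand-written classifier below
is a minimal sketch of the token kinds produced by the rules above:
white space is skipped, operators and parentheses are returned as
single characters, and anything else is reported as a bad character.
It is an illustration only; the real scanner is the one generated by
\.{flex}, and the names |next_token|, |token|, and |token_kind| are
assumptions. The module below is never used, so it is typeset but not
written to any output file.
@<Hypothetical hand-written token classifier@>=
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token kinds produced by the scanner rules above. */
enum token_kind { TOK_END, TOK_IDENTIFIER, TOK_INTEGER, TOK_CHAR, TOK_BAD };

struct token {
    enum token_kind kind;
    const char *text;                       /* first character of the match */
    size_t length;                          /* number of characters matched */
};

/* Scan one token at the position stored in pos and move pos past it. */
struct token next_token(const char **pos)
{
    assert(pos != NULL && *pos != NULL);
    const char *p = *pos;
    while (*p != '\0' && strchr(" \f\n\t\v", *p) != NULL)
        p++;                                /* white space is skipped */
    struct token t = { TOK_END, p, 0 };
    if (*p == '\0') {
        /* end of input: nothing is consumed */
    } else if (*p == '_' || isalpha((unsigned char)*p)) {
        /* id: a letter, then letters, digits or hyphens (longest match) */
        while (p[t.length] == '_' || p[t.length] == '-' || isalnum((unsigned char)p[t.length]))
            t.length++;
        t.kind = TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        /* int: one or more digits */
        while (isdigit((unsigned char)p[t.length]))
            t.length++;
        t.kind = TOK_INTEGER;
    } else if (strchr("*+-/()", *p) != NULL) {
        t.kind = TOK_CHAR;                  /* operators and parentheses */
        t.length = 1;
    } else {
        t.kind = TOK_BAD;                   /* any other character */
        t.length = 1;
    }
    *pos = p + t.length;
    return t;
}
```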

@ @<React to a bad character@>=
 @[TeX_( "/iftracebadchars" );@]@;
 @[TeX_( "    /yycomplain{invalid character(s): /the/yytext}" );@]@;
 @[TeX_( "/fi" );@]@;
 @[TeX_( "/yyerrterminate" );@]@;

@**Generating symbols. This is the routine that creates symbolic name
assignments for the grammar. The internal mechanics of creating such
assignments are implemented in \.{xymmap.sty}, which should be consulted
if any adjustments are needed.
@(xymbols.txx@>=
@G
\def\optimization{5} % this can be omitted
\input cwebmac.tex
\input limbo.sty
\input yy.sty
\modenormal
\input xymmap.sty
\end
@g

@**Test file. The test file includes a handy list of debugging options
that can be activated to see the inner workings of the parser and
scanner routines.
@(test.txx@>=
@G
\chardef\other=12 % needed for some macros to work
\input xxpression.sty

\iftrue
    \tracedfatrue
    \traceparserstatestrue
    \tracestackstrue
    \tracerulestrue
    \traceactionstrue
    \tracelookaheadtrue
    \traceparseresultstrue
    \tracebadcharstrue
    \yyflexdebugtrue
    \yyinputdebugtrue
    \traceactioncodetrue
\fi

\newread\ssw
\immediate\openin\ssw = xymbols.sns
\ifeof\ssw
\else
    \immediate\closein\ssw
    \input xymbols.sns
    \let\yysymswitch\symswitch
    \let\yysymcleanup\symswitchoff
\fi

\def\varone{10}
\def\expression{1 + 3 * ( 5 + 7 ) + varone - 10}
\basicparserinit\expandafter\yyparse \expression \yyeof\yyeof\endparseinput\endparse

{
   \newlinechar`^^J
   \immediate\write16{^^Jexpression: \expression^^Jthe value: \the\yyval^^J^^J}
}

\bye
@g
@q Include the list of index section markers; this is a hack to get around @>
@q the lack of control over the generation of \CWEB's index; the correct order @>
@q of index entries depends on the placement of this inclusion @>
@i alphas.hx

@**Index.\global\let\secrangedisplay\empty% do not show the current section range anymore
\global\topskip=9pt
\def\Tex{\TeX\ output}
\def\TeXx{\TeX\ output}