% Copyright 2012-2024, Alexander Shibakov
% This file is part of SPLinT
%
% SPLinT is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation, either version 3 of the License, or
% (at your option) any later version.
%
% SPLinT is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with SPLinT. If not, see .
% the namespace choices below are a bit random as this is a demo only
\input limbo.sty
\def\optimization{5}
\input yy.sty
\modenormal
\input dcols.sty
\input symmap.sty
\let\parsernamespace\flexnamespace
\let\hostparsernamespace\flexnamespace
\let\tokeneq\tokeneqpretty
\let\optstrextra\optstrextraesc
%\input fo.tok
\input ftokenset.sty %
\let\parsernamespace\flexpseudorenamespace
\let\hostparsernamespace\flexpseudorenamespace
\input fretokenset.sty % regular expression names
\def\symnamespace{[symbols]}
\let\currentnamespace\parsernamespace
\let\parsernamespace\symnamespace
\input symtoks.sty %
\let\tokeneq\tokeneqpretty
\let\optstrextra\optstrextraesc
\input fo.tok
\input ftokenset.sty
\let\parsernamespace\indexpseudonamespace
\input yypretty.sty
\let\parsernamespace\currentnamespace
\let\hostparsernamespace\symnamespace % the namespace where tokens are looked up
% for typesetting purposes
\immediate\openout\exampletable=\jobname.exl
\def\cite[#1]{%
\def\next{#1}\setbox0=\hbox{l}
[\ifx\next\empty$\,$\hbox{\vrule width\wd0 height\ht0 depth\dp0}$\,$\else #1\fi]%
}
\let\oldN\N
\let\M\textM
\let\N\chapterN
\font\ttit=cmitt10
\defreserved{Y}{\.{Y}}
\input symfm.sty
\input slimbo.sty
\topskip=9pt
\initauxstream
@** Introduction.
The manual supplied with \splint\ presents an outline of the main
features of the package. Its main focus, however, is on general
parser design using the package. The two parsers that come
with \splint, for pretty printing \bison\ and \flex, are used as an
illustration only. A full-featured design of a parser for
pretty printing linker scripts does not treat the \bison\ and \flex\
parsers in any detail, either. Partially filling this gap is the main
reason for this example\footnote{A secondary reason is to provide a testbed
for typesetting experiments.}.

The same parser and lexer (with a slightly different input routine)
may be used to typeset \bison\ and \flex\ examples in text, as
well. There are some subtle differences between doing it inside a
straightforward \TeX\ file and a \CWEB\ section. The obvious one of
these is the requirement to use \.{@@@@} whenever a single \.{@@} is
called for in the \TeX\ input. See below for further examples.

Typesetting grammar examples also calls for a wider range of
typographic devices than the ones used for pretty printing \bison\
(and \flex) code. Formatting and modifying the typesetting of \bison\
productions and \flex\ scanner rules is given some consideration as
well.

While this example is rather short, it has enough variety to present
the indexing features of the macros supplied with \splint\ in some
detail.

Finally, a few macros, hastily thrown together, show how \CWEB\ may be
used to create documentation with a `book feel', including
custom typesetting of chapter headers, a sectioned index, etc. Not all
of the features mentioned above may be desired for any one project, but
in case some of them are, these examples provide a convenient place to
look up the details of the implementation for one's own use.
%\let\N\textN
@s TeX_ TeX
@* Examples of \bison\ parser output.
Some of the features specific to the use of the \bison\ parser for this purpose
are explained below. One might find it useful to keep this section
as a quick reference while typesetting one's own examples (for instance, it is probably
unintuitive that `\.{`}' produces `\.{\yl}', but there is simply no
way to use `\.{\yl}' as a character inside the \TeX\ section of \CWEB).

The first, rather eclectic and lengthy, example demonstrates various
typesetting features of the \bison\ parser. The parsing output as well
as the resulting table are saved in {\tt \jobname.exl}. All the \Cee\
typesetting is performed by \CWEB, using its \.{\yl}$\ldots$\.{\yl}
facility.
\saveparseoutputtrue
\expandafter\def\csname lexspecial[^^D^^D]\endcsname{}
\medskip
\beginprod
\inline
example:
term.1 term.2 \{\} term.3 \stashed{\relax} \{\stashed{\relax}\}
` term.more \{\} "nonterminal"[sym_name.1] term_other[sym_name.2] \{\}
` \
` \%empty \
` terms terms ',' terms \%? \{ \stashed{|a = b = c = d = e;|} \} \%dprec 7
` terms terms ',' terms \%? \{ \stashed{|a = b = c;|} \}
;
%
another:
term.8 \%merge term.one \%dprec 3 term.two \{\stashed{\rm|int a, b, c;|}\}
` term.17 \%merge \{\stashed{|f(a,b)==c;|}\}
` term.78 \{\stashed{|h(b)=g(c);|}\} \%merge \%dprec 0x17
` term.77 \{\stashed{|h(b)=g(c);|}\} \%prec new_term
;
%
\resetf
and_another:
term.8 \%merge term.one \%dprec 3 term.two \{\stashed{\rm|int a, b, c;|}\}
` term.17 \%merge \{\stashed{|f(a,b)==c;|}\}
` term.78 \{\stashed{|h(b)=g(c);|}\} \%merge \%dprec 0x17
` term.77 \{\stashed{|h(b)=g(c);|}\} \%prec new_term
;
%
\%token ANOTHER NONEXISTENT GENERIC TOKEN 7 "token" ANOTHER 0x77 "more" TOKEN TOKEN ;
\%token bogey1 bogey2 ;
\%type TOKEN ANOTHER ;
\%start inputer;
\stashed{\rm Example 1 of flushed code (delayed till next \.{\\stashed} is encountered).}\sflush{F}{flush this}
\%default-prec;
\%no-default-prec;
\stashed{\rm Example 2 of flushed code}
\%destructor \{ \stashed{\rm |func(int a, char b); a = b + c;|}\^^D\^^D % ignored because anything is accepted inside braces
\stashed{\6\rm |func2(int a, char b);|} \} \^^D\^^D
"token" TOKEN NONEXISTENT "none" BOGEY "more" A_TOKEN IDENTIFIER;
\%printer \{ \stashed{\rm |func(int a, char b); a = b + c;|}
\stashed{\6\rm |func2(int a, char b);|} \} \^^D\^^D
"token" TOKEN NAME NONEXISTENT "none" ANOTHER BOGEY "more" A_TOKEN identifier.1 identifier.2;
\%code token.3 \{ \stashed{\rm |func(int a, char b); a = b + c;|}
\stashed{\6\rm |func2(int a, char b);|} \};
\%code \{ \stashed{\rm |new_function(int x, char y); |} \};
\%left "one" 1 "two" 2 three.137 0x7;
\%precedence five six seven;
\%nonassoc two;\^^D\^^D
\%code \{ \stashed{\rm |other_function(int x, char y); |} \};
\endprod
\medskip
%\checktabletrue
\beginprod
\%expect 0x137;
\%expect-rr 17;
\%lex-param \{\stashed{\rm|int number;|}\};
\%define var.1 \{ \stashed{\rm |func3(8, "string"){n = m++;}| } \}
\%union var.2 \{ \stashed{\rm |int a, b, c;|\6\rm |char a_char;|\C{font switching must be applied to every line (see the source)}} \}
\%\{ \stashed{\rm |int a, b, c;|\6 |char a_char;|} \%\}
\endprod
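\medskip
\noindent For quick reference, the essentials of the two examples above may be
condensed into a much shorter sketch. It is only an illustration, assembled from
constructs already shown (a rule header, alternatives introduced by
`\.{`}', actions wrapped in \.{\\stashed}, and a closing semicolon), and it
assumes the same defaults that are in effect for the examples above. The names
\.{short\_example}, \.{term.1} and \.{term.2} are made up for this purpose and
do not come from any grammar in the package.
% a hedged sketch: the rule and term names below are hypothetical
\smallskip
\beginprod
short_example:
term.1 '+' term.2 \{\stashed{|a = b + c;|}\}
` term.1 \{\stashed{|a = b;|}\}
;
\endprod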
%
\medskip
\noindent The next example is a demonstration of the hidden context
added to an incomplete language fragment and local typesetting
variations enabled by such context.
\def\cset#1{%
\nx\colorset{darkwood}%
{#1}%
\nx\restorecolor
}%
\def\dset#1{%
$\nx\underbrace{\hbox{#1}}_{{\nx\rm id:\ \hbox{\sixrm\the\toksa}}}$% TODO: find a better way to switch
% the font family explicitly
}%
\def\esets#1{%
\nx\beginub#1%
}%
\def\esete#1{%
#1\nx\endub
}%
\def\beginub#1\endub{%
$\underbrace{\hbox{#1}}_{\rm a\ group}$%
}
\checktabletrue
%\yyflexdebugtrue
\smallskip
\beginprod
\skipheader
ghost:
headerless_term.1 \formatlocal{\let\termmetastyle\cset} headerless_term.2 \{\stashed{\colorset{link}\rm|color(x,y,z);|\restorecolor}\}
` \formatlocal{\let\termmetastyle\dset}more.of.the.same.0\formatlocal{\restorecs{table-render}\termmetastyle} but.not.here
\{\stashed{\rm|color(a, b, c);|}\}
` \formatlocal{\let\termmetastyle\esets}three.more.terms\formatlocal{\restorecs{table-render}\termmetastyle}follow
\formatlocal{\let\termmetastyle\esete}this\formatlocal{\restorecs{table-render}\termmetastyle}one\{\stashed{\rm|assign(x, y, z);|}\}
;
\endprod
\checktablefalse
\yyflexdebugfalse
\medskip
\noindent Next is an incomplete listing of the characters that can be
typeset, as well as the way to typeset\footnote{Please note that we are discussing the issues of typesetting
{\let\tt\tti\it examples of \bison\ input in text\/} at the moment; the parser reading the code from
{\let\tt\tti\it the \Cee\ portion of the \CWEB\ input\/} typesets these symbols automatically.} them (only the `tricky' cases
are listed). The use of `\.{`}' to typeset `\.{\yl}' deserves a special note---\CWEB's rules make it
nearly impossible to use `\.{\yl}' in the \TeX\ portion of the program. One way to avoid using this
relatively unnatural notation is to put the production example in a separate \TeX\ file as demonstrated
by the \prodstyle{symbol\_tricks2} example below, included from \.{symtricks.sty}. The same example uses
a few parser facilities to override the typesetting defaults of the standard production demo setup (such as using
\.{\\insertraw} to reset the last action display).
The uniform alignment across several productions below was accomplished with \.{\\setglobalalignrules} by using
the value of \.{\\gaglue} set by a copy of one of the productions.
\begingroup
\setbox0\hbox{\ninepoint look: $\rightarrow\,$\X{$\infty$}:See this example to deduce $\ldots$\X}%
\setbox0=\vbox{
\medskip
\beginprod
\insertraw{\let\stashnext\stashnextwithspace}%
line_breaking_and_symbols:
GEN\stashed{|stash!=0|}ERIC '(' expression',' \ ss another es')' \
\insertraw{\let\stashnext\stashnextwithnothing}%
` inline_\stashed{look: $\rightarrow\,$}c \{ \stashed{\X{$\infty$}:See this example to deduce $\ldots$\X\6}\stashed{|b == a - c|} \}
` more_inline_c \{ \stashed{|func(int a, char b);|} \}
%
\endprod
\expandafter
}%
\expandafter\setglobalalignrules\expandafter{\the\gaglue}%
\medskip
\tomainparser
\prettywordpair{GENERIC}{\_Generic}
\prettywordpair{ss}{$^{\rm C99[}\,$\aftergroup\aftergroup\aftergroup\ignorespaces}
\prettywordpair{es}{\unskip$\,{}^{\rm ]C99}$}% there is still a problem when this appears in headers
\beginprod
line_breaking_and_symbols:
GENERIC '(' expression',' \ ss another es')' \
` inline_c \{ \stashed{\X{$\infty$}:See this example to deduce $\ldots$\X\6}\stashed{|b == a - c|} \}
` more_inline_c \{ \stashed{|func(int a, char b);|} \}
%
\endprod
\medskip
\beginprod
\format{\inline\flatten}
symbol_tricks:
'\&' \
` '*' \
` '+' \
` '-' \
` '\~' \
` '!' \
` '\{' \
` '`' \
` '\`' \
` '\'' \
` '\\' \
` ' ' \
;
\endprod
\medskip
\input symtricks.sty
\noindent The stash chunks, inserted by \.{\\stashed\{}{\it random input\/}\.{\}}, are invisible to the parser.
As an example, the stash producing the action in the first rule below (|stash!=0|) was
inserted in the middle of the first term (\prodstyle{GENERIC}). The space (\.{\ }) is a special case.
\medskip
\beginprod
line_breaking_and_symbols:
GEN\stashed{|stash!=0|}ERIC '(' expression',' \ ss another es')' \
` inline_\stashed{look: $\rightarrow\,$}c \{ \stashed{\X{$\infty$}:See this example to deduce $\ldots$\X\6}\stashed{|b == a - c|} \}
` more_inline_c \{ \stashed{|func(int a, char b);|} \}
%
\endprod
\medskip
\noindent The behavior of the input routine mentioned above is adjustable by redefining \.{\\stashnext}. These adjustments may
even be made locally, for small portions of the input only, using \.{\\insertraw}.
Here is the same set of productions, with the stash producing a space in the middle of \prodstyle{GENERIC} and reverting to the usual,
`invisible' behavior by the time \.{\\yyinput} reaches \prodstyle{inline\_c} (that has
`\.{\\stashed\{}$\,$look: $\rightarrow\,$\.{\}}' inserted before~\prodstyle{\_c}):
\medskip
\beginprod
\insertraw{\let\stashnext\stashnextwithspace}%
line_breaking_and_symbols:
GEN\stashed{|stash!=0|}ERIC '(' expression',' \ ss another es')' \
\insertraw{\let\stashnext\stashnextwithnothing}%
` inline_\stashed{look: $\rightarrow\,$}c \{ \stashed{\X{$\infty$}:See this example to deduce $\ldots$\X\6}\stashed{|b == a - c|} \}
` more_inline_c \{ \stashed{|func(int a, char b);|} \}
%
\endprod
\endgroup
@* Examples of \flex\ parser output. Standalone regular expressions can be displayed using \.{\\flexrestyle}:
{\it \flexrestyle{\^\\\\[\\"\\'?\\\\]}}. Portions of \flex\ files may be typeset with the help of
\.{\\beginflex}$\ldots$\.{\\endflex} macros. Just as in the case of \bison\ productions, care must be taken to
escape some symbols that have special meaning to \TeX. The ones that {\it must be\/} escaped when used inside
regular expressions are `\.{\{}', `\.{\}}', `\.{\\}', `\.{\ }' (see more below), and~`\.{\%}'. Others, such as `\.{\^}', `\.{\_}', `\.{\$}', `\.{\#}',
and~`\.{\&}' do not require any special treatment (although they continue to perform their special functions
inside \.{\\stashed} blocks). As a note of caution, `$\ldots$\.{\\\\]}' results
in `$\ldots$\flexrestyle{\\]}' and not
{%
\let\flbraceccl\flbraceccldemo\savecs{flexparser-re}\flbraceccl
`\flexrestyle{[\\\\]}'%
} as might have been intended (i.e.~the bracket, \.{]} is treated as an ordinary character, and not as part of
the syntax for a character class). This is because the escape character (\.{\\}) serves a special r\^ole in \flex, so
to get the desired effect one must type \.{\\\\\\\\]}.

The use of `\.{\yl}' deserves a special mention. As was pointed out above,
this character is nearly inaccessible in the \TeX\ mode of \CTANGLE, which resulted in the following workaround. To use
`\.{\yl}' in the examples typeset inside the \TeX\ portion of the \CWEB\ input, one should type `\.{`}'. To use `\.{`}', type
`\.{\\`}' instead. If the example is not part of a \CWEB\ input (for example, it is included from its own \TeX\ file, similar to
\.{symtricks.sty} above), then
one can use the `\.{\yl}' character as intended. However, even inside a `pure \TeX\ file', to get `\.{`}' one must still type `\.{\\`}'.

Finally, it is worth remembering that a space (\.{\ }) would terminate a regular expression (and produce a syntax error) inside
\.{\\flexrestyle}. This is part of the regular \flex\ syntax and not a \splint\ specific limitation. To make a space character
a part of a regular expression, it must be escaped.
Many of the points made above may become more transparent after examining the source of the example following this sentence.
\medskip
\cdebugtrue
\beginflex
\stashed{\C{ Comments are possible with some effort }}
\{
^\{WS\}([\ a-x#\\\\]`[\`0-9\\`])\\n\\r \{\stashed{|x=@t}2^y{@>|}\}
^"/*"$ \{\stashed{|start_comment(@tWatch out for `\.{\yl}'!@>)|}\}
\}
\endflex
\cdebugfalse
\medskip
\noindent While, technically speaking, \flex\ has a `parser stack' in the sense that in the event of an unsuccessful parsing pass
with a `section 2' parser, a `section 1' parser may be attempted, this strategy often fails. As a short excerpt immediately
following this section shows, `section 1' input may also pass for syntactically correct `section 2' \flex\ code (although with
entirely wrong semantics). Thus a better `lazy' approach is to mark all \flex\ code as `section 1' instead.
@=
@G(fs1)
WS [[:blank:]]+
OPTWS [[:blank:]]*
NOT_WS [^[:blank:]\r\n]
NL \r?\n
NAME ([[:alpha:]_][[:alnum:]_-]*)
NOT_NAME [^[:alpha:]_*\n]+
SCNAME {NAME}
ESCSEQ (\\([^\n]|[0-7]{1,3}|x[[:xdigit:]]{1,2}))
FIRST_CCL_CHAR ([^\\\n]|{ESCSEQ})
CCL_CHAR ([^\\\n\]]|{ESCSEQ})
CCL_EXPR ("[:"^?[[:alpha:]]+":]")
LEXOPT [porkacne]
M4QSTART "[["
M4QEND "]]"
@ Lexer specific options can be typeset as well. Any portion specific to \Cee\ can be relegated
to \CWEB\ as can be seen below. Adjusting the typesetting of various \Cee\ terms works as expected
(see the source for this example).
@s scan_state int
@=
@G(fs1)
%option bison-bridge
%option noyywrap noinput reentrant nounput
%option header-file="lexer.h"
%option prefix="main_c"
%option extra-type="@>struct scan_state *@="
%option stack
@g
@ The first three lines of the previous section successfully parse as section~2 input.
\parseverbosetrue
@=
@G(fs2)
WS [[:blank:]]+
OPTWS [[:blank:]]*
NOT_WS [^[:blank:]\r\n]
@ @=
@G(fs1)
/* Comment before the section is put after the states list */
@@=>{
^{WS} {@> @[TeX_( "/flindented@@codetrue/yyBEGIN{CODEBLOCK}/yylexnext" );@]@=}
^"/*" {@> @[TeX_( "/yypushstate{COMMENT}/yylexnext" );@]@=}
^#{OPTWS}line{WS} {@> @[TeX_( "/yypushstate{LINEDIR}/yylexnext" );@]@=}
^"%s"{NAME}? {@> @[TeX_( "/yylexreturnptr{SCDECL}" );@]@=}
^"%x"{NAME}? {@> @[TeX_( "/yylexreturnptr{XSCDECL}" );@]@=}
^"%{".*{NL} {@> @ @=}
^"%top"[[:blank:]]*"{"[[:blank:]]*{NL} {@> @ @=}
^"%top".* {@> @[TeX_( "/yyfatal{malformed '/harmlesscomment top' directive}" );@] @=}
{WS} {@> @[;@]/* discard */ @=}
^"%%".* {@> @ @=}
^"%pointer".*{NL} {@> @[TeX_( "/flinc@@linenum/yylexreturn{POINTER_OP}" );@]@=}
^"%array".*{NL} {@> @[TeX_( "/flinc@@linenum/yylexreturn{ARRAY_OP}" );@]@=}
^"%option" {@> @[TeX_( "/yyBEGIN{OPTION}/yylexreturn{OPTION_OP}" );@]@=}
^"%"{LEXOPT}{OPTWS}[[:digit:]]*{OPTWS}{NL} {@> @[TeX_( "/flinc@@linenum/yyflexoptreturn{OPT_DEPRECATED}" );@]@=}
^"%"{LEXOPT}{WS}.*{NL} {@> @[TeX_( "/flinc@@linenum/yyflexoptreturn{OPT_DEPRECATED}" );@]@=}
^"%"[^porksexcan{}].* {@> @[TeX_( "/yyfatal{unrecognized '/harmlesscomment' directive: /the/yytext}" );@] @=}
^{NAME} {@> @ @=}
{SCNAME} @> @[TeX_( "/RETURNNAME" );@] @=
^{OPTWS}{NL} {@> @[TeX_( "/flinc@@linenum/yylexnext" );@]/* allows blank lines in section 1 */@=}
{OPTWS}{NL} {@> @[TeX_( "/flinc@@linenum/yylexnext" );@]/* maybe end of comment line */@=}
}
@ @=
@ @=
@ @=
@ @=
@q Include the list of index section markers; this is a hack to get around @>
@q the lack of control over the generation of \CWEB's index; the correct order @>
@q of index entries depends on the placement of this inclusion @>
@i alphas.hx
@** Index. \global\let\secrangedisplay\empty% do not show the current section range anymore
Various identifiers in \bison\ productions and \flex\ sections are put in the index, along with
the identifiers from the \Cee\ portions of the \CWEB\ input. The
mechanism used to typeset these identifiers is different from the one
employed by \CWEB's indexing macros. While the \.{\\I} macros in
\.{cwebmac.tex} pass the actual typesetting commands to \TeX, \splint\
only outputs the context in which the identifier was encountered. By
redefining the macros that interpret this context to typeset the
index, several useful effects can be achieved\footnote{One pretty common use is to redefine
macros that take parameters to take none.}.
\def\otherlangindexseparator{% the index is too short
\toksg{}%
\vskip.5\baselineskip
\centerline{B{\sc ISON}, F{\sc LEX, AND} \TeX\ {\sc INDICES}}%
\vskip.5\baselineskip
}