@* Rules of the grammar. @:title@>
\def\:#1{`\.{@@#1}'}% for in case this file is processed in isolation

We first arrange the proper setting of |rule_mask|, which will control the
selection of rules actually used. Recall that any bits set in the mask of a
rule prescribe its {\it suppression\/} when the same bit is set in
|rule_mask|; therefore for instance the bit characterising \Cpp\ is called
|no_plus_plus|, so that rules specifying it will not be loaded for \Cpp. In
some cases two masks will be combined using the bitwise-or operator `|@v|';
this means (somewhat counterintuitively) that the rule will only be selected
if the conditions represented by the two masks are {\it both\/} satisfied.
The use of the bitwise-and operator `|&|' is even more exceptional: it is
only meaningful if its two operands both select one setting of the same
three-way switch; the rule will then be selected if that switch is in either
of the two indicated positions. The |merged_decls| flag is special in that
setting `\.{+m}' only enables an extra rule, but does not disable any rules;
therefore only one bit is used for this option, and raising this bit in
|rule_mask| suppresses the rule marked with |merged_decls|.
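The way these masks combine can be checked with a small stand-alone sketch in plain \Cee. It is only an illustration: it assumes that |install_rule| keeps a rule exactly when none of the rule's mask bits is raised in |rule_mask|, it copies the mask values of this section into an |enum|, and the helper names |make_rule_mask| and |rule_selected| are hypothetical (they do not occur in \.{CWEBx}).

```c
#include <stdbool.h>

/* Mask values copied from the definitions in this section. */
enum {
  cwebx = 0x0001, compatibility = 0x0002,
  only_plus_plus = 0x0004, no_plus_plus = 0x0008,
  unaligned_braces = 0x0050, aligned_braces = 0x0020,
  wide_braces = 0x0030, standard_braces = 0x0060,
  merged_decls = 0x0080
};

/* Hypothetical helper: the first four terms of the rule_mask expression
   (the statement-forcing term is left out to keep the sketch short). */
int make_rule_mask(bool compat, bool cpp, bool w, bool u, bool m)
{ return (compat ? 0x0001 : 0x0002)
       | (cpp ? 0x0008 : 0x0004)
       | (w ? 0x0040 : u ? 0x0020 : 0x0010)
       | (m ? 0x0000 : 0x0080);
}

/* Assumed selection test: a rule survives only if it shares no bit
   with the current mask. */
bool rule_selected(int rule_bits, int rule_mask)
{ return (rule_bits & rule_mask) == 0; }
```

For ordinary \Cee\ with `\.{+u}' the mask comes out as |0x00A6|, so a rule marked |standard_braces@v no_plus_plus| (bits |0x0068|) is suppressed, while one marked |unaligned_braces & wide_braces| (bits |0x0010|) is kept; the latter is suppressed only when neither `\.{+u}' nor `\.{+w}' is set.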
@d cwebx 0x0001 /* use normally */
@d compatibility 0x0002 /* use in compatibility mode */
@d only_plus_plus 0x0004 /* use in \Cpp\ */
@d no_plus_plus 0x0008 /* use in ordinary \Cee\ only */
@d unaligned_braces 0x0050 /* use if `\.{+u}' flag was set */
@d aligned_braces 0x0020 /* use unless `\.{+u}' flag was set */
@d wide_braces 0x0030 /* use if `\.{+w}' set */
@d standard_braces 0x0060 /* use unless `\.{+u}' or `\.{+w}' set */
@d merged_decls 0x0080 /* use if `\.{+m}' set */
@d forced_statements 0x0100 /* use if `\.{+a}' or `\.{+f}' set */
@d no_forced_statements 0x0600 /* use unless `\.{+a}' or `\.{+f}' set */
@d all_stats_forced 0x0300 /* use if `\.{+a}' set */
@d not_all_stats_forced 0x0400 /* use unless `\.{+a}' set */

@< Set initial values @>=
rule_mask= (compatibility_mode ? 0x0001 : 0x0002)
	 | (C_plus_plus ? 0x0008 : 0x0004)
	 | (flags['w'] ? 0x0040 : flags['u'] ? 0x0020 : 0x0010)
	 | (flags['m'] ? 0x0000 : 0x0080)
	 | (flags['a'] ? 0x0400 : flags['f'] ? 0x0200 : 0x0100)
	 ;
{ static reduction rule[] = { @< Rules @>@;@; };
@/int i=array_size(rule); @+ do install_rule(&rule[--i]); while (i>0);
#ifdef DEBUG
  if (install_failed) fatal("inconsistent grammar",0);
#endif
}

@ {\it Expressions}.
@:rules@>
These rules should be obvious. Rule~5 allows typedef identifiers to be used
as field selectors in structures; rules 7~and~8 attach a parameter list in a
function call. In rule~14 we prefix a potentially binary operator such as
`|*|' that is used in a unary way by a `\.{\\mathord}' command to make sure
that \TeX\ will not mistake it for a binary operator. In simple cases such as
|*p| this is redundant, but if such operators are repeated more than one
level deep, as in |**p|, \TeX\ would otherwise treat the first operator as
the left operand of the second, and insert the wrong spacing. Moreover,
typical \Cee~constructions such as a cast |(void*) &x| or a declaration
|char *p@;| would confuse \TeX\ even more.
In rule~13 we need not insert `\.{\\mathord}', since operators of category
|unop| are already treated as ordinary symbols by~\TeX.

@< Rules @>=
{ 1, {{expression, unop}}, {expression, NULL}}, @/
{ 2, {{expression, binop, expression}}, {expression, NULL}}, @/
{ 3, {{expression, unorbinop, expression}}, {expression, NULL}}, @/
{ 4, {{expression, select, expression}}, {expression, NULL}}, @/
{ 5, {{expression, select, int_like}}, {expression, "__$_"}}, @/
{ 6, {{expression, comma, expression}}, {expression, "__p1_"}}, @/
{ 7, {{expression, expression}}, {expression, NULL}}, @/
{ 8, {{expression, lpar, rpar}}, {expression, "__,_"}}, @/
{ 9, {{expression, subscript}}, {expression, NULL}}, @/
{10, {{lpar, expression, rpar}}, {expression, NULL}}, @/
{11, {{lbrack, expression, rbrack}}, {subscript, NULL}}, @/
{12, {{lbrack, rbrack}}, {subscript, "_,_"}}, @/
{13, {{unop, expression}}, {expression, NULL}}, @/
{14, {{unorbinop, expression}}, {expression, "o__"}}, @[@]

@~Here are some less common kinds of formulae. Processing the colon belonging
to the question mark operator in math mode will give it the proper spacing,
which is different from that of a colon following a label. Rule~21 processes
casts, since the category |parameters|, which represents parenthesised lists
specifying function argument types, encompasses the case of a single
parenthesised type specification. The argument of |sizeof| may be a type
specification rather than an expression; in \Cee\ (unlike \Cpp) it then must
be parenthesised.
%, but not in \Cpp\ (and |sizeof_like| might be `\&{new}').

@< Rules @>=
{20, {{question, expression, colon}}, {binop, "__m_"}}, @/
{21, {{parameters, expression}}, {expression, "_,_"}}, @/
{22, {{sizeof_like, parameters}}, {expression, NULL}}, @/
{23, {{sizeof_like, expression}}, {expression, NULL}}, @/
{24, {{sizeof_like, int_like}}, {expression,"_~_"},only_plus_plus}, @[@]

@ {\it Declarations}.
In a declaration in \Cee, the identifier being declared is wrapped up in a declarator, which looks like an expression of a restricted kind: only prefix asterisk, postfix subscript and formal parameters, and parentheses are used. In a bottom-up parser of the kind we are using, it is natural, and hardly avoidable, that declarators are parsed as expressions. Therefore we start recognising a declaration when we see a type specifier followed by the first declarator; at that point we have a succession `|int_like| |expression| |semi|' or `|int_like| |expression| |comma|' (rules 31~and~33). It is also possible that there are no declarators at all, namely when a |struct|, |union|, or~|enum| specifier is introduced without declaring any variables; in that case we have `|int_like| |semi|' (rule~32). Because the type specifier might be composite, like |unsigned long int|, and there might moreover be storage class specifiers and type modifiers (like `|const|'), we first contract any sequence of |int_like| items to a single one (rule~30). In case the declarator was followed by a comma we reduce to |int_like|, so that the next declarator can be matched, otherwise we reduce to |declaration|. It is not quite true that declarators always look like expressions, since the type modifiers `|const|' and `|volatile|' may penetrate into declarators. When they do they will almost always be preceded by an asterisk, and rule~34 will treat such cases. The choice for |int_like| as the result category is not completely obvious, since it makes the modifier and the preceding asterisk part of the type specifier rather than of the declarator, which strictly speaking is not correct; the choice for |unop| or |unorbinop| might therefore seem a more logical one. 
One reason for not doing that is that a space would have to be inserted in the translation after the modifier scrap, which would not look right in abstract declarators for contrived cases like \hbox{|int f(char *const)@;|}; more importantly, if the modifier would become part of the declarator, it would be a (reserved) identifier that precedes the identifier actually being declared, and when the declarator then receives a call from |make_underlined| by rule 31~or~33, it would mislead |first_ident|. The current solution has a small flaw as well, since it cannot handle the situation where the modifier is separated from the type specifier by a parenthesis, as in $\&{void}~(\m*\&{const}~\m*f)~(\&{int})$; such cases are quite uncommon, are hard to handle by rules that will not spuriously match in other situations, and even then they would still cause problems with |make_underlined|, so we do not attempt to handle them. @< Rules @>= {30, {{int_like, int_like}}, {int_like, "_~_"}}, @/ {31, {{int_like, expression, semi}}, {declaration, "_~!__"}}, @/ {32, {{int_like, semi}}, {declaration, NULL}}, @/ {33, {{int_like, expression, comma}}, {int_like, "_~!__p1"}}, @/ {34, {{unorbinop, int_like}}, {int_like, "o__"}}, @[@] @ If a typedef identifier is simultaneously used as a field selector in a |struct| or |union| declaration, it must be made to parse as expression and be printed in italic type; this can be achieved by placing the magic wand \:; before the identifier, by rule~35. The reason that we place \:; at the beginning rather than at the end of the construction here, is to prevent the |int_like| identifier from combining with something before it first. Rule~35 only applies if the \:; does not match by any rule with what comes before it. Rule~36 handles the case that a function is declared with specified argument types, which is not handled by the expression syntax given until now. 
It also parses new-style (\caps{ANSI/ISO}) headings of function definitions; in that case, the resulting |function_head| will not be incorporated into a |declaration| (unless a comma or semicolon follows) but rather into a |function|. If the parameter specifications include identifiers (as in the case of function headings), the arguments look like declarations without the final semicolon; rule~37 (with aid of rule~33) constructs such parameter lists. Parameter specifications using abstract declarators (without identifiers) will be treated below. In |struct| declarations we may encounter bit-field specifications with or without an identifier; these are handled by rules 38~and~39 (the constant expression following the colon will later receive a spurious call from |make_underlined|, but in case of numeric constants this does no harm). @< Rules @>= {35, {{magic, int_like}}, {expression, "_$_"}}, @/ {36, {{expression, parameters}}, {function_head, "_B_"}}, @/ {37, {{lpar, int_like, expression, rpar}}, {parameters, "_+++_~!_---_"}},@/ {38, {{int_like, expression, colon}}, {int_like, "_~!_m_"}}, @/ {39, {{int_like, colon}}, {int_like, "_m_"}}, @[@] @ Abstract declarators are used after type specifiers in casts and for specifying the argument types in declarations of functions or function pointers. They are like ordinary declarators, except that the defined identifier has been ``abstracted''; an example is `|**(* )(int)|' in `|void g(char**(* )(int))@;|', which tells that |g| takes as argument a pointer to a function with |int| parameter yielding a pointer to pointer to |char|. A difficulty with abstract declarators is that they are built up around the vacuum left by abstracting the identifier, and since for more than one reason we cannot allow rules with empty left hand side, we have to find an alternative way to get them started. 
The natural solution to this problem is to look for sequences that can only occur if an identifier has been abstracted from between them, for instance `\.{*)}' (in categories: |unorbinop| |rpar|). The most compelling reason why in |C_read| we had to laboriously change the category of a |type_defined| identifier to |expression| instead of |int_like| inside its defining typedef declaration, is that it allows us to ensure that any remaining |int_like| scrap that is followed by a |subscript| is a sure sign of an abstract declarator. Here are the cases that start off abstract declarators (these are the first examples of rules that need context categories in their left hand side). As a visual hint to the reader we leave a little bit of white space on the spot where the identifier has vanished. Rules 40~and~41 handle declarators for pointer arguments, where the vanished identifier is preceded by an asterisk, which either stands at the end of the declarator, or is parenthesised (for function pointer arguments). In these rules there is no need to prefix the asterisk with `\.{\\mathord}', since the right context makes an interpretation as binary operator impossible. Rules 42~and~43 treat declarators for arrays, possibly of pointers; there are no corresponding rules with |parameters| instead of |subscript| since abstract declarators never specify functions themselves, only function pointers. In fact the ``function analogue'' of rule~43 would incorrectly match a cast following an operator like `|*|' or `|-|'. Rule~44 treats an abstract declarator consisting of subscripts only, which are redundantly parenthesised; here too the corresponding pattern with |parameters| is not only never needed, it would also spuriously trigger on parenthesised expressions that start with a cast. 
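To make the starting patterns concrete, here is a small sketch in plain \Cee\ whose prototypes and type names contain the token sequences mentioned above. It is an illustration only: the function names are made up for this example, and the rule numbers in the comments indicate which starting rule the sequence is meant to trigger, under the reading of rules 40--44 given in the text.

```c
#include <stddef.h>

/* Prototypes with abstract declarators; the gap marks the vanished identifier. */
int apply(int (* )(int), int);      /* `*)'  (rule 40) */
size_t length(const char * , int);  /* `*,'  (rule 41) */
int sum(int [], size_t);            /* type specifier, then subscript (rule 42) */
size_t count(char * []);            /* `*['  (rule 43) */

/* `([': a redundantly parenthesised subscript in a type name (rule 44) */
enum { three = sizeof(int ([3])) / sizeof(int) };

/* Definitions, so that the prototypes can actually be exercised. */
int apply(int (*f)(int), int x) { return f(x); }

size_t length(const char *s, int stop)
{ size_t n = 0;
  while (s[n] != '\0' && s[n] != stop) ++n;
  return n;
}

int sum(int a[], size_t n)
{ int t = 0;
  while (n > 0) t += a[--n];
  return t;
}

size_t count(char *v[])
{ size_t n = 0;
  while (v[n] != NULL) ++n;
  return n;
}

int twice(int x) { return 2 * x; }
```

In each prototype the abstract declarator can only be recognised from its surroundings, which is why the starting rules need context categories in their left hand side.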
@< Rules @>= {40, {{unorbinop, rpar}, -1}, {declarator, "_,"}}, @/ {41, {{unorbinop, comma},-1}, {declarator, "_,"}}, @/ {42, {{int_like, subscript},1}, {declarator, ",_"}}, @/ {43, {{unorbinop, subscript},1}, {declarator, ",_"}}, @/ {44, {{lpar, subscript},1}, {declarator, ",_"}}, @[@] @~ Abstract declarators may grow just like ordinary declarators, to include prefixed asterisks, as well as postfixed subscripts and parameters, and grouping parentheses. @< Rules @>= {45, {{unorbinop, declarator}}, {declarator, "o__"}}, @/ {46, {{declarator, subscript}}, {declarator, NULL}}, @/ {47, {{declarator, parameters}}, {declarator, NULL}}, @/ {48, {{lpar, declarator, rpar}}, {declarator, NULL}}, @[@] @~ Here is how abstract declarators are assembled into |parameters|, keeping in mind that the ``abstract declarator'' might be completely empty (i.e., absent) as in `|void f(int);|' (rules 51~and~53). We put no space after the type specifier here, since it is followed either by an abstract declarator, a right parenthesis or comma, so certainly not by an identifier; therefore a space is neither necessary, nor would it improve readability. The \caps{ANSI/ISO} syntax allows empty parentheses as a parameter specification in abstract declarators, although this is an old-style form; rule~54 has been included to handle this case. Fortunately a parenthesised list of identifiers (which would parse as |expression|) is not allowed as parameter specification. @< Rules @>= {50, {{lpar, int_like, declarator, comma}}, {lpar, "____p5"}}, @/ {51, {{lpar, int_like, comma}}, {lpar, "___p5"}}, @/ {52, {{lpar, int_like, declarator, rpar}}, {parameters, NULL}}, @/ {53, {{lpar, int_like, rpar}}, {parameters, NULL}}, @/ {54, {{lpar, rpar}}, {parameters, "_,_"}}, @[@] @ {\it Structure, union, and enumeration specifiers}. It is permissible to use typedef identifiers as structure, union, or enumeration tags as well, so we include cases where an |int_like| follows a |struct_like| token. 
In \Cpp, we may also find things like `\&{private}:' in a class specifier; these are parsed just like `|default:|', i.e., as a |label| (rule~66). @< Rules @>= {60, {{struct_like, lbrace}}, {struct_head, "_ft_"},standard_braces}, @/ {60, {{struct_like, lbrace}}, {struct_head, "_~_"},unaligned_braces}, @/ {60, {{struct_like, lbrace}}, {struct_head, "_f_"},wide_braces}, @/ {61, {{struct_like, expression, lbrace}}, {struct_head, "_~!_ft_"},standard_braces}, @/ {61, {{struct_like, expression, lbrace}}, {struct_head, "_~!_~_"},unaligned_braces}, @/ {61, {{struct_like, expression, lbrace}}, {struct_head, "_~!_f_"},wide_braces}, @/ {62, {{struct_like, int_like, lbrace}}, {struct_head, "_~!$_ft_"},standard_braces|no_plus_plus}, @/ {62, {{struct_like, int_like, lbrace}}, {struct_head, "_~!_ft_"},standard_braces|only_plus_plus}, @/ {62, {{struct_like, int_like, lbrace}}, {struct_head, "_~!$_~_"},unaligned_braces|no_plus_plus}, @/ {62, {{struct_like, int_like, lbrace}}, {struct_head, "_~!_~_"},unaligned_braces|only_plus_plus}, @/ {62, {{struct_like, int_like, lbrace}}, {struct_head, "_~!$_f_"},wide_braces|no_plus_plus}, @/ {62, {{struct_like, int_like, lbrace}}, {struct_head, "_~!_f_"},wide_braces|only_plus_plus}, @/ {63, {{struct_like, expression}}, {int_like, "_~_"}}, @/ {64, {{struct_like, int_like}}, {int_like, "_~$_"},no_plus_plus}, @/ {64, {{struct_like, int_like}}, {int_like, "_~_"},only_plus_plus}, @/ {65, {{struct_head, declaration, rbrace}}, {int_like, "_+_-f_"},standard_braces}, @/ {65, {{struct_head, declaration, rbrace}}, {int_like, "_+f_-f_"},unaligned_braces & wide_braces}, @/ {66, {{label, declaration}}, {declaration, "b_f_"},only_plus_plus}, @[@] @ Rules 67--70 are for enumerations; they avoid forced line breaks and call |make_underlined| for all the enumeration constants. 
@< Rules @>= {67, {{struct_like, lbrace, expression},-1}, {struct_head, "_B_"}}, @/ {68, {{struct_like, expression, lbrace, expression},-1}, {struct_head, "_~_B_"}}, @/ {69, {{struct_head, expression, comma, expression},1}, {expression, "__B!_"}}, @/ {70, {{struct_head, expression, rbrace}}, {int_like, "_~+!_-B_"}}, @[@] @ The following rules are added to allow short structure and union specifiers to be kept on one line without having to repeatedly specify \:+. The idea is to place \:; after the left brace; this will cause the rules below to be invoked instead of those above, which avoids introducing forced line breaks. @< Rules @>= {71, {{struct_like, lbrace, magic}}, {short_struct_head, "_B__+"}}, @/ {72, {{struct_like, expression, lbrace, magic}}, {short_struct_head, "_~!_B__+"}}, @/ {73, {{struct_like, int_like, lbrace, magic}}, {short_struct_head, "_~!$_B__+"}, no_plus_plus}, @/ {73, {{struct_like, int_like, lbrace, magic}}, {short_struct_head, "_~!_B__+"}, only_plus_plus}, @/ {74, {{short_struct_head, declaration}}, {short_struct_head, "_B_"}}, @/ {75, {{short_struct_head, rbrace}}, {int_like, "_-B_"}}, @[@] @ {\it Statements}. Rule~80 gives the usual way statements are formed, while rule~81 handles the anomalous case of an empty statement. Its use can always be avoided by using an empty pair of braces instead, which much more visibly indicates the absence of a statement (e.g., an empty loop body); when the empty statement is used however, it will either be preceded by a space or start a new line (like any other statement), so there is always some distinction between a |while| loop with empty body and the |while| that ends a |do|~statement. A rule like this with left hand side of length~1 makes the corresponding category (viz.~|semi|) ``unstable'', and can only be useful for categories that usually are scooped up (mostly from the left) by a longer rule. Rules 82--84 make labels (ordinary, case and default), and rules 85~and~86 attach the labels to statements. 
Rule~87 makes \:; behave like an invisible semicolon when it does not match any of the rules designed for it, for instance if it follows an expression. @< Rules @>= {80, {{expression, semi}}, {statement, NULL}}, @/ {81, {{semi}}, {statement, NULL}}, @/ {82, {{expression, colon}}, {label, "!_h_"}}, @/ {83, {{case_like, expression, colon}}, {label, "_ _h_"}}, @/ {84, {{case_like, colon}}, {label, "_h_"}}, @/ {85, {{label, label}}, {label, "_B_"}}, @/ {86, {{label, statement}}, {statement, "b_B_"},not_all_stats_forced}, @/ {86, {{label, statement}}, {statement, "b_f_"},all_stats_forced}, @/ {87, {{magic}}, {semi, NULL}}, @[@] @ The following rules format compound statements and aggregate initialisers. Rules 90--94 combine declarations and statements within compound statements. A newline is forced between declarations by rule~90, unless the declarations are local (preceded by a left brace) and `\.{+m}' was specified (rule~91); this rule does not apply to structure specifiers, because the left brace will already have been captured in a |struct_head| before the rule can match. If `\.{+f}'~or~`\.{+a}' was specified, then a newline is forced between statements as well (rule~93). Between the declarations and statements some extra white space appears in ordinary \Cee\ (rule~92), but not in \Cpp, where declarations and statements may be arbitrarily mixed (rule~94). Rules 95--97 then build compound statements, where the last case is the unusual one where a compound statement ends with a declaration; empty compound statements are made into simple statements so that they will look better when used in a loop statement or after a label. If compound statements are not engulfed by a conditional or loop statement (see below) then they decay to ordinary statements by rule~98. Rules 99~and~100 reduce aggregate initialiser expressions, where the reduction of comma-separated lists of expressions is already handled by the expression syntax. 
@< Rules @>=
{90, {{declaration, declaration}}, {declaration, "_f_"}}, @/
{91, {{lbrace, declaration, declaration},1}, {declaration, "_B_"},merged_decls}, @/
{92, {{declaration, statement}}, {statement, "_F_"},no_plus_plus}, @/
{92, {{declaration, statement}}, {statement, "_f_"},only_plus_plus}, @/
{93, {{statement, statement}}, {statement, "_f_"},forced_statements}, @/
{93, {{statement, statement}}, {statement, "_B_"},no_forced_statements},@/
{94, {{statement, declaration}}, {declaration, "_f_"},only_plus_plus}, @/
{95, {{lbrace, rbrace}}, {statement, "_,_"}}, @/
{96, {{lbrace, statement, rbrace}}, {compound_statement, "ft_+_-f_"},standard_braces}, @/
{96, {{lbrace, statement, rbrace}}, {compound_statement, "_+f_-f_"},unaligned_braces}, @/
{96, {{lbrace, statement, rbrace}}, {compound_statement, "f_+f_-f_"},wide_braces}, @/
{97, {{lbrace, declaration, rbrace}}, {compound_statement, "ft_+_-f_"},standard_braces}, @/
{97, {{lbrace, declaration, rbrace}}, {compound_statement, "_+f_-f_"},unaligned_braces}, @/
{97, {{lbrace, declaration, rbrace}}, {compound_statement, "f_+f_-f_"},wide_braces}, @/
{98, {{compound_statement}}, {statement, "f_f"}}, @/
{99, {{lbrace, expression, comma, rbrace}}, {expression, "_,__,_"}},@/
{100, {{lbrace, expression, rbrace}}, {expression, "_,_,_"}}, @[@]

@ As for structure and union specifiers, we allow compound statements to be
kept on one line by inserting \:; after the left brace. Such statements will
reduce to |statement| rather than to |compound_statement|, so that they will
be treated as if they were simple statements.

@< Rules @>=
{101, {{lbrace, magic}}, {short_lbrace, "__+"}}, @/
{102, {{short_lbrace, declaration}}, {short_lbrace, "_B_"}}, @/
{103, {{short_lbrace, statement}}, {short_lbrace, "_B_"}}, @/
{104, {{short_lbrace, rbrace}}, {statement, "_-B_"}}, @[@]

@ {\it Selection, iteration and jump statements}.
There are three intermediate categories involved in the recognition of
conditional statements.
The category |if_like| stands for `|if|' or an initial segment of a repeated if-clause, up to and including `|else|~|if|'. An |if_head| is an |if_like| followed by its (parenthesised) condition (rules 110~and~111). If the statement following the condition is followed by `|else|~|if|', the whole construct reduces to |if_like| (so that the indentation will not increase after the second condition, rules 112~and~113), otherwise, if only `|else|' follows, reduction is to an |if_else_head| (rules 114~and~115), and finally, if no |else| follows at all, we reduce with only the if-branch to |statement| (rules 116~and~117). The reduction rules for |if_else_head| differ from those for |if_head| in that it will not combine with an |else|, even if it is present; the formatting is identical to that of an |else|-less |if_head| (rules 118~and~119). (It might be tempting to replace rules 116~and~117 by a reduction from |if_head| to |if_else_head| to be applied if no matching `|else|' is found, but that would require some subtle measures to prevent this decay at times when the right context is insufficiently reduced to decide whether an `|else|' is present or not.) The formatting of the if and else branches depends on whether they are compound statements or some other kind of statement (possibly another conditional statement), and on the flags for statement forcing and brace alignment. 
@< Rules @>= {110, {{if_like, expression}}, {if_head, "f_~_"}}, @/ {111, {{lbrace,if_like,expression},1}, {if_head, "_~_"},standard_braces},@/ {112, {{if_head, compound_statement, else_like, if_like}}, {if_like, "__f_~_"},aligned_braces}, @/ {112, {{if_head, compound_statement, else_like, if_like}}, {if_like, "_~_~_~_"},unaligned_braces}, @/ {113, {{if_head, statement, else_like, if_like}}, {if_like, "_+B_-f_~_"},not_all_stats_forced}, @/ {113, {{if_head, statement, else_like, if_like}}, {if_like, "_+f_-f_~_"},all_stats_forced}, @/ {114, {{if_head, compound_statement, else_like}}, {if_else_head, "__f_"},aligned_braces}, @/ {114, {{if_head, compound_statement, else_like}}, {if_else_head, "_~_~_"},unaligned_braces}, @/ {115, {{if_head, statement, else_like}}, {if_else_head, "_+B_-f_"},not_all_stats_forced},@/ {115, {{if_head, statement, else_like}}, {if_else_head, "_+f_-f_"},all_stats_forced}, @/ {116, {{if_head, compound_statement}}, {statement, "__f"},aligned_braces}, @/ {116, {{if_head, compound_statement}}, {statement, "_~_f"},unaligned_braces}, @/ {117, {{if_head, statement}}, {statement, "_+B_-f"},not_all_stats_forced}, @/ {117, {{if_head, statement}}, {statement, "_+f_-f"},all_stats_forced}, @/ {118, {{if_else_head, compound_statement}}, {statement, "__f"},aligned_braces}, @/ {118, {{if_else_head, compound_statement}}, {statement, "_~_f"},unaligned_braces}, @/ {119, {{if_else_head, statement}}, {statement, "_+B_-f"},not_all_stats_forced}, @/ {119, {{if_else_head, statement}}, {statement, "_+f_-f"},all_stats_forced}, @[@] @ The following rules prevent forced line breaks from conditional statements that occur within a one-line compound statement. 
@< Rules @>= {120, {{short_lbrace, if_like, expression},1}, {if_head, "_~_"}}, @/ {121, {{short_lbrace, if_head, statement, else_like}}, {short_lbrace, "_B_B_B_"}}, @/ {122, {{short_lbrace, if_head, statement}}, {short_lbrace, "_B_B_"}}, @[@] @ Switch and loop statements make use of the syntax for conditionals by reducing to |if_else_head| which will take one further statement and indent it (rules 130~and~131). Recall that `|for|' and `|switch|' are both |while_like|; the parenthesised object following `|for|' looks like nothing we have seen before, however, so we need extra rules to come to terms with it (rules 132--134). Rule~132 is needed to avoid a line break when these are normally inserted between statements, and rule~134 is needed in case the third expression is empty. The |do|-|while| loops have to be treated separately. Because we want to distinguish the case of a |compound_statement| as loop body from other kinds of statements, we cannot wait until the |while| combines with the loop control condition to an |if_else_head|, since by then a |compound_statement| will have decayed to |statement|. Hence we pick up the unreduced `|while|' token and form a new category |do_head| (rules 135~and~136); in case of a compound statement the `|while|' will be on the same line as the closing brace. Rules 137~and~138 then combine this with the condition and the ridiculous mandatory semicolon at the end to form a |statement|. 
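The loop forms discussed here can be seen side by side in the following plain \Cee\ sketch. The functions are invented for this illustration, and the rule numbers in the comments merely point at the rules that the text associates with each form.

```c
/* All three clauses present: between the parentheses the parser sees
   statement, statement, expression (rules 132 and 133). */
int sum_to(int n)
{ int i, total = 0;
  for (i = 1; i <= n; ++i) total += i;
  return total;
}

/* Third clause empty: only statements remain between the parentheses
   (rule 134); the empty first clause is a lone semicolon (rule 81). */
int digits(unsigned n)
{ int d = 1;
  for (; n >= 10; ) { n /= 10; ++d; }
  return d;
}

/* Compound loop body: the `while' is picked up together with the body
   to form a do_head (rule 135), which then takes the condition and the
   final semicolon (rule 137). */
int halvings(unsigned n)
{ int k = 0;
  do { n /= 2; ++k; } while (n > 0);
  return k;
}
```

Note that in the second function the body of the |for| loop and that of the |do| loop in the third are both compound statements, so the brace-alignment flags rather than the statement-forcing flags decide their layout.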
@< Rules @>= {130, {{while_like, expression}}, {if_else_head, "f_~_"}}, @/ {131, {{lbrace, while_like, expression},1}, {if_else_head, "_~_"},standard_braces}, @/ {132, {{lpar, statement, statement}, 1}, {statement, "_B_"}, forced_statements}, @/ {133, {{lpar, statement, expression, rpar}}, {expression, "__B__"}}, @/ {134, {{lpar, statement, rpar}}, {expression, NULL}}, @/ {135, {{do_like, compound_statement, while_like}}, {do_head, "__~_"},standard_braces}, @/ {135, {{do_like, compound_statement, while_like}}, {do_head, "_~_~_"},unaligned_braces}, @/ {135, {{do_like, compound_statement, while_like}}, {do_head, "__f_"},wide_braces}, @/ {136, {{do_like, statement, while_like}}, {do_head, "_+B_-B_"},not_all_stats_forced}, @/ {136, {{do_like, statement, while_like}}, {do_head, "_+f_-f_"},all_stats_forced}, @/ {137, {{do_head, expression, semi}}, {statement, "f_~__f"}}, @/ {138, {{lbrace, do_head, expression, semi},1}, {statement, "_~__f"}}, @[@] @ The following rules prevent forced line breaks from loop statements that occur within a one-line compound statement. Since no special layout is required between the heading of a |while| loop and its body, rule~139 incorporates the heading as if it were a separate statement. For a |do|-|while| loop we must take a bit more effort to get the spacing following the |while| correct. @< Rules @>= {139, {{short_lbrace, while_like, expression}}, {short_lbrace, "_B_~_"}}, @/ {140, {{short_lbrace, do_like, statement, while_like},1}, {do_head, "_B_B_"}}, @/ {141, {{short_lbrace, do_head, expression, semi}}, {short_lbrace, "_B_~__"}}, @[@] @ The tokens `|goto|', `|continue|', `|break|', and `|return|' are all |return_like|; although what may follow them is not the same in all cases, the following two rules cover all legal uses. 
Note that rule~146 does not wait for a semicolon to come along; this may lead
to a premature match as in `|return a+b;|', but this does not affect
formatting, while the rule allows saying things like `|return home|' in a
module name (or elsewhere) without risking irreducible scraps.

@< Rules @>=
{145, {{return_like, semi}}, {statement, NULL}}, @/
{146, {{return_like, expression}}, {expression, "_~_"}}, @[@]

@ {\it Function definitions and external declarations}.
Apart from the initial specification of the result type (which is optional,
defaulting to |int|), a new-style function heading will parse as a
|function_head| (see the declaration syntax above), while an old-style
function heading is an |expression| possibly followed by a |declaration|
(specifying the function parameters). Rules 150--152 parse these two kinds of
function headings together with the function body, yielding category
|function|; rule~153 attaches the optional result type specifier. Although
the \Cee~syntax requires that the function body is a compound statement, we
allow it to be a |statement| (to which |compound_statement| will decay), to
cover the case where a very short function body is specified using
`\.{\{@@;}'. At the outer level declarations and functions can be mixed; when
they are, a bit of white space surrounds the functions (rules 154--156). The
combination of several declarations is already taken care of by the syntax
for compound statements; no extra white space is involved there. Rules
157--159 take care of function declarations that are not definitions (i.e.,
there is no function body); if followed by a semicolon, a comma or a right
parenthesis, the |function_head| decays to an |expression|, and the rest of
the syntax will take care of recognising a |declaration| or |parameters|.
Rules 153~and~157 will be replaced in~\Cpp, for reasons explained below
(incidentally, this is the reason the category |function_head| was
introduced; it used to be simply |expression|).
@< Rules @>= {150, {{function_head, statement}}, {function, "!_f_"}}, @/ {151, {{expression, statement}}, {function, "!_f_"}}, @/ {152, {{expression, declaration, statement}}, {function, "!_++f_--f_"}}, @/ {153, {{int_like, function}}, {function, "_ _"}}, @/ {154, {{declaration, function}}, {function, "_F_"}}, @/ {155, {{function, declaration}}, {declaration, "_F_"}}, @/ {156, {{function, function}}, {function, "_F_"}}, @/ {157, {{function_head, semi},-1}, {expression, NULL},no_plus_plus}, @/ {158, {{function_head, comma},-1}, {expression, NULL}}, @/ {159, {{function_head, rpar},-1}, {expression, NULL}}, @[@] @ {\it Module names}. Although module names nearly always stand for statements, they can be made to stand for a declaration by appending \:;, or for an expression by appending `\.{@@;@@;}'. The latter possibility is most likely to be useful if the module stands for (part of) an initialiser list. A module name can also be made into an expression by enclosing it in \:[ and~\:], but in that case rule~160 will apply first, placing a forced break after the module name. Rules 161, 164,~and~165 prevent a module name from generating forced breaks if it occurs on a one-line compound statement or structure or union specifier, while rules 167~and~168 serve to prevent rules 163~and~164 from matching with priority over rule~166. The rules given here will be replaced by other ones in compatibility mode. 
@< Rules @>=
{160, {{mod_scrap}}, {statement, "_f"},cwebx}, @/
{161, {{short_lbrace, mod_scrap},1}, {statement, NULL},cwebx}, @/
{162, {{mod_scrap, magic}}, {declaration, "f__f"},cwebx}, @/
{163, {{lbrace, mod_scrap, magic},1}, {declaration, "__f"},cwebx|standard_braces}, @/
{164, {{short_lbrace, mod_scrap, magic},1}, {declaration, NULL},cwebx}, @/
{165, {{short_struct_head, mod_scrap, magic},1}, {declaration,NULL},cwebx}, @/
{166, {{mod_scrap, magic, magic}}, {expression, NULL},cwebx}, @/
{167, {{lbrace, mod_scrap, magic, magic},1}, {expression, NULL},cwebx|standard_braces}, @/
{168, {{short_lbrace, mod_scrap, magic, magic},1}, {expression, NULL},cwebx}, @[@]

@ {\it Additional rules for compatibility mode}.
@^Levy/Knuth \.{CWEB}@>
Although our grammar differs completely from the one used in \LKC., we use
most of it also in compatibility mode (the exception is formed by the rules
concerning module names). We do add a few rules in compatibility mode, mostly
to deal with circumstances that are different for some reason or other. We
start with module names, which behave in a completely different way. In
compatibility mode, as in \LKC., a module name normally stands for an
expression (rule~164) and in practice is almost always followed by a visible
or invisible (|magic|) semicolon. Rules 160~and~161 treat these cases
explicitly, in order to insert a forced break after the semicolon; rule~161
for the case of an invisible semicolon is needed because if we were to wait
for the |magic| semicolon to decay to an ordinary one, it might instead
combine with an |int_like| token following it. Rules 162~and~163 are provided
to allow the short form of compound statements even in compatibility mode
(even though it is not present in \LKC.): they preempt rules 160~and~161,
avoiding the forced break.
Since in compatibility mode one has no means of indicating that a module name stands for a set of declarations, we add rule~165 to allow them nevertheless to be used before a function definition. Rules 170~and~171 compensate for the fact that compound assignment operators like `|+=|' are scanned as two tokens in compatibility mode (see section@#truly stupid@> for an explanation why this is done). Rule~172 allows types to be used in the argument lists of macros, without enclosing them between \:[~and~\:], in compatibility mode; this is done frequently in the Stanford GraphBase. @^Stanford GraphBase@> It is sufficient to remove expressions from the beginning of the argument list, since types, and more generally types followed by declarators, are already removed by the standard rules for |parameters|. As a result the argument list will either reduce to an |expression| or to |parameters|, depending on whether the final item was an expression. In both cases it will combine with the macro name to an |expression|, although the spacing will be a bit too wide in the |parameters| case. But then, one ought to use \:[~and~\:] anyway, which avoids this problem. @< Rules @>= {160, {{mod_scrap, semi}}, {statement, "__f"},compatibility}, @/ {161, {{mod_scrap, magic}}, {statement, "__f"},compatibility}, @/ {162, {{short_lbrace, mod_scrap, semi},1}, {statement, NULL},compatibility}, @/ {163, {{short_lbrace, mod_scrap, magic},1}, {statement, NULL},compatibility}, @/ {164, {{mod_scrap}}, {expression, NULL},compatibility}, @/ {165, {{statement, function}}, {function, "_F_"},compatibility}, @/ @) {170, {{binop, binop}}, {binop,"r__"},compatibility}, @/ {171, {{unorbinop, binop}}, {binop,"r__"},compatibility}, @/ {172, {{lpar, expression, comma}}, {lpar, "___p1"}, compatibility}, @[@] @[@] @ {\it Additional rules for \Cpp}. Up to this point we have included some specific rules for \Cpp, in places where a slight deviation from the \Cee~syntax was required. 
There are however a large number of syntactic possibilities of \Cpp\ that are not even remotely similar to those of~\Cee, so it is most convenient to collect them in a separate section. The author of \.{CWEBx} wishes to make it clear that he is quite aware of the incompleteness of the set of rules specified below, and that he assumes no responsibility for correcting this. One reason for this is that he has no readable formal grammar of \Cpp, which possibly could be used for validation (nor does he use \Cpp\ himself), another is that the pieces of grammar that he has seen show so little coherence that he seriously doubts whether it is possible at all to parse \Cpp\ reliably with a grammar of the type implemented here. In fact, the rules here were merely added in an attempt to cope with problems reported by users. We start with rules for `\&{operator}', which are simple: it should combine with a following operator symbol of any type to form an expression (rules 180--182). Then rules 183--186 take care of the `::'~operator: either a class name or nothing is expected at the left, and either an ordinary or class identifier at the right; the resulting category is that of the right hand side. Type identifiers may appear as the left hand side of an assignment within a list of formal parameters, indicating a default argument; in this case the whole assignment should behave as a type identifier (rule~187). Next we give rules catering for constructor declarations in class definitions. First of all we must recognise the fact that the class name is being used as a function name here; the simplest solution is to recognise the combination of an |int_like| followed by a (possibly empty) parameter list (rules 190~and~191). 
We cannot let a |function_head| (possibly created by the rules just mentioned) decay to an |expression| when followed by a semicolon, as we do for~\Cee, since declarations of constructor members of a class lack an initial type specification, so the |expression| would fail to become part of a |declaration|. Therefore, special measures are necessary: the simplest solution is to absorb (rule~192) any preceding type specifier into the |function_head| (thereby removing the distinction between its presence or absence), and construct the |declaration| explicitly from the |function_head| and the following semicolon (rule~193). @< Rules @>= {180, {{case_like, binop}}, {expression, "_o_"},only_plus_plus}, @/ {181, {{case_like, unorbinop}}, {expression, "_o_"},only_plus_plus}, @/ {182, {{case_like, unop}}, {expression, NULL},only_plus_plus}, @/ {183, {{int_like, colcol, expression}}, {expression, NULL},only_plus_plus}, @/ {184, {{colcol, expression}}, {expression, "o__"},only_plus_plus}, @/ {185, {{int_like, colcol, int_like}}, {int_like, NULL},only_plus_plus}, @/ {186, {{colcol, int_like}}, {int_like, "o__"},only_plus_plus}, @/ {187, {{int_like, binop, expression}}, {int_like, NULL},only_plus_plus}, @/ {190, {{int_like, parameters}}, {function_head, "_B_"},only_plus_plus}, @/ {191, {{int_like, lpar,rpar}}, {function_head, "_B_,_"},only_plus_plus},@/ {192, {{int_like, function_head}}, {function_head, "_ _"},only_plus_plus}, @/ {193, {{function_head, semi}}, {declaration, "!__"},only_plus_plus}, @[@]
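@ Rules 157 and~193 both match a |function_head| followed by a semicolon, yet they never compete, because their masks are complementary: rule~157 carries |no_plus_plus| and rule~193 carries |only_plus_plus|. The following stand-alone sketch illustrates this, assuming that a rule is kept exactly when its mask shares no bit with |rule_mask|, as described at the start of this part. The helpers |rule_selected| and |make_mask| are hypothetical names introduced for the illustration only; they are not part of \.{CWEBx}, and |make_mask| reproduces just the mode and language bits of the actual initialisation.

```c
/* Mask values copied from the definitions at the start of this part. */
#define cwebx          0x0001 /* use normally */
#define compatibility  0x0002 /* use in compatibility mode */
#define only_plus_plus 0x0004 /* use in C++ */
#define no_plus_plus   0x0008 /* use in ordinary C only */

/* Hypothetical helper (an assumption of this sketch): a rule is kept
   unless one of the bits of its mask is also raised in rule_mask. */
static int rule_selected(int mask, int rule_mask)
{ return (mask & rule_mask) == 0; }

/* Hypothetical helper: builds only the mode and language bits of
   rule_mask, following the first two lines of its initialisation.
   Note that each setting raises the bit of the rules to SUPPRESS. */
static int make_mask(int compatibility_mode, int C_plus_plus)
{ return (compatibility_mode ? cwebx : compatibility)
       | (C_plus_plus ? no_plus_plus : only_plus_plus);
}
```

With these definitions, |rule_selected(no_plus_plus, make_mask(0,0))| is nonzero, so rule~157 is loaded for ordinary \Cee, while |rule_selected(only_plus_plus, make_mask(0,0))| is zero, so rule~193 is not; for \Cpp\ the two results are exchanged.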