@* The character set. One of the main goals in the design of \.{WEB} has been to make it readily portable between a wide variety of computers. Yet \.{WEB} by its very nature must use a greater variety of characters than most computer programs deal with, and character encoding is one of the areas in which existing machines differ most widely from each other. To resolve this problem, all input to \.{WEAVE} and \.{TANGLE} is converted to an internal seven-bit code that is essentially standard ASCII, the ``American Standard Code for Information Interchange.'' The conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user's external representation just before they are output. Such an internal code can be accessed by users of \.{WEB} by means of constructions like \.{@@'A'}, which should be distinguished from \.{'A'}. The former is transformed by \.{TANGLE} into an integer that is the internal code of \.A, but the latter, a |char| constant, is not touched by \.{WEB}, and will be interpreted by the \cee\ compiler according to the machine's character set. (Actually, of course, it gets translated into \.{WEB}'s internal code just like any other character in the input file, but then it gets translated back at output time.)
@^ASCII code@>

Here is a table of the standard visible ASCII codes (\.{ } stands for a
blank space):
$$\def\:{\char\count255\global\advance\count255 by 1}
\count255='40
\vbox{
\hbox{\hbox to 40pt{\it\hfill0\/\hfill}%
\hbox to 40pt{\it\hfill1\/\hfill}%
\hbox to 40pt{\it\hfill2\/\hfill}%
\hbox to 40pt{\it\hfill3\/\hfill}%
\hbox to 40pt{\it\hfill4\/\hfill}%
\hbox to 40pt{\it\hfill5\/\hfill}%
\hbox to 40pt{\it\hfill6\/\hfill}%
\hbox to 40pt{\it\hfill7\/\hfill}}
\vskip 4pt
\hrule
\def\^{\vrule height 10.5pt depth 4.5pt}
\halign{\hbox to 0pt{\hskip -24pt\O{#0}\hfill}&\^
\hbox to 40pt{\tt\hfill#\hfill\^}&
&\hbox to 40pt{\tt\hfill#\hfill\^}\cr
04&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
05&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
06&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
07&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
10&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
11&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
12&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
13&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
14&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
15&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
16&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
17&\:&\:&\:&\:&\:&\:&\:\cr}
\hrule width 280pt}$$

We introduce new types to distinguish between the transliterated characters
and the characters in the outside world. Let all ``interesting'' values that a
|char| variable may take lie between |first_text_char| and |last_text_char|;
for the ASCII code we can take |first_text_char=0| and |last_text_char=0177|.
We will tell \.{WEB} to convert all input characters in this range to its own
code, and balk at characters outside the range. We make two assumptions:
|first_text_char>=0| and |char| has room for at least eight bits.
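As a rough illustration of the range test described above, the following C sketch checks the two stated assumptions at compile time and classifies an incoming character value. The helper name |in_text_range| is hypothetical and does not appear in \.{WEB}; the two macros simply repeat the values suggested in the text.

```c
#include <limits.h>

#define first_text_char 0    /* lowest interesting value of a char */
#define last_text_char 0177  /* highest interesting value of a char */

/* The two assumptions stated in the text, checked by the preprocessor. */
#if first_text_char < 0
#error "first_text_char must be nonnegative"
#endif
#if CHAR_BIT < 8
#error "char must have room for at least eight bits"
#endif

/* Hypothetical helper (not part of WEB): returns 1 if the value c lies
   between first_text_char and last_text_char inclusive, and 0 if it is
   outside the range that would be converted. */
int in_text_range(int c)
{
  return c >= first_text_char && c <= last_text_char;
}
```

With these values, |in_text_range(0101)| and |in_text_range(0177)| yield 1, while |in_text_range(0200)| yields 0, which is the case in which the text says \.{WEB} should balk.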
@^system dependencies@>

@d first_text_char = 0 /* lowest interesting value of a |char| */
@d last_text_char = 0177 /* highest interesting value of a |char| */

@=
typedef char ASCII; /* type of characters inside \.{WEB} */
typedef char outer_char; /* type of characters outside \.{WEB} */

@ The \.{WEAVE} and \.{TANGLE} processors convert between ASCII code and
the user's external character set by means of arrays |xord| and |xchr|
that are analogous to PASCAL's |ord| and |chr| functions.

@=
ASCII xord[last_text_char+1]; /* specifies conversion of input characters */
outer_char xchr[0200]; /* specifies conversion of output characters */

@ Every system supporting \cee\ must be able to read and write the 95
visible characters of standard ASCII above (although not necessarily using the
ASCII codes to represent them). Conversely, these characters, plus
the newline, are sufficient to write any \cee\ program. Other
characters are desirable mainly in strings, and they can be referred
to by means of escape sequences like \.{'\t'}.

The basic implementation of \.{WEB}, then, only has to assign an
|xord| to these 95 characters (newlines are swallowed by the reading
routines). The easiest way to do this is to assign the characters to
their positions in |xchr| and then invert the correspondence:

@c
common_init()
{
  strcpy(xchr," !\"#$%&'()*+,-./0123456789\
:;<=>?@@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ");
  @;
  @;
}

@ The following system-independent code makes the |xord| array contain a
suitable inverse to the information in |xchr|.

@=
{
  int i; /* to invert the correspondence */
  for (i=first_text_char; i<=last_text_char; i++) xord[i]='\040';
  for (i=1; i<0177; i++) xord[xchr[i]]=i;
}

@ Some \cee\ compilers accept an extended character set, so that one can
type things like \.^^Z\ instead of \.{!=}.
If that's the case in your system, you should change this module,
assigning positions |01| to |037| in the most convenient way; for example,
at MIT you can just say
$$\hbox{|for (i=1; i<=037; i++) xchr[i]=i;|}$$
since \.{WEB}'s character set is essentially identical to MIT's, even with
respect to characters less than |040| (see the definitions below). If,
however, the changes do not conform with these definitions you should change
the definitions as well.
@^system dependencies@>
@^notes to myself@>

@=
/* nothing needs to be done */

@ @d text_char = char /* the data type of characters in text files */

@=
typedef char ascii_code; /* ascii codes from 0 to 127 */
typedef FILE *text_file;

@ One of the \ASCII{} codes below 040 has been given a symbolic name in
\.{TIE} because it is used with a special meaning.

@d tab_mark = '\t' /* \ASCII{} code used as tab-skip */

@ When we initialize the |xord| array and the remaining parts of |xchr|,
it will be convenient to make use of an index variable, |i|.
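The idea of filling |xchr| and then inverting it into |xord| can be tried outside \.{WEB} with a small standalone C sketch. It is not the code of \.{WEAVE} or \.{TANGLE}, and it rests on several assumptions: the function name |init_tables| is hypothetical, |ASCII| is taken to be |unsigned char|, |xord| is sized to cover every possible |unsigned char| value, unassigned positions of |xchr| are filled with blanks, and the visible characters are copied to positions |040| onward with |memcpy| rather than |strcpy| so that no terminating null is written past the end of |xchr|.

```c
#include <limits.h>
#include <string.h>

typedef unsigned char ASCII; /* assumption: unsigned, so codes index arrays safely */
typedef char outer_char;     /* characters as the outside world sees them */

ASCII xord[UCHAR_MAX + 1];   /* assumption: one slot per possible outer value */
outer_char xchr[0200];       /* one slot per seven-bit internal code */

/* Hypothetical stand-in for the initialization described in the text. */
void init_tables(void)
{
  /* the 95 visible characters, in order of their ASCII codes 040..0176 */
  static const char visible[] =
    " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";
  int i;

  /* every position starts out as a blank, including 0..037 and 0177 */
  memset(xchr, ' ', sizeof xchr);
  /* place the visible characters at positions 040..0176 */
  memcpy(xchr + 040, visible, sizeof visible - 1);

  /* default: any outer character without an assignment maps to blank */
  for (i = 0; i <= UCHAR_MAX; i++) xord[i] = 040;
  /* invert the correspondence; the blank ends up with code 040 because
     position 040 is the last one that holds a blank in this loop */
  for (i = 1; i < 0177; i++) xord[(unsigned char) xchr[i]] = (ASCII) i;
}
```

After |init_tables()| has run, |xord[(unsigned char) xchr[c]]==c| holds for every code |c| from |040| to |0176|, whatever numeric values the host machine uses for the visible characters.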