@q file: fsm_tbls.w@>
@q%   Copyright Dave Bone 1998 - 2015@>
@q% /*@>
@q%    This Source Code Form is subject to the terms of the Mozilla Public@>
@q%    License, v. 2.0. If a copy of the MPL was not distributed with this@>
@q%    file, You can obtain one at http://mozilla.org/MPL/2.0/.@>
@q% */@>
@** Finite automaton table definitions and their functions.
These definitions support Yacco2's generated finite state automaton tables.
A binary search is used on all tables: |Shift_tbl|, |Reduce_tbl|, and |State_s_thread_tbl|.
Their structure contains the prefix giving the number of elements in the table, 
and the first record in the array.
The elements are a concatenation of `in ascending sequence' sorted records for the binary search.

@*2 State structure.\fbreak
 This represents the finite automaton state.
The only wrinkles to your normal finite state definition are the 
entries supporting parallelism and the 2 meta terminals for the `all shift' and
`invisible shift' functions.
These extra shifts  act like a normal shift requiring their own shift entries.

Parallelism is the \PARshift grammatical expressions within the state
calling threads.
Each expression supplies the thread and the returned terminal be it successful or
an error terminal. An aborted thread returns nothing.
The expression itself requires 2 shifts:
the \PARshift followed by the winning terminal that the arbitrator
has selected. 
Why is there not 3 shifts to include the thread used?
I originally thought of this but it has no relevance to the expression parsed.
The thread call is a pre-conditional condition to the T stream.
If all the threads have aborted, then the \PARshift terminal must be removed
from the parse stack before trying the standard finite automaton's operations.
The list of threads 
associated with the state needing launching completes the declaration of parallelism. 

|proc_call_shift__| has been added to deal with chained procedure calls.
What the heck is that? It is a dispatcher of procedure calls reacting
to the returned T.
This grammatical structure allows one to call a thread, react on the returned T
by calling a specific procedure.
For example, this subrule \PARshift ``lhs'' |TH_id| |Rdispatch_lhs|.
The thread ``id'' is a identifier / symbol table lookup for 
keywords on a character token stream. 
The following |Rdispatch_lhs| becomes the dispatcher of called procedures
based on the returned T  ``first set'' is ``lhs''.
|Rdispatch_lhs| subrule would be \PROCshift ``lhs-phrase'' |PROC_TH_lhs_phrase| 
receiving the ``lhs'' start T.
Its other subrules would be programmed to catch the errors.
This ``procedure call'' sublety requires the called procedure 
to use the stacked returned T ``lhs'' as
its current T and not the current T of the caller. Also it must 
set its own token position to 1 less the caller's current token position.
There is an overlap on the input token stream whereby the characters used
to create the ``lhs'' T are still in the supplier's token stream and not ``lhs''.

The other subtelty is a non-chained procedure call when the calling parser
has only 1 thread to call so call it as a procedure and not as a thread
to juice the optimization process.

|questionable_shift__| is used in questionable situations like error
detection points within a grammar. See notes to myself for an explanation.
@<Struct...@>+=
struct State{
  yacco2::UINT          state_no__;
  yacco2::Shift_entry*  parallel_shift__;  // \PARshift
  yacco2::Shift_entry*  all_shift__;       // \ALLshift
  yacco2::Shift_entry*  inv_shift__;       // \INVshift
  yacco2::Shift_entry*  proc_call_shift__; // \TRAshift
  yacco2::Shift_tbl*    shift_tbl_ptr__;
  yacco2::Reduce_tbl*    reduce_tbl_ptr__;
  yacco2::State_s_thread_tbl* state_s_thread_tbl__;
  yacco2::Type_pc_fnct_ptr  proc_call_addr__; // function for \TRAshift
  yacco2::Shift_entry*  questionable_shift__; // \QUEshift
};@/

@*2 Shift table lookup.\fbreak
The |Shift_tbl| is a binary array of |Shift_entry| of the finite state.
The shift operation goes through a sequential list of ranked terminals trying
always to shift first before trying to reduce.
The ranking of potential shifts are:\fbreak
\ptindent{1) current terminal being parsed}
\ptindent{2) questionable shift terminal \QUEshift}
\ptindent{3) invisible meta terminal \INVshift}
\ptindent{4) all shift terminal \ALLshift}

Their presence in the state's configuration dictates the shift operation.
There are 4 individual search attempts to see
 whether the shift operation should take place.
The numbered points indicates their 
ranking order: point 2 and 3 should be mutually exclusive.

The |goto__| in the |shift_entry| is your vanilla flavoured 
fsa `go to' state.
The actual state definition is laced with extra information
to support parallel and conditional parsing. 
\INVshift is a bailout mechanism from ambiguous gramatical contexts. 
It  can be used to describe an epsilon rule. How? 
Though there is a shift happening, there is no consumption of the token stream.
Its use depends on the palative tastes of the grammar writer or the 
ingredients demanded by the grammar.
@<Struct...@>+=
struct Shift_entry {
  yacco2::USINT  id__;
  yacco2::State* goto__;
};@/
struct Shift_tbl {
  yacco2::USINT          no_entries__;
  yacco2::Shift_entry  first_entry__[1];
};@/

@*2 Reduce table entry.\fbreak
The |Reduce_entry| gives the lookahead set number to be checked.
The  |rhs_id__| gives the subrule identity that will collapse
to its left-handside rule.
Where is the binary compare function?
It is the set compare function. See |Set handling|.
@<Struct...@>+=
struct Reduce_entry {
  yacco2::Set_tbl*    la_set__;
  yacco2::USINT       rhs_id__;
};@/
struct Reduce_tbl {
  yacco2::USINT           no_entries__;
  yacco2::Reduce_entry   first_entry__[1];
};@/


@** Threading Definitions.\fbreak
Lots of merit but if it's not fast then this idea is side-lined or in football terms benched.
To optimize the dispatching of threads, a global approach is required. 
This is resolved by Yacco2's linker. 
Why is a global approach needed? Sequential first set evaluation per thread
within the state's configuation is just tooooo slowww.
To properly assess the first sets of all threads, the linker must read
the ``fsc'' files generated per thread by Yacco2.
The linker can now apply the transience operator on the first sets where
a thread can call another thread in its first set: the start (closure) state of the grammar 
could contain
a call to a thread.\fbreak 
\fbreak
Thought:\fbreak
How many stacked focuses does one need with fad out to see the forest from the trees?
Programming demands this talent of Yoga reflection but how many times
have u consciously
observed oneself observing oneself...
In this case, the tree scope lost to the forest,
as the local optimizations discussed in |Notes to myself| 
had reached their effectiveness and I still needed more
improvement. 
\fbreak
Thought no 2:\fbreak
Why wasn't this global approach thought of before now?
Well I tried to get my threading ideas to work first.
Thoughts of efficiency were not my first priority. 
Now reality of slowness demands gettting it to work faster.
The speed approach is test the current token's enumeration id 
against a global ``thread list having T in their first set'' when paralellism is present 
within the  finite automaton's current state's configuration.
If there are threads with this first set item, then go thru the state's potential
thread list looking to launch them.
On an aside, common prefix threads will showup together in their common terminals.
There should not be too many of these 
so the list should be short --- normally one thread.
To get speed, a thread id is required. It is the enumeration of all the thread grammars.
This enumeration is done within Yacco2's Linker. As Yacco2 is local to the
grammar being compiled, its local table must use indirection to get at this thread id.
So u will see pointers to items that only get resolved by the language linker.
See |@<Global external variables from yacco2's linker@>| for the global symbols referenced 
within this library but generated by Yacco2's Linker.
\fbreak
\fbreak
Mutexes controlling the hoards:\fbreak
\ptindent{1) |yacco2::TOKEN_MU| - token dispensor access}
\ptindent{2) |yacco2::TRACE_MU| - used to log tracing}
\ptindent{3) |yacco2::TH_TBL_MU| - access thread dispatch table}
\ptindent{4) |yacco2::SYM_TBL_MU| - symbol table access}
With my dual core AMD Sun work station, readonly access to the token dispensor
requires a mutex |TOKEN_MU| to prevent thread residues poluting other threads accessing 
``at the same time'' their tokens.
My tracings re-affirmed my intuitions as to why it was not working in this configuration. 
Past portings onto Apple's OSx, VMS Alpha, and NT Windows all worked.
In a single chip environment execution is normally sequential but in multi-chip 
environments parallel execution streams are dancing together on the same stage. 
|TOKEN_MU| ensures that each fetch to the token supplier is atomicly completed
before others requests are serviced. Unfortunately this has a 
potential braking effect
by throatling back to 1 only thread executing if there are multiple simultaneously token 
read requests happening
 until i can
explore who / what causes the downstream polution.
Currently my library is staticly declared and not declared as shared.???
Remember as multiple threads are launched by a parser, each  thread's
execution path is asynchronous in their token fetches even
though each launched competing thread starts at the same position within the token stream. 
Please see ``Notes to myself'' on eliminating the ``jit'' token fetch.

|TRACE_MU| mutex ensures that the complete text traced is completely outputed.
The atomicity is bracketed by the acquire / release cycle of the |TRACE_MU| mutex.
This prevents interleaving of parallel thread loggings to occur.
For example, i/o calls are fielded by the operating system; it is the operating system's
decision as to who will run next.

|SYM_TBL_MU| is reserved for possible parallel symbol table access.
|TH_TBL_MU| is the bouncer of the global thread table that registers launched threads. 
These thoroughbreds keep their engines running with environmental friendly 
octane while waiting for their next serve request
that provides the needed pep to parallel parsing. 
As each access to the table is read / write, |TH_TBL_MU| keeps this
critical region in tip-top shape.  
The following section discusses in detail how this table is used.
\fbreak
@*2 Critical region discussion surrounding |Parallel_thread_table|.\fbreak
|Parallel_thread_table| raison d'\^etre is speed.
Depending on the parsed context, threads are created dynamically.
 This stable of threads are 
reused on demand that eliminates the create-run-destroy cycle of a thread.
Now it's create once, run as many times as needed, and exit when finished parsing.
Nested thread calls like recursion is supported: thread A calls thread B calls thread A.
Each thread in the list keeps an availability status: busy or idle. 
There are 2 parts to the global thread table:\fbreak
\ptindent{1) |Parallel_thread_table| --- the array of thread lists}
\ptindent{2) |TH_TBL_MU| mutex --- the guard dog controlling the crowds}
\fbreak
\convertMPtoPDF{"/usr/local/yacco2/diagrams/threading_defs.1"}{1}{1}
\fbreak
\fbreak
The above figure depicts the thread table generated from \O2 linker.
The 2 contexts requiring reader / writer access are:\fbreak
\ptindent{1) grammar's launching or requesting threads to run}
\ptindent{2) launched thread setting its work status back to idle or exiting}
As an optimization, threads receive an unique ordered id from \O2 linker.
This is just a lexigraphical ordering on their names
 allowing table access by subscript.
The thread table is a single writer controlled by mutex primitives 
|@<acquire global thread table critical region@>| 
and |@<release global thread table critical region@>|.
These |cweb| sections are calls to the thread manager
using the |TH_TBL_MU| mutex.  
To acquire control a launching grammar 
uses the |@<acquire global thread table critical region@>| primitive.
If someone else has possession on the resource, 
the thread manager places the requestor into a hold queue
until the resource is freed.
It is the thread manager that  dispenses  execution control.
\fbreak
\fbreak
Thread table possession:\fbreak 
Quick review: \fbreak
A grammar's finite automata can
contain lists of threads for the running within each state's context.
To juice the running, each thread has a first set of tokens that start its parse.
Potential thread launch evaluation uses the current token against these
first sets to
determine what threads should run.\fbreak 
\fbreak 
So possession is 9/10$^{th}$ of whose law?
Now launch or run those threads by calling the thread manager ---
the ``how'' will be described later. 
New threads add their |worker_thread_blk|*
to the thread table without any care for critical region hygiene.
The |Parser| object of the newly launched grammar does it from its |constructor|.
Cuz the launching grammar has possession of the thread table
 and the launched threads are unique, there
is no potential reader / writer destructive scribbling to the table.
A thread's work status is maintained in the table depending on how they get run.
``Just created'' threads do a |push_back| of their |worker_thread_blk|* into the
 thread list while ``already created'' threads set their |worker_thread_blk|'s
status to busy that is already registered in the thread table's list.
A grammar's potential thread list 
does not contain multiple requests of the same thread so that u'll never
get a parallel set of identical threads spoiling the broth within the same launch list.
Remember the table's granularity is by thread id subscript: 
So there is no conflict.\fbreak
\fbreak
Note:\fbreak
If the thread manager flips execution to a launched thread 
(single or multiple cpus don't matter) and this newly executed thread
 requires thread table access, 
 it must call the |@<acquire global thread table critical region@>|
that puts its request on hold until the resource is freed up. 
Eventually the original grammar releases control of the thread table
by |@<release global thread table critical region@>|
that activates  execution of the requestor.
\fbreak
\fbreak
Sleeping beauty:\fbreak
Finally the calling grammar places itself
into a wait state (is it ripper van winkle?)
to be wakened  by one of its
called dwarfs.
This is done by calling the |@<wait for event to arrive with no loop@>| 
that releases the grammar's mutex,
puts it on ice,
and places its conditional variable into the thread manager's event wait queue.
Freeing up of these ``thread manager'' variables allows its called
threads to play with its calling grammar's critical region 
and to eventually wake it up. 
Remember, each called thread must
go thru the acquiring / releasing of the called thread's mutex.
U wouldn't want the dwarfs to screwup ogre's critcal region and 
the grammar writer's ire. 
Why the playing with the calling grammar's critical region away?
Its called threads can
report back their parse findings thru the ``acceptance token'' queue
of the sleeping beauty.
To wake up the ogre, the last thread finished executing
calls primitive |@<signal thread to wake up...@>|.
How is this determined?
The calling grammar's critical region has a launched thread count.
Each called thread decrements it when completed regardless of 
its parsing outcome.
When it hits zero, this indicates last thread to finish and so 
jostle the snoring beauty.
The last duty of a running thread is 
|@<acquire global thread table critical region@>|,
set its run status to idle 
,|@<release global thread table critical region@>|,
and place itself into a wait state for another round of drinks:
|@<wait for event to arrive with no loop@>|.\fbreak
\fbreak
\fbreak
How does a called thread know its requestor?\fbreak
Let's review the 2 situations:\fbreak
\ptindent{1) create a thread}
\ptindent{2) call an already created thread}
There are 2 doors of entry into a thread.
``Creation of a thread'' is at the mercy of the thread manager to
register the thread and prepare it for the calling.
The only way information can be passed to the to-be created thread is thru a 
parameter passed to the called thread procedure by the thread manager.
The calling grammar's |Parser| object address is passed 
as a parameter to |CREATE_THREAD| 
who passes it to the to-be-executed thread.
Built within the thread code is the casting and extraction of 
the requestor's |Parser| object.
Once the called thread is finished running, it puts itself into a wait state 
for its next marching order.\fbreak
\fbreak
The 2nd port of entry.\fbreak 
U guessed it, the thread list contains the thread's |Parser| object that 
has been freed of its 
mutex and conditional variable  put on ice.
So the 2nd entry point is the |@<wait for event to arrive with no loop@>|.
The calling grammar calls |SIGNAL_COND_VAR| to wake up the dwarf while the
called thread uses the |@<signal thread to wake up and work@>| 
to wake up the ogre that really calls |SIGNAL_COND_VAR|.
Within the critical region of the ``to be requested'' thread
is |pp_requesting_parallelism__| that holds the calling grammar's 
deposited critical region address. 
Note: thru out a parse a thread can be activated 
 by different suitors.
Each deposit by the requesting grammar leaves its tale for the dwarf.\fbreak
\fbreak
Draining the thread swamp:\fbreak
How does one get out of this infinite loop of
 wait for its marching order, do the parse, and wait again.
This is Sambo and the tigers twirl: tail chasing ain't it?
There is another marching order to exit-work.
A bit of a subtlety here needs explaining:
how does one know if the thread manager has placed all the toe tapping threads into a 
wait status within a single cpu environment?
To let the swamp drain, a |@<pause for x seconds@>| takes place
that could be not effective but i'm trying: better yet would be to have
a |pthread| procedure to do the act of bleeding...
followed by
a ``stop work'' order --- it has other euphemisms.
This is how the thread breaks out of its tail spin.
The global |Parallel_threads_shutdown| procedure initiates the above
and details the threads run stats and shutdown attempts.
It is usually called from the ``mainline'' code of the program. \fbreak

@*3 Diagrams, do we have diagrams --- examples of critical region activity.\fbreak
Let a figure detail a 1000 words. In a single cpu environment, a process's
execution sequence is sequential.
To depict this using G as the process, A and B threads, and the critical region
resources, i will use a box within a box
concept to simulate multi-dimensions.
Why a box? In one of the following examples there are 
3 outer space dimensions representing G, A, and B.
This really is a triangle but the running comments and activity vectors makes it easier
to annotate using a box.
An obelisk with its point removed represents all the dimensions.

Going from the outer to the  inner parts of the obelisk,
the outer walls are the process / thread spaces.
Next, time rulers are the motes between outer and inner spaces.
The  court yard is the inner space (resource space).
It contains the critical regions' resources, 
and execution queues --- running and waiting to run.

Commented outer space events are registered aginst its time mark by
vectors using an arrowhead to indicate the activity's direction into or out
of the resource space. A double headed vector 
indicates the outer space call to the inner space that 
returns execution back to the calling outer space.

To indicate ownership and duration of time, each resource uses a line similar to
the math open / close interval.
The ``running queue'' also ties together the start/stop boundaries with a dashed 
vertical line to show continuity.
Other resources have the owner above their time line marker.
A dotted vertical vectored to the resource marks a request for ownership that is pending.
Its converse uses a dashed line away from the resource
marking the acquisition from a pending a request.
\fbreak
\fbreak
\fbreak
Example of threads being run by \O2.
\fbreak
\fbreak
\convertMPtoPDF{"/usr/local/yacco2/diagrams/threading_exs.1"}{1}{1}
\fbreak
\fbreak
A single thread A that gets launched and reports back to its caller G .
The resource ``x'' is the global guard to the global thread table.
Basic comments on the critical region components of G have been left out due to space.
As previously described, an active thread count is maintained along with 
the acceptance token queue
that the called threads deposit their results for G's arbitration code assessment.
Lines 18 and 23 comments these situations with the bracketed acquisition.
Line 18 shows the called thread A reporting its results within G's protected area.
The signal variables of G and A have also been ommitted due to space.
In the above example, it would not have mattered whether
the launched thread started executing immediately with the calling grammar
put on hold as
the launching grammar G still has ownership on ``x'' that
eventually the A will require and so it would be put into a pending state until
the ``x'' resource could be re-allocated.
In this illustration, G goes into a wait state to be signalled later by A.
If the  interweave of G's execution sequence was such that
A was working and signaled G before G put itself into a wait-on-signal state,
it is the thread library that pends the signal for when G finally requests it.


\fbreak
\fbreak
A Deadlock Example:\fbreak
\fbreak
\convertMPtoPDF{"/usr/local/yacco2/diagrams/threading_exs.2"}{1}{1}
\fbreak
Some comments:\fbreak
Deadlock is a graph of cyclicity. 
In the example, resource ``x'' is an intermediary used by the thread manager
to relinquish execution control held by A when it releases ``x''.
Process G then continues by creating thread B with its Acquire events on ``b'' 
and attempts on ``a''. Eventually thread A attempts its acquiring of ``b''.
By sequencing the Acquire requests --- Acquire(a) by A, Acquire(b) by B,
Acquire(a) by B, and Acquire(b) by A, a cyclicity check could be done
per Acquire to
determine whether deadlock is met.
The third Acquire(a) by B has the potential deadlock cyclic condition established.
Because A is still running, it is not a conclusive deadlock as thread A could
Release(a) to free up the cycle created by B.
Only when thread A asks for ``b'' does it become a solid deadlock
regardless of process G being able to run.

The simplest run death is G requesting a wait-on-signal when there are no other
threads running that could wake it up --- Sleeping beauty with no Prince to
do resusitation.


@*2 Thread entry.\fbreak
Just your basic attributes describing a thread.
Each thread block is generated by the Linker.
Remember, the thread ids are in lexigraphical order: upper / lower case are different.
 Only the Linker has access to all
the threads to produce this order.
Each thread entry block will have the Linker's manufactured thread name which
will be referenced by the state's thread table and the global stable of threads.
The thread entry will be identified by the following rule:\fbreak
\ptindent{concatenate the letter ``I'' to the thread's name} 
For example, ``|TH_eol|'' is the end-of-line detector thread. 
Its variable name would be ``|ITH_eol|'' where the |TH_eol| value is taken from the grammar's
``parallel-thread-function'' component.

The reason for the |thread_array_record| having an array of |Thread_entry*| is due to
 the thread entry name.
 It is referenced by the |State_s_thread_tbl|  and can be referenced by the grammar writer 
 when using the |spawn_thread_manually| procedure.
The thread entry names are generated by Yacco2's Linker that is outside of 
Yacco2's library jurisdiction
but used by it. This generation is specific per language being generated. 
@<Struct...@>+=
struct Thread_entry {
  yacco2::KCHARP           thread_fnct_name__;
  yacco2::Type_pp_fnct_ptr thread_fnct_ptr__;
  yacco2::USINT thd_id__;
  yacco2::Type_pc_fnct_ptr proc_thread_fnct_ptr__;
};

@*2 Thread stable.
@<Struct...@>+=
struct thread_array_record{
  yacco2::USINT no_entries__;
  yacco2::Thread_entry* first_entry__[1];
};
@*2 State's thread table.\fbreak
 The thread entries are in sorted order. How?
Though the list of potential threads order within the grammar
are as programmed by the grammar writer, their names 
will be sorted lexigraphically. Hence their order in the table are
relatively sorted.

The thread entry variable and its contents are generated by Yacco2's Linker.

@<Struct...@>+=
struct State_s_thread_tbl {
  yacco2::USINT              no_entries__;
  yacco2::Type_pp_fnct_ptr ar_fnct_ptr__;
  yacco2::ULINT (*thd_id_bit_map__);
  yacco2::Thread_entry*   first_entry__[1];
};@/

@*2 Threads having terminal in first set.\fbreak
Well here's the turbo charger of threads. It is generated by Yacco2's Linker.
As the number of terminals defined is unknown to this general library, 
a spoofing technique is used.

Have a pointer to a structure that defines the running grammar's environment that
contains another indirection to the local information. I use T as a generic
symbol representing the individual terminals within the grammar's
Terminal vocabulary.
These 2 structures are:\fbreak
\ptindent{1) terminal array pointing to the threads with T in the grammar's first set}
\ptindent{2) the thread id list having T in their first set}
This spoofing technique is:
@<Struct...@>+=
struct thd_ids_having_T {
  yacco2::ULINT   first_thd_id__[1];
};@/
struct T_array_having_thd_ids {
  yacco2::USINT   no_of_T__;
  yacco2::thd_ids_having_T*   first_entry__[1];
};@/