@q file: fsm_tbls.w@> @q% Copyright Dave Bone 1998 - 2015@> @q% /*@> @q% This Source Code Form is subject to the terms of the Mozilla Public@> @q% License, v. 2.0. If a copy of the MPL was not distributed with this@> @q% file, You can obtain one at http://mozilla.org/MPL/2.0/.@> @q% */@> @** Finite automaton table definitions and their functions. These definitions support Yacco2's generated finite state automaton tables. A binary search is used on all tables: |Shift_tbl|, |Reduce_tbl|, and |State_s_thread_tbl|. Their structure contains a prefix giving the number of elements in the table, followed by the first record of the array. The elements are a concatenation of records sorted `in ascending sequence' for the binary search. @*2 State structure.\fbreak This represents the finite automaton state. The only wrinkles to your normal finite state definition are the entries supporting parallelism and the 2 meta terminals for the `all shift' and `invisible shift' functions. These extra shifts act like a normal shift, requiring their own shift entries. Parallelism comprises the \PARshift grammatical expressions within the state that call threads. Each expression supplies the thread and the returned terminal, be it successful or an error terminal. An aborted thread returns nothing. The expression itself requires 2 shifts: the \PARshift followed by the winning terminal that the arbitrator has selected. Why are there not 3 shifts so as to include the thread used? I originally thought of this but it has no relevance to the expression parsed. The thread call is a pre-condition on the T stream. If all the threads have aborted, then the \PARshift terminal must be removed from the parse stack before trying the standard finite automaton's operations. The list of threads associated with the state needing launching completes the declaration of parallelism. |proc_call_shift__| has been added to deal with chained procedure calls. What the heck is that?
It is a dispatcher of procedure calls reacting to the returned T. This grammatical structure allows one to call a thread and react to the returned T by calling a specific procedure. For example, consider the subrule \PARshift ``lhs'' |TH_id| |Rdispatch_lhs|. The thread ``id'' is an identifier / symbol table lookup for keywords on a character token stream. The following |Rdispatch_lhs| becomes the dispatcher of called procedures based on the returned T: its ``first set'' is ``lhs''. An |Rdispatch_lhs| subrule would be \PROCshift ``lhs-phrase'' |PROC_TH_lhs_phrase| receiving the ``lhs'' start T. Its other subrules would be programmed to catch the errors. This ``procedure call'' subtlety requires the called procedure to use the stacked returned T ``lhs'' as its current T and not the current T of the caller. Also it must set its own token position to 1 less than the caller's current token position. There is an overlap on the input token stream whereby the characters used to create the ``lhs'' T are still in the supplier's token stream, not ``lhs'' itself. The other subtlety is the non-chained procedure call: when the calling parser has only 1 thread to call, call it as a procedure and not as a thread to juice the optimization process. |questionable_shift__| is used in questionable situations like error detection points within a grammar. See notes to myself for an explanation. @+= struct State{
 yacco2::UINT state_no__;
 yacco2::Shift_entry* parallel_shift__; // \PARshift
 yacco2::Shift_entry* all_shift__; // \ALLshift
 yacco2::Shift_entry* inv_shift__; // \INVshift
 yacco2::Shift_entry* proc_call_shift__; // \TRAshift
 yacco2::Shift_tbl* shift_tbl_ptr__;
 yacco2::Reduce_tbl* reduce_tbl_ptr__;
 yacco2::State_s_thread_tbl* state_s_thread_tbl__;
 yacco2::Type_pc_fnct_ptr proc_call_addr__; // function for \TRAshift
 yacco2::Shift_entry* questionable_shift__; // \QUEshift
};@/
@*2 Shift table lookup.\fbreak The |Shift_tbl| is a sorted array of |Shift_entry| of the finite state, searched by binary search.
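The lookup against these sorted entries amounts to a classic binary search on the terminal's enumeration id. Here is a minimal sketch using stand-ins for the |Shift_entry| layout; the |find_shift| routine and its flat-array signature are my own illustration, not the library's actual code, which probes through the table's one-element-array prefix.

```cpp
#include <cassert>

// Stand-ins echoing the table layouts in this section; find_shift is hypothetical.
typedef unsigned short USINT;
struct State;                                 // goto__ target; layout irrelevant here
struct Shift_entry { USINT id__; State* goto__; };

// Binary search over entries sorted ascending by id__, as the tables are
// emitted `in ascending sequence'.  Returns 0 when the terminal has no shift.
Shift_entry* find_shift(Shift_entry* base, USINT no_entries, USINT id) {
  int lo = 0;
  int hi = static_cast<int>(no_entries) - 1;
  while (lo <= hi) {
    int mid = (lo + hi) / 2;
    if (base[mid].id__ == id) return &base[mid];
    if (base[mid].id__ < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return 0;
}
```

In the generated tables the same probe runs against |&tbl->first_entry__[0]| with |no_entries__| as the count; the ranked shift attempts described next mean up to 4 such probes per shift decision.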
The shift operation goes through a sequential list of ranked terminals, trying always to shift first before trying to reduce. The ranking of potential shifts is:\fbreak \ptindent{1) current terminal being parsed} \ptindent{2) questionable shift terminal \QUEshift} \ptindent{3) invisible meta terminal \INVshift} \ptindent{4) all shift terminal \ALLshift} Their presence in the state's configuration dictates the shift operation. There are 4 individual search attempts to see whether the shift operation should take place. The numbered points indicate their ranking order: points 2 and 3 should be mutually exclusive. The |goto__| in the |Shift_entry| is your vanilla flavoured fsa `go to' state. The actual state definition is laced with extra information to support parallel and conditional parsing. \INVshift is a bailout mechanism from ambiguous grammatical contexts. It can be used to describe an epsilon rule. How? Though there is a shift happening, there is no consumption of the token stream. Its use depends on the palate of the grammar writer or the ingredients demanded by the grammar. @+= struct Shift_entry {
 yacco2::USINT id__;
 yacco2::State* goto__;
};@/
struct Shift_tbl {
 yacco2::USINT no_entries__;
 yacco2::Shift_entry first_entry__[1];
};@/
@*2 Reduce table entry.\fbreak The |Reduce_entry| gives the lookahead set number to be checked. The |rhs_id__| gives the subrule identity that will collapse to its left-hand-side rule. Where is the binary compare function? It is the set compare function. See |Set handling|. @+= struct Reduce_entry {
 yacco2::Set_tbl* la_set__;
 yacco2::USINT rhs_id__;
};@/
struct Reduce_tbl {
 yacco2::USINT no_entries__;
 yacco2::Reduce_entry first_entry__[1];
};@/
@** Threading Definitions.\fbreak Lots of merit, but if it's not fast then this idea is side-lined or, in football terms, benched. To optimize the dispatching of threads, a global approach is required. This is resolved by Yacco2's linker. Why is a global approach needed?
Sequential first set evaluation per thread within the state's configuration is just too slow. To properly assess the first sets of all threads, the linker must read the ``fsc'' files generated per thread by Yacco2. The linker can then apply the transitive closure operator on the first sets, since a thread can call another thread in its first set: the start (closure) state of the grammar could contain a call to a thread.\fbreak \fbreak Thought:\fbreak How many stacked focuses does one need, with fade out, to see the forest from the trees? Programming demands this talent of Yoga reflection, but how many times have you consciously observed oneself observing oneself... In this case, the tree scope was lost to the forest, as the local optimizations discussed in |Notes to myself| had reached their effectiveness and I still needed more improvement. \fbreak Thought no 2:\fbreak Why wasn't this global approach thought of before now? Well, I tried to get my threading ideas to work first. Thoughts of efficiency were not my first priority. Now the reality of slowness demands getting it to work faster. The speed approach is to test the current token's enumeration id against a global ``thread list having T in their first set'' when parallelism is present within the finite automaton's current state's configuration. If there are threads with this first set item, then go through the state's potential thread list looking to launch them. On an aside, common prefix threads will show up together in their common terminals. There should not be too many of these, so the list should be short --- normally one thread. To get speed, a thread id is required. It is the enumeration of all the thread grammars. This enumeration is done within Yacco2's Linker. As Yacco2 is local to the grammar being compiled, its local table must use indirection to get at this thread id. So you will see pointers to items that only get resolved by the language linker.
See |@| for the global symbols referenced within this library but generated by Yacco2's Linker. \fbreak \fbreak Mutexes controlling the hordes:\fbreak \ptindent{1) |yacco2::TOKEN_MU| - token dispenser access} \ptindent{2) |yacco2::TRACE_MU| - used to log tracing} \ptindent{3) |yacco2::TH_TBL_MU| - access thread dispatch table} \ptindent{4) |yacco2::SYM_TBL_MU| - symbol table access} With my dual core AMD Sun work station, read-only access to the token dispenser requires a mutex |TOKEN_MU| to prevent thread residues polluting other threads accessing their tokens ``at the same time''. My tracings re-affirmed my intuitions as to why it was not working in this configuration. Past portings onto Apple's OSx, VMS Alpha, and NT Windows all worked. In a single chip environment execution is normally sequential, but in multi-chip environments parallel execution streams are dancing together on the same stage. |TOKEN_MU| ensures that each fetch to the token supplier is atomically completed before other requests are serviced. Unfortunately this has a potential braking effect, throttling back to only 1 thread executing when there are multiple simultaneous token read requests, until I can explore who / what causes the downstream pollution. Currently my library is statically declared and not declared as shared??? Remember, as multiple threads are launched by a parser, each thread's execution path is asynchronous in its token fetches even though each launched competing thread starts at the same position within the token stream. Please see ``Notes to myself'' on eliminating the ``jit'' token fetch. The |TRACE_MU| mutex ensures that the complete text traced is output in full. The atomicity is bracketed by the acquire / release cycle of the |TRACE_MU| mutex. This prevents the loggings of parallel threads from interleaving. For example, i/o calls are fielded by the operating system; it is the operating system's decision as to who will run next.
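The acquire / release bracketing used by |TOKEN_MU| can be sketched with pthread primitives. This is a minimal sketch assuming a simplified in-memory token dispenser; the token type and the |fetch_token| routine are my illustration, not the library's own code.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// A pared-down token dispenser guarded the way |TOKEN_MU| guards Yacco2's:
// each fetch completes atomically before the next request is serviced.
// The names and types here are illustrative stand-ins.
struct CAbs_lr1_sym { int id__; };

static pthread_mutex_t token_mu = PTHREAD_MUTEX_INITIALIZER;
static CAbs_lr1_sym token_stream[] = { {1}, {2}, {3}, {4} };
static const std::size_t token_cnt =
  sizeof token_stream / sizeof token_stream[0];

// Fetch by position: competing launched threads may all start at the same
// position in the stream, so it is the read itself the mutex brackets.
CAbs_lr1_sym* fetch_token(std::size_t pos) {
  pthread_mutex_lock(&token_mu);            // acquire: one fetch at a time
  CAbs_lr1_sym* t = (pos < token_cnt) ? &token_stream[pos] : 0;
  pthread_mutex_unlock(&token_mu);          // release: next requestor proceeds
  return t;
}
```

The same bracketing discipline applies to |TRACE_MU|: acquire, emit the complete trace text, release, so parallel loggings never interleave.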
|SYM_TBL_MU| is reserved for possible parallel symbol table access. |TH_TBL_MU| is the bouncer of the global thread table that registers launched threads. These thoroughbreds keep their engines running with environmentally friendly octane while waiting for their next serve request that provides the needed pep to parallel parsing. As each access to the table is read / write, |TH_TBL_MU| keeps this critical region in tip-top shape. The following section discusses in detail how this table is used. \fbreak @*2 Critical region discussion surrounding |Parallel_thread_table|.\fbreak |Parallel_thread_table|'s raison d'\^etre is speed. Depending on the parsed context, threads are created dynamically. This stable of threads is reused on demand, eliminating the create-run-destroy cycle of a thread. Now it's create once, run as many times as needed, and exit when finished parsing. Nested thread calls like recursion are supported: thread A calls thread B calls thread A. Each thread in the list keeps an availability status: busy or idle. There are 2 parts to the global thread table:\fbreak \ptindent{1) |Parallel_thread_table| --- the array of thread lists} \ptindent{2) |TH_TBL_MU| mutex --- the guard dog controlling the crowds} \fbreak \convertMPtoPDF{"/usr/local/yacco2/diagrams/threading_defs.1"}{1}{1} \fbreak \fbreak The above figure depicts the thread table generated from the \O2 linker. The 2 contexts requiring reader / writer access are:\fbreak \ptindent{1) a grammar launching or requesting threads to run} \ptindent{2) a launched thread setting its work status back to idle or exiting} As an optimization, threads receive a unique ordered id from the \O2 linker. This is just a lexicographical ordering on their names, allowing table access by subscript. The thread table is a single writer controlled by mutex primitives |@| and |@|. These |cweb| sections are calls to the thread manager using the |TH_TBL_MU| mutex. To acquire control a launching grammar uses the |@| primitive.
If someone else has possession of the resource, the thread manager places the requestor into a hold queue until the resource is freed. It is the thread manager that dispenses execution control. \fbreak \fbreak Thread table possession:\fbreak Quick review: \fbreak A grammar's finite automaton can contain lists of threads for the running within each state's context. To juice the running, each thread has a first set of tokens that start its parse. Potential thread launch evaluation uses the current token against these first sets to determine what threads should run.\fbreak \fbreak So possession is 9/10$^{th}$ of whose law? Now launch or run those threads by calling the thread manager --- the ``how'' will be described later. New threads add their |worker_thread_blk|* to the thread table without any care for critical region hygiene. The |Parser| object of the newly launched grammar does it from its |constructor|. Because the launching grammar has possession of the thread table and the launched threads are unique, there is no potential reader / writer destructive scribbling to the table. A thread's work status is maintained in the table depending on how it gets run. ``Just created'' threads do a |push_back| of their |worker_thread_blk|* into the thread list, while ``already created'' threads set to busy the status of their |worker_thread_blk| already registered in the thread table's list. A grammar's potential thread list does not contain multiple requests of the same thread, so you'll never get a parallel set of identical threads spoiling the broth within the same launch list. Remember, the table's granularity is by thread id subscript: so there is no conflict.\fbreak \fbreak Note:\fbreak If the thread manager flips execution to a launched thread (single or multiple cpus don't matter) and this newly executed thread requires thread table access, it must call the |@| that puts its request on hold until the resource is freed up.
Eventually the original grammar releases control of the thread table by |@|, which activates execution of the requestor. \fbreak \fbreak Sleeping beauty:\fbreak Finally the calling grammar places itself into a wait state (is it Rip Van Winkle?) to be wakened by one of its called dwarfs. This is done by calling the |@| that releases the grammar's mutex, puts it on ice, and places its conditional variable into the thread manager's event wait queue. Freeing up these ``thread manager'' variables allows its called threads to play with the calling grammar's critical region and to eventually wake it up. Remember, each called thread must go through the acquiring / releasing of the calling grammar's mutex. You wouldn't want the dwarfs to screw up the ogre's critical region and raise the grammar writer's ire. Why the playing with the calling grammar's critical region anyway? Its called threads can report back their parse findings through the ``acceptance token'' queue of the sleeping beauty. To wake up the ogre, the last thread finished executing calls primitive |@|. How is this determined? The calling grammar's critical region has a launched thread count. Each called thread decrements it when completed, regardless of its parsing outcome. When it hits zero, this indicates the last thread to finish, and so it jostles the snoring beauty. The last duty of a running thread is |@|, set its run status to idle, |@|, and place itself into a wait state for another round of drinks: |@|.\fbreak \fbreak \fbreak How does a called thread know its requestor?\fbreak Let's review the 2 situations:\fbreak \ptindent{1) create a thread} \ptindent{2) call an already created thread} There are 2 doors of entry into a thread. ``Creation of a thread'' is at the mercy of the thread manager to register the thread and prepare it for the calling. The only way information can be passed to the to-be created thread is through a parameter passed to the called thread procedure by the thread manager.
The calling grammar's |Parser| object address is passed as a parameter to |CREATE_THREAD|, which passes it to the to-be-executed thread. Built within the thread code is the casting and extraction of the requestor's |Parser| object. Once the called thread is finished running, it puts itself into a wait state for its next marching order.\fbreak \fbreak The 2nd port of entry.\fbreak You guessed it, the thread list contains the thread's |Parser| object that has been freed of its mutex, with its conditional variable put on ice. So the 2nd entry point is the |@|. The calling grammar calls |SIGNAL_COND_VAR| to wake up the dwarf, while the called thread uses the |@| to wake up the ogre, which really calls |SIGNAL_COND_VAR|. Within the critical region of the ``to be requested'' thread is |pp_requesting_parallelism__| that holds the calling grammar's deposited critical region address. Note: throughout a parse a thread can be activated by different suitors. Each deposit by the requesting grammar leaves its tale for the dwarf.\fbreak \fbreak Draining the thread swamp:\fbreak How does one get out of this infinite loop of wait for marching orders, do the parse, and wait again? This is Sambo and the tigers' twirl: tail chasing, ain't it? There is another marching order to exit-work. A bit of a subtlety here needs explaining: how does one know if the thread manager has placed all the toe tapping threads into a wait status within a single cpu environment? To let the swamp drain, a |@| takes place that might not be effective, but I'm trying: better yet would be a |pthread| procedure to do the act of bleeding... followed by a ``stop work'' order --- it has other euphemisms. This is how the thread breaks out of its tail spin. The global |Parallel_threads_shutdown| procedure initiates the above and details the threads' run stats and shutdown attempts. It is usually called from the ``mainline'' code of the program.
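The wait / signal choreography described above --- the caller sleeping on its condition variable, each finished dwarf decrementing the launched thread count inside the caller's critical region, and the last one out signalling --- can be sketched with pthread primitives. All names here are my assumptions standing in for the library's |@| primitives.

```cpp
#include <pthread.h>
#include <cassert>

// The sleeping-beauty hand-off in miniature.  The layout of Caller_cr and
// the two routines are illustrative, not Yacco2's actual critical region.
struct Caller_cr {                       // the calling grammar's critical region
  pthread_mutex_t mu;
  pthread_cond_t cv;
  int launched__;                        // launched thread count
};

void* called_thread(void* arg) {
  Caller_cr* cr = static_cast<Caller_cr*>(arg);
  pthread_mutex_lock(&cr->mu);           // play inside the ogre's region
  if (--cr->launched__ == 0)             // last dwarf out wakes the ogre
    pthread_cond_signal(&cr->cv);
  pthread_mutex_unlock(&cr->mu);
  return 0;
}

void wait_for_dwarfs(Caller_cr* cr) {
  pthread_mutex_lock(&cr->mu);
  while (cr->launched__ != 0)            // guard against spurious wakeups
    pthread_cond_wait(&cr->cv, &cr->mu); // releases mu, sleeps, re-acquires
  pthread_mutex_unlock(&cr->mu);
}
```

Note the predicate loop: if a dwarf signals before the ogre gets to its wait, the zeroed count makes the wait return immediately, matching the pending-signal behaviour described earlier.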
\fbreak @*3 Diagrams, do we have diagrams --- examples of critical region activity.\fbreak Let a figure detail a 1000 words. In a single cpu environment, a process's execution sequence is sequential. To depict this using G as the process, A and B as threads, and the critical region resources, I will use a box-within-a-box concept to simulate multi-dimensions. Why a box? In one of the following examples there are 3 outer space dimensions representing G, A, and B. This really is a triangle, but the running comments and activity vectors make it easier to annotate using a box. An obelisk with its point removed represents all the dimensions. Going from the outer to the inner parts of the obelisk, the outer walls are the process / thread spaces. Next, time rulers are the moats between outer and inner spaces. The courtyard is the inner space (resource space). It contains the critical regions' resources and execution queues --- running and waiting to run. Commented outer space events are registered against their time marks by vectors using an arrowhead to indicate the activity's direction into or out of the resource space. A double headed vector indicates an outer space call to the inner space that returns execution back to the calling outer space. To indicate ownership and duration of time, each resource uses a line similar to the math open / close interval. The ``running queue'' also ties together the start/stop boundaries with a dashed vertical line to show continuity. Other resources have the owner above their time line marker. A dotted vertical line vectored to the resource marks a request for ownership that is pending. Its converse uses a dashed line away from the resource, marking the acquisition from a pending request. \fbreak \fbreak \fbreak Example of threads being run by \O2. \fbreak \fbreak \convertMPtoPDF{"/usr/local/yacco2/diagrams/threading_exs.1"}{1}{1} \fbreak \fbreak A single thread A that gets launched and reports back to its caller G.
The resource ``x'' is the global guard to the global thread table. Basic comments on the critical region components of G have been left out due to space. As previously described, an active thread count is maintained along with the acceptance token queue where the called threads deposit their results for G's arbitration code assessment. Lines 18 and 23 comment on these situations with the bracketed acquisition. Line 18 shows the called thread A reporting its results within G's protected area. The signal variables of G and A have also been omitted due to space. In the above example, it would not have mattered whether the launched thread started executing immediately with the calling grammar put on hold: the launching grammar G still has ownership of ``x'', which A will eventually require, and so A would be put into a pending state until the ``x'' resource could be re-allocated. In this illustration, G goes into a wait state to be signalled later by A. If the interweave of G's execution sequence was such that A was working and signaled G before G put itself into a wait-on-signal state, it is the thread library that holds the signal pending until G finally requests it. \fbreak \fbreak A Deadlock Example:\fbreak \fbreak \convertMPtoPDF{"/usr/local/yacco2/diagrams/threading_exs.2"}{1}{1} \fbreak Some comments:\fbreak Deadlock is a graph of cyclicity. In the example, resource ``x'' is an intermediary used by the thread manager to relinquish execution control held by A when it releases ``x''. Process G then continues by creating thread B with its Acquire events on ``b'' and attempts on ``a''. Eventually thread A attempts its acquiring of ``b''. By sequencing the Acquire requests --- Acquire(a) by A, Acquire(b) by B, Acquire(a) by B, and Acquire(b) by A --- a cyclicity check could be done per Acquire to determine whether deadlock is met. The third Acquire(a) by B establishes the potential deadlock cyclic condition.
Because A is still running, it is not a conclusive deadlock, as thread A could Release(a) to free up the cycle created by B. Only when thread A asks for ``b'' does it become a solid deadlock, regardless of process G being able to run. The simplest run death is G requesting a wait-on-signal when there are no other threads running that could wake it up --- Sleeping beauty with no Prince to do the resuscitation. @*2 Thread entry.\fbreak Just your basic attributes describing a thread. Each thread block is generated by the Linker. Remember, the thread ids are in lexicographical order: upper / lower case are different. Only the Linker has access to all the threads to produce this order. Each thread entry block will have the Linker's manufactured thread name, which will be referenced by the state's thread table and the global stable of threads. The thread entry will be identified by the following rule:\fbreak \ptindent{concatenate the letter ``I'' to the thread's name} For example, ``|TH_eol|'' is the end-of-line detector thread. Its variable name would be ``|ITH_eol|'' where the |TH_eol| value is taken from the grammar's ``parallel-thread-function'' component. The reason for the |thread_array_record| having an array of |Thread_entry*| is due to the thread entry name. It is referenced by the |State_s_thread_tbl| and can be referenced by the grammar writer when using the |spawn_thread_manually| procedure. The thread entry names are generated by Yacco2's Linker, which is outside of Yacco2's library jurisdiction but used by it. This generation is specific per language being generated. @+= struct Thread_entry {
 yacco2::KCHARP thread_fnct_name__;
 yacco2::Type_pp_fnct_ptr thread_fnct_ptr__;
 yacco2::USINT thd_id__;
 yacco2::Type_pc_fnct_ptr proc_thread_fnct_ptr__;
};
@*2 Thread stable. @+= struct thread_array_record{
 yacco2::USINT no_entries__;
 yacco2::Thread_entry* first_entry__[1];
};
@*2 State's thread table.\fbreak The thread entries are in sorted order. How?
Though the order of the potential thread list within the grammar is as programmed by the grammar writer, their names are sorted lexicographically. Hence their order in the table is relatively sorted. The thread entry variable and its contents are generated by Yacco2's Linker. @+= struct State_s_thread_tbl {
 yacco2::USINT no_entries__;
 yacco2::Type_pp_fnct_ptr ar_fnct_ptr__;
 yacco2::ULINT (*thd_id_bit_map__);
 yacco2::Thread_entry* first_entry__[1];
};@/
@*2 Threads having terminal in first set.\fbreak Well, here's the turbo charger of threads. It is generated by Yacco2's Linker. As the number of terminals defined is unknown to this general library, a spoofing technique is used: have a pointer to a structure that defines the running grammar's environment, which contains another indirection to the local information. I use T as a generic symbol representing the individual terminals within the grammar's Terminal vocabulary. These 2 structures are:\fbreak \ptindent{1) terminal array pointing to the threads with T in the grammar's first set} \ptindent{2) the thread id list having T in their first set} This spoofing technique is: @+= struct thd_ids_having_T {
 yacco2::ULINT first_thd_id__[1];
};@/
struct T_array_having_thd_ids {
 yacco2::USINT no_of_T__;
 yacco2::thd_ids_having_T* first_entry__[1];
};@/
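How the spoofed structures get probed can be sketched as follows: the current token's enumeration id subscripts |T_array_having_thd_ids|, and the fetched |thd_ids_having_T| is a bit map tested by thread id. The lookup routine and the bit / word layout below are my assumptions for illustration, since the real tables and their indexing are emitted by the Linker.

```cpp
#include <cassert>

// Stand-ins mirroring the two spoofing structures above (namespace dropped);
// thd_has_T_in_first_set is a hypothetical probe, not library code.
typedef unsigned short USINT;
typedef unsigned long ULINT;

struct thd_ids_having_T { ULINT first_thd_id__[1]; };
struct T_array_having_thd_ids {
  USINT no_of_T__;
  thd_ids_having_T* first_entry__[1];
};

const unsigned BITS_PER_WORD = 8 * sizeof(ULINT);

// Assumes each thd_ids_having_T holds enough ULINT words for all thread ids.
// Subscript by the token's enumeration id, then test the thread's bit:
// a set bit means this thread has T in its first set and is a launch candidate.
bool thd_has_T_in_first_set(T_array_having_thd_ids* tbl,
                            USINT t_id, USINT thd_id) {
  if (t_id >= tbl->no_of_T__) return false;  // T outside the vocabulary
  thd_ids_having_T* ids = tbl->first_entry__[t_id];
  if (ids == 0) return false;                // no thread starts with this T
  ULINT word = ids->first_thd_id__[thd_id / BITS_PER_WORD];
  return (word >> (thd_id % BITS_PER_WORD)) & 1UL;
}
```

Only when this probe says yes does the state's potential thread list need walking, which is the whole point of the turbo charger: the common case is a single bit test, not a per-thread first set evaluation.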