DNA.
wyrmcow
Version.
1.0.9
Namespace.
::wyrm::cow
Command.
::wyrm::cow
Language.
tcl
Manpage.
cow (1WY)
Testbase.
Test Script
Test Report
Import.
Package.
wyrmwif
wyrm-dpda
wyrm-expat
wyrm-uri
Export.
Implementation.
wyrmcow.tcl
Package.
wyrmcow.tcl

vW2 Compiler Compiler

Sections.
Parser-generator from vW2 grammars
vW2 Grammar
Grammar Properties
Extract One Level Grammar from Two Level Grammar
Attributes
DFA Parser Generation
LR(k) Parser Generation
Parser Automata
Lexical Processor
Symbol Table
Tree Rewriting
Implementations
Test Base
Make.
Package.
rule $lib/wyrmcow.tcl $source/wyrmcow.tcl "
  rm -f $lib/wyrmcow.tcl
  sed 's/@@VER@@/[control ver]/; /^#line/d' $source/wyrmcow.tcl >$lib/wyrmcow.tcl
"
   

Parser-generator from vW2 grammars

   
top

1 :: Generate parsers from vW2 grammars.

Copyright (C) 2003 SM Ryan

Wyrmwif Tcl extensions. For non-profit uses only, provided this copyright is preserved on all copies, this work may be freely copied, modified, redistributed, compiled, and incorporated in other works. This work is distributed with no warranty of any kind; no author or distributor accepts any responsibility for the consequences of using it, or for whether it serves any particular purpose or works at all, unless he or she says so in writing.

 
section    top

2 :: Does a cow have buddha nature?

MOOOOOOOOOOOOOOO.

 
section    top
3. cow (1WY) ::

NAME

cow — Parser-generator from vW2 grammar.

synopsis

wyrm::cow grammar...

description

Generated Code Commands

<Generate compiled parser code> <Generated command: accept_character> <Generated command: accept_production> <Generated command: allocate> <Generated command: anti_get_value> <Generated command: assign_integer> <Generated command: attribute_wam> <Generated command: call> <Generated command: call_domain> <Generated command: deallocate> <Generated command: decrement_integer> <Generated command: deferred_action> <Generated command: define_glyph> <Generated command: define_integer> <Generated command: define_terminal_class> <Generated command: discard_character> <Generated command: eval_loop> <Generated command: execute> <Generated command: explicit_end> <Generated command: fail> <Generated command: foreign_immediate> <Generated command: foreign_include> <Generated command: foreign_text> <Generated command: get_constant> <Generated command: get_list> <Generated command: get_structure> <Generated command: get_value> <Generated command: get_variable> <Generated command: goto_state> <Generated command: ibbuffer> <Generated command: ibclear> <Generated command: ibdiscard> <Generated command: if_bound> <Generated command: implicit_end> <Generated command: increment_integer> <Generated command: initialise_constant> <Generated command: initialise_memory> <Generated command: initialise_register> <Generated command: input_styles> <Generated command: iqreadahead> <Generated command: iq_maximum> <Generated command: name_class> <Generated command: nop> <Generated command: on_failure> <Generated command: on_success> <Generated command: parser> <Generated command: parse_domain> <Generated command: parse_error> <Generated command: parse_start> <Generated command: parse_state> <Generated command: partial_shift> <Generated command: pass> <Generated command: perfect_hash_entry> <Generated command: perfect_hash_modulus> <Generated command: perfect_hash_multiplier> <Generated command: proceed> <Generated command: psbegin> <Generated command: psclear> <Generated command: psend> <Generated command: pspush> <Generated command: ps_maximum> <Generated command: put_constant> <Generated command: put_list> <Generated command: put_structure> <Generated command: put_unsafe_value> <Generated command: put_value> <Generated command: put_variable> <Generated command: put_void> <Generated command: query_start> <Generated command: recognise_reservable> <Generated command: recognise_symbol> <Generated command: reduce_parse> <Generated command: report> <Generated command: requires> <Generated command: reserved_word_table> <Generated command: retry> <Generated command: retry_me_else> <Generated command: section> <Generated command: set_constant> <Generated command: set_local_value> <Generated command: set_value> <Generated command: set_variable> <Generated command: set_void> <Generated command: shift_lexeme> <Generated command: state_semantics> <Generated command: subquery_start> <Generated command: switch_on_constant> <Generated command: switch_on_structure> <Generated command: switch_on_term> <Generated command: symbol_enum> <Generated command: symbol_reserved> <Generated command: symbol_table_bottom_scope> <Generated command: symbol_table_class> <Generated command: symbol_table_constant> <Generated command: symbol_table_definition> <Generated command: symbol_table_empty_defs> <Generated command: symbol_table_identify> <Generated command: symbol_table_method> <Generated command: symbol_table_interference> <Generated command: symbol_table_new_definition> <Generated command: symbol_table_new_scope> <Generated command: symbol_table_new_table> <Generated command: symbol_table_not_constant> <Generated command: symbol_table_split_defs> <Generated command: term_production> <Generated command: term_vars> <Generated command: transition_character> <Generated command: transition_integer> <Generated command: transition_integer_switch> <Generated command: transition_iqis> <Generated command: transition_jump> <Generated command: translator_prelude> <Generated command: trim_environment> <Generated command: trust> <Generated command: trust_me> <Generated command: try> <Generated command: try_me_else> <Generated command: unify_constant> <Generated command: unify_local_value> <Generated command: unify_value> <Generated command: unify_variable> <Generated command: unify_void> <Generated command: wam_initialisation>
			
#	wyrm-cow.dna - Copyright (C) 2003 SM Ryan.  All rights reserved.

package provide wyrmcow @@VER@@

package require wyrmwif
package require wyrmassoc
package require wyrmexpat
package require wyrmsym
package require wyrmdpda
package require wyrmuri

namespace eval ::wyrm {
	namespace export cow

	proc cow args {
		if {[catch {cow::driver $args} rs]} {
			puts $rs\n\n$::errorInfo
			return "report C [list $rs]"
		} else {
			return $rs
		}
	}
	
	namespace eval cow {
		namespace import ::wyrm::*
		
		proc driver argv {
			variable gen
			parser $argv
			sym grammar [namespace current]::cfparser
			cfparser machine: := cf
			vW1
			attributeAnalysis hr
			<Symbol table class definitions>
			switch [hr type:] {
				2 {createLR(k) hr cfparser}
				3 {createDFA hr cfparser}
			}
			if {[hr compiled:]} {
				<Lexical processors>
			}
			<Report the vW1 grammar>
			if {[hr compiled:]} {
				implementParser cfparser
			}
			[hr implementor:]::implementparser $gen
		}
		<Generate compiled parser code>
		<vW2 grammar parser>
		<Grammar properties>
		<DFA parser generation>
		<LR(k) parser generation>
		<Parser automata>
		<Extract vw1 grammar from vw2>
		<Derive the attribute evaluation machine>
	}
	<cow implementations>
}

		
 
section    top
 
section    top
proc log—Log parser generator activities.
input args—Log message.
io namespace oldclicks—How long since the last message.
io namespace gen—Generated parser code.
			
variable oldclicks [clock clicks -milliseconds]
proc log {args} {
	variable gen
	variable oldclicks
	set newclicks [clock clicks -milliseconds]
	set wallclock [clock format [clock seconds] -format %H:%M:%S]
	set clicks [expr {$newclicks-$oldclicks}]
	if {$clicks<0} {set clicks [expr {~$clicks}]}
	set elapsed [format %d:%02d:%02d.%03d \
			[expr {$clicks/3600000}] \
			[expr {($clicks/60000)%60}] \
			[expr {($clicks/1000)%60}] \
			[expr {($clicks)%1000}] \
	]
	set message [join $args { }]
	set message $wallclock.$elapsed.$message
	set oldclicks $newclicks
puts $message
	append gen "#.$message\n"
}

		
 
section    top
proc report—Error and information message reporter.
input severity—C (critical), E (error), or I (information).
input args—Error message.
io namespace gen—Generated parser code.
			
proc report {severity args} {
	variable gen
	variable depth
	set message [join $args { }]
if {[string equal $severity E]} {
	puts "ERROR $message"
} else {
	puts $message
}
	<Generated command: report>
}

		
 
section    top
			
gen report $severity $message

		
 
section    top
proc gen—Generated code.
input args—Command name and arguments.
io namespace gen—Generated parser code.
io namespace depth—Generated parser code depth.
			
proc gen {args} {
	variable gen
	variable depth
	if {[llength $args]==1 && [llength [lindex $args 0]] > 1} {set args [lindex $args 0]}
	if {[string equal [lindex $args 0] END]} {
		incr depth -2
		append gen "[string repeat { } $depth]\}\n"
	} else {
		append gen [string repeat { } $depth]
		set begin [string equal [lindex $args 0] BEGIN]
		if {$begin} {set args [lrange $args 1 end]}
		set args [setp op $args]
		append gen $op
		foreach arg $args {append gen " [list $arg]"}
		if {$begin} {append gen " \{"; incr depth 2}
		append gen \n
	}
}

		
 
section    top

report I "vW1 Attribute Grammar"
sym each mn [mr::nonterminals] {
	report I "${mn}:: [string map {, {}} [mr $mn]]"
	catch {report I "  [mr properties: $mn]"}
}
sym each pn [hr::nonterminals] {
	report I "$pn:"
	catch {report I "  [hr properties: $pn]"}
	catch {report I "  shift: [hr action: $pn %shift]"} rs
	sym each alt [hr::if $pn := {}] {
		report I "    $alt"
		catch {
			foreach cs [hr cs: $pn $alt] {
				report I "      [displayAttribute $cs]"
			}
		} rs
		catch {
			report I "      reduce: [hr action: $pn $alt]"
		} rs
	}
	catch {report I "  ::= [hr foreign: $pn]"}
}
sym each sym [hr::terminals] {
	if {![string match *symbol $sym]} continue
	if {[catch {report I "$sym = [hr properties: $sym]"}]} {
		report I "$sym = terminal"
	}
	catch {report I "  shift: [hr action: $sym %shift]"} rs
	catch {report I "  characters: [hr character: $sym]"} rs
	catch {
		foreach cs [hr cs: $sym %shift] {
			report I "  [displayAttribute $cs]"
		}
	} rs
	catch {report I "  ::= [hr foreign: $sym]"}
}

		
 
section    top
			
report I "Lexical Attribute Grammar [symr::if name: := scanner] for [symr::if clients: := unknown]"
sym each pn [symr::nonterminals] {
	report I "$pn:"
	catch {report I "  [symr properties: $pn]"}
	catch {report I "  shift: [hr action: $pn %shift]"} rs
	sym each alt [symr::if $pn := {}] {
		report I "    $alt"
		catch {
			foreach cs [symr cs: $pn $alt] {
				report I "      [displayAttribute $cs]"
			}
		} rs
		catch {
			report I "      reduce: [hr action: $pn $alt]"
		} rs
	}
	catch {report I "  ::= [symr foreign: $pn]"}
}
sym each sym [symr::terminals] {
	if {![string match *symbol $sym]} continue
	if {[catch {report I "$sym = [symr properties: $sym]"}]} {
		report I "$sym = terminal"
	}
	catch {report I "  shift: [hr action: $sym %shift]"} rs
	catch {
		foreach cs [symr cs: $sym %shift] {
			report I "  [displayAttribute $cs]"
		}
	} rs
	catch {report I "  ::= [symr foreign: $sym]"}
}

		
   
   

vW2 Grammar

   
top
 
section    top

12 :: Grammar properties are

foreign: production-name
Foreign text alternatives: a union of four element strings: output variables, input variables, language, and foreign code.
nullable: production-name
Whether the production is nullable.
weight: production-name
Production weight (minimum nonempty number of terminal that can be generated.
action: production-name [ alternative ]
Reduce action.
action: production-name [ %shift ]
Shift action.
symbol-name.character
Glyph translations.
cs: production-name alternative
Context sensitive constraints (attributes).
properties: production-name
Description of production properties.
anyright:
If any productions are right (or embedding) recursive.
attributes:
Metanotions declared to be attributes.
compiled:
If a parser was constructed.
everywhere: symbol-name
Everywhere reserved symbols.
finite:
Set of productions with only a finite set of generated terminal.
glyph: glyph-name
Character with this name.
k:
Maximum lookahead.
maxexcept:
Maximum string length of all except...character.
name:
Grammar name.
order:
Produced from orderring.
predicates:
Predicate productions.
reserved: symbol
Reserved symbols in the lexicon.
signatures:
Declared hypernotion signatures.
start:
Start production.
type:
Grammar type in the Chomsky hierarchy, 2 or 3.
ep: production-name
Original production names for created productions. This will be used to create automata ep: properties.
class: symbol|character symbol-set := class-numbers
List each class with its symbols, indexed by its members.
class-number: symbol|character := last-class
Last class number for symbol and character classes.
members: symbol|character class-numbers := symbol-set
List each class with its symbols, indexed by its class numbers.
reserved: reserved-table-index := {spelling reserved-symbol...}
Each unique reserved symbol list.

 
section    top

proc truncate {string} {
	while 1 {
		set i [string first \n $string]
		if {$i==0} {
			set string [string range $string 1 end]
		} else {
			break
		}
	}
	if {$i>0} {incr i -1; set string [string range $string 0 $i]...}
	if {[string length $string]>40} {
		set string [string range $string 0 39]...
	}
	return $string
}

proc parser vw2grammar {
	sym grammar [namespace current]::hr
		hr k: := 2
		hr type: := 3
		hr name: := hyperrules
		hr start: := %start
		hr conflicts: := {}
		hr foreigns: := {}
		hr symtab: := {}
		hr implementor: := ::wyrm::model
	sym grammar [namespace current]::mr
		mr k: := 2
		mr type: := 3
		mr name: := metarules
	sym grammar [namespace current]::rew
	oav bundle [namespace current]::outp
	log Begin parse input.
	<Initialise glyph names before parsing the grammar>
	set grammarhtml {}
	array set header {1 "" 2 "" 3 "" 4 "" 5 "" 6 "" n 0}
	set chunknumber 0
	set ps {}
	set state rule
	set curr %start
	set first 1
	set input ""
	foreach chunk $vw2grammar {
		incr chunknumber
		set token [wyrm::expat::create]
		wyrm::expat::resume $token [
			exec $::env(SHELL) -c {tidy --indent 0 --output-xml 1 --quiet 1; exit 0} << $chunk 2> /dev/null
		]
		set ts 0
		foreach piece [wyrm::expat::end $token] {
			set what [assoc get $piece %what]
			vassoc put piece %chunk $chunknumber
			switch $what {
				start {
					set name [assoc get $piece %name]
					assoc get piece -exact est class c
					switch -glob [lindex $ts end]/$name {
						0/body {
							lappend ts body
						}
						0/* {
							;
						}
						*/cow* {
							lappend ts $name
						}
						{*/h[1-6]} {
							set i [string index $name end]
							set header($i) ""
							set headr(n) $i
							lappend grammarhtml $piece
							lappend ts $name
						}
						default {
							lappend grammarhtml $piece
							lappend ts $name
						}
					}
				}
				end {
					switch -glob [lindex $ts end] {
						0 {
							;
						}
						cow* {
							set ts [lrange $ts 0 end-1]
						}
						{*/h[1-6]} {
							set i [string index $name end]
							set header($i) ""
							set headr(n) [expr {$i-1}]
							lappend grammarhtml $piece
							set ts [lrange $ts 0 end-1]
						}
						default {
							lappend grammarhtml $piece
							set ts [lrange $ts 0 end-1]
						}
					}
				}
				content {
					set data [assoc get piece %data]
					switch -glob [lindex $ts end] {
						0 {
							;
						}
						{*/h[1-6]} {
							set i [string index $name end]
							append header($i) $data
							lappend grammarhtml $piece
						}
						code - cowhn - cowsep {
							setp {ps state input} [dpda $ps $state $input[assoc get piece %data] {
								foreign-chunk
									{((?~code=.*)),((?~N=.*)),<foreigncode>(?~ch=[^][]+)}
									{($code$ch),($N),<foreigncode>}
								skip-newline
									{<(?~state=.*)>[\n\r]}
									{<[incr line; set state]>}
								skip-blanks
									{<(?~state=.*)>[ \t]+}
									{<$state>}
								skip-comments
									{<(?~state=.*)>{[^{}]*}}
									{<$state>}
								end-of-input
									{<rule>$}
									{<quit>}
								start-a-rule
									{<rule>}
									{(rulekind),<notion>}
								
								<Tree rewriting rules>
								<Metarules>
								<Foreign rules>
								<Hyperrules>
								<Property rules>

								unknown-text
									{(.*),(#NS),<rulekind>(?~rest=[^.]*)[.]}
									{<[report E \[$line\] could not identify the text: [truncate $rest]; concat rule]>}
								unknown-rule
									{(.*),((?~N=.*)),<rulekind>(?~rest=[^.]*)[.]}
									{<[report E \[$line\] could not identify the rule: "[truncate [concat $N ## $rest]]"; concat rule]>}
								find-full-stop-at-end-of-rule
									{(.*),<endrule>[.]}
									{<rule>}
								error-at-end-rule
									{(metarule),<endrule>(?~rest=[^.]+)[.]}
									{<[report E \[$curr\] expected '.' to end metarule: [truncate $rest]; concat rule]>}
								error-at-end-rule
									{((?~what=.*)),<endrule>(?~rest=[^.]+)[.]}
									{<[report E \[$curr\] expected '.' to end $what: [truncate $rest]; concat rule]>}
								error-at-end-rule
									{(metarule),<endrule>(?~rest=.*)}
									{<[report E \[$curr\] expected '.' to end metarule: [truncate $rest]; concat quit]>}
								error-at-end-rule
									{((?~what=.*)),<endrule>(?~rest=.*)}
									{<[report E \[$curr\] expected '.' to end $what: [truncate $rest]; concat quit]>}
									
								<Parse the next hypernotion, metanotion, or symbol>

								skip-comments
									{<(?~state=.*)>\{[^{}]*\}}
									{<$state>}
							}]
						}
						default {
							lappend grammarhtml $piece
						}
					}
				}
			}
		}
	}
	<Ensure foreign and hyperrules are disjoint>
	<Symbol table postparse processing>
	<Make aleph definitions of undefined attributes>
	log Complete parse input.
	foreach {character name} [array get glyphName] {
		hr glyph: $name := $character
	}
}

		
 
section    top

identify-a-property-rule
	{(proto),((?~N=.*)),<rulekind>=}
	{($N),<propertyrule>}

<Attribute property rule>
<Conflicts property rule>
<Glyph property rule>
<Grammar type property rule>
<Implementor property rule>
<Include property rule>
<Lookahead property rule>
<Name property rule>
<Reserved symbol property rule>
<Signature property rule>
<Start property rule>
<Stylesheet property rule>
<Symbol table>

unidentified-property
	{((?~P=.*)),<propertyrule>[^.]*}
	{(property),<[report E \[$line\] unknown property: $P; concat endrule]>}

		
 
section    top

start-property
	{(start),<propertyrule>}
	{(startproperty),<notion>}
start-property
	{((proto|meta|hyper)),((?~N=.*)),<startproperty>}
	{(start property),<[set first 0; hr %start := $N,%end; concat endrule]>}

		
 
section    top

k-property
	{(k),<propertyrule>}
	{(),<kproperty>}
k-property
	{((?~K=.*)),<kproperty>(?~k=\d+)}
	{($K$k),<kproperty>}
k-property
	{((?~K=.*)),<kproperty>}
	{(k property),<[hr k: := $K; concat endrule]>}

		
 
section    top

name-property
	{(name),<propertyrule>}
	{(nameproperty),<notion>}
name-property
	{((proto|meta|hyper)),((?~N=.*)),<nameproperty>}
	{(name property),<[hr name: := $N; mr name: := $N_metarules; concat endrule]>}

		
 
section    top

grammar-type-property
	{(type),<propertyrule>}
	{(),<typeproperty>}
grammar-type-property
	{((?~K=.*)),<typeproperty>(?~k=\d+)}
	{($K$k),<typeproperty>}
grammar-type-property
	{((?~K=2|3)),<typeproperty>}
	{(type property),<[hr type: := $K; concat endrule]>}
grammar-type-property
	{((?~K=.*)),<typeproperty>}
	{(type property),<[report E \[$line\] grammar type not 2 or 3: $K; concat endrule]>}

		
 
section    top

include-property
	{(include),<propertyrule>'(?~uri=([^']|'')*)'}
	{($uri),(),<endincludeuri>}
end-include-property
	{((?~garbage=.*)),<endincludeuri>(?~text=[^.])}
	{($garbage$text),<endincludeuri>}	
end-include-property
	{((?~uri=.*)),((?~garbage=.*)),<endincludeuri>[.]}
	{<rule>[
		if {[string length $garbage]} {report E unexpected text after include uri: [truncate $garbage]}
		uri get [string map {'' '} $uri]
	]}

include-property
	{(include),<propertyrule>\[}
	{(%include),(),(1),<foreigncode>}
included-foreign-text
	{(%include),((?~code=.*)),(1),<foreigncode>\]}
	{(include property),<[
		gen foreign_include $code
		concat endrule
	]>}

include-property
	{(include),<propertyrule>}
	{(include property),<[
		report E \[$line\] url or foreign code expected after include
		concat endrule
	]>}

		
 
section    top
 
section    top

implementor-property
	{(implementor),<propertyrule>'(?~N=([^']|'')+)'}
	{(implementor property),<[hr implementor: := [string map {'' '} $N]; concat endrule]>}

		
 
section    top

identify-a-metarule
	{(meta),((?~N=.*)),<rulekind>:\s*:}
	{($N),<[set curr $N; concat metarule]>}
identify-a-bad-metarule
	{(hyper|proto),(#NS),<rulekind>:\s*:[^.]*}
	{(metarule),<[
		report E \[$line\] missing metanotion before '::'
		concat endrule
	]>}
metarule
	{<metarule>}
	{(),(mrmember)<notion>}
metarule-member
	{((?~rule=.*)),((proto|meta|hyper)),((?~HN=.*)),<mrmember>}
	{([sym union $rule $HN]),<mrmemberseparator>}
metarule-alternative
	{<mrmemberseparator>[;|]}
	{(mrmember),<notion>}
metarule-completed
	{((?~MN=.*)),((?~rule=.*)),<mrmemberseparator>}
	{(metarule),<[mr $MN +:= $rule; concat endrule]>}

		
 
section    top

identify-a-bad-hyperrule
	{(hyper|proto),(#NS),<rulekind>:\s*[^.]*}
	{(hyperrule),<[
		report E \[$line\] missing hypernotion before ':'
		concat endrule
	]>}
identify-a-hyperrule
	{((proto|meta|hyper)),((?~N=.*)),<rulekind>:}
	{($N),<[set curr $N; concat hyperrule]>}
hyperrule
	{<hyperrule>}
	{(),(),(hrmember)<notion>}
hyperrule-equal-member
	{((proto|meta|hyper)),((?~HN=.*)),<hrmember>=\s*=}
	{($HN%eq%),(hrposteqmember),<notion>}
hyperrule-equal-member
	{((proto|meta|hyper)),((?~HN=.*)),<hrmember>=}
	{($HN%eq%),(hrposteqmember),<notion>}
hyperrule-not-equal-member
	{((proto|meta|hyper)),((?~HN=.*)),<hrmember>/\s*=}
	{($HN%ne%),(hrposteqmember),<notion>}
hyperrule-not-equal-member
	{((proto|meta|hyper)),((?~HN=.*)),<hrmember>­}
	{($HN%ne%),(hrposteqmember),<notion>}
hyperrule-post-equal-member
	{((?~HN1=.*)),((proto|meta|hyper)),((?~HN2=.*)),<hrposteqmember>}
	{(hyper),($HN1$HN2),<hrmember>}
hyperrule-member
	{((?~alternative=.*)),((proto|meta|hyper)),((?~HN=.*)),<hrmember>}
	{([
		if {[sym null $alternative]} {set HN} else {sym concat $alternative $HN}
	]),<hrmemberseparator>}
hyperrule-member
	{<hrmemberseparator>,}
	{(hrmember),<notion>}
hyperrule-alternative
	{((?~rule=.*)),((?~alternative=.*)),<hrmemberseparator>[;|]}
	{([sym union $rule $alternative]),(),(hrmember),<notion>}
hyperrule-completed
	{((?~HN=[^%].*)),((?~rule=.*)),((?~alternative=.*)),<hrmemberseparator>}
	{([if {[string match *symbol $HN]} {concat symbol rule} else {concat hyperrule}]),<[
		hr $HN +:= [sym union $rule $alternative]
		if {$first && ![string match *symbol $HN]} {hr %start := $HN,%end; set first 0}
		concat endrule
	]>}

		
 
section    top

24 :: A notion is here used to refer to a sequence of small and/or large marks, and things that become small marks like literals. The calling sequence for this dpda state is

rule-comment
rule-match
(next-state),<notion>
...
rule-comment
(proto|meta|hyper),(large-and-small-marks)<next-state>
rule-substitution

'proto' indicates only small marks were parsed (or no marks were parsed), 'meta' only large marks, 'meta' only large marks, and 'hyper' for a mixture. The symbol is the concatenation of the marks, or #NS if no marks.

 
section    top

start-a-protonotion
	{<notion>(?~N=[a-z()]+)}
	{(proto),($N),<notioni>}
start-a-protonotion
	{<notion>(?~N=[#'])}
	{(proto),(),<notioni>$N}
start-a-hypernotion
	{<notion>(?~N=\[)}
	{(hyper),(),<notioni>$N}
start-a-metanotion
	{<notion>(?~N=[A-Z0-9]+)}
	{(meta),($N),<notioni>}
empty-notion
	{((?~state=.*)),<notion>}
	{(proto),(#NS),<$state>}
<Literal glyphs in a hypernotion>
<Decimal numbers in a hypernotion>
continue-a-protonotion
	{(proto),((?~PN=.*)),<notioni>(?~N=[a-z()]+)}
	{(proto),($PN$N),<notioni>}
continue-a-metanotion
	{(meta),((?~MN=.*)),<notioni>(?~N=[A-Z0-9]+)}
	{(meta),($MN$N),<notioni>}
continue-a-hypernotion
	{(.*),((?~HN=.*)),<notioni>(?~N=[a-z()A-Z0-9]+)}
	{(hyper),($HN$N),<notioni>}
end-a-notion
	{((?~state=.*)),((?~kind=.*)),((?~N=.*)),<notioni>}
	{($kind),([sym encode string $N]),<$state>}

		
 
section    top

variable defaultGlyphName
array set defaultGlyphName {
	<Default glyph translations>
}

		
 
section    top

\n newline		\r return		\t tab
" " space		! exclaim		"\"" quote		# hash
$ dollar		% percent		& ampersand		'' apostrophe
( leftparen		) rightparen	* asterisk		+ plus
, comma			- dash			. fullstop		/ slash
0 zero			1 one			2 two			3 three
4 four			5 five			6 six			7 seven
8 eight			9 nine			: colon			; semicolon
< lessthan		= equals		> greaterthan	? query
@ at			A largea		B largeb		C largec
D larged		E largee		F largef		G largeg
H largeh		I largei		J largej		K largek
L largel		M largem		N largen		O largeo
P largep		Q largeq		R larger		S larges
T larget		U largeu		V largev		W largew
X largex		Y largey		Z largez		[ leftbracket
\\ backslash	] rightbracket	^ circumflex	_ underline
@ at			a lettera		b letterb		c letterc
d letterd		e lettere		f letterf		g letterg
h letterh		i letteri		j letterj		k letterk
l letterl		m letterm		n lettern		o lettero
p letterp		q letterq		r letterr		s letters
t lettert		u letteru		v letterv		w letterw
x letterx		y lettery		z letterz		\{ leftbrace
| verticalbar	\} rightbrace	~ tilde

		
 
section    top

variable defaultGlyphName
array set glyphName [array get defaultGlyphName]

		
 
section    top

glyph-property
	{(glyph),<propertyrule>'(?~char=''|[^'])'}
	{($char),(glyphproperty),<notion>}
glyph-property
	{(glyph),<propertyrule>}
	{<[report E \[$line\] missing literal after 'glyph=' property; concat quit]>}
glyph-property
	{(.*),(proto),(#NS),<glyphproperty>}
	{(glyph property),<[report E \[$line\] missing glyphname; concat endrule]>}
glyph-property
	{((?~char=.*)),(proto),((?~N=.*)),<glyphproperty>}
	{(glyph property),<[set glyphName($char) $N; concat endrule]>}

		
 
section    top

literal-glyphs
	{<notioni>'(?~string=([^']|'')*)'}
	{<notioni>[string map [array get glyphName] $string]}
literal-glyphs-error
	{<notioni>'}
	{<[report E \[$curr\] missing close quote ('); concat notioni]>}

		
 
section    top

NUMBER
	{<notioni>#(?~num=[0-9 \t\n]+)}
	{<notioni>number([string map {
		0 zero 1 one 2 two 3 three 4 four
		5 five 6 six 7 seven 8 eight 9 nine
		" " "" "\t" "" "\n" ""
	} $num])}
missing-NUMBER
	{<notioni>#}
	{<[report E \[$curr\] missing digits after "'#'"; concat notioni]>}

		
 
section    top

32. Foreign rules :: Foreign text is a way to enter programming code written in another language into the text of a grammar. Attributes can be made available for input and output to the text. As far as the context free grammar is concerned, each foreign rule is interpretted as an empty production; the foreign text is then evaluated during a reduction.

The foreign text must contain a balanced number of brackets; no escapes are available, nor are any kinds of string or comments interpretted to hide unbalanced brackets. If the foreign code must used unbalanced brackets, it must do so outside the text of the grammar.

foreign-rule ::= hypernotion : foreign-texts.

foreign-texts ::= foreign-text | foreign-texts; foreign-text

foreign-text ::= foreign-output foreign-input language [foreign-code] | language immediate [foreign-code]

foreign-output ::= [ (metanotion...) ]

foreign-input ::= [ metanotion... ]

language ::= [ protonotion ]

The language cannot end in 'immediate'

foreign-code ::= [ foreign-chunk... ]

foreign-chunk ::= any-characters-except-[-and-] | [foreign-code]

Normally in the cf parser, foreign texts are evaluated with the attributes after parsing. If the small marks 'immediate' are included after the language, means the foreign text is evaluated immediately on reduction within the parser itself. In a scanner, all foreign texts are immediate, whether marked so or not. Because immediate texts are evaluated during the parse, they can alter the lexical interpretation and other aspects of parser so that it can accept languages that it could not otherwise.

For example, C has well known ambiguity with typedef names. A C grammar can use immediate foreign texts in the parser and scanners with a rudimentary symbol table to remove this ambiguity.

NEST block:
	left brace symbol, push new typedef level,
		NEST declarations into NEST1, NEST1 statements,
		right brace symbol, pop off typedef level.
NEST typedef: typedef symbol, TAG symbol, make typedef.

IDENTIFIER symbol: TAG symbol, relabel if typedef.
TAG symbol: letter, letter or digit sequence option.

include = [
	typedef struct TopLevel TopLevel;
	struct TopLevel {int depth; char *tag; TopLevel *under;};
	TopLevel *topLevel = 0; int topLevelDepth = 0;
].
push new typedef level: immediate [
	topLevelDepth++;
].
pop new typedef level: immediate [
	topLevelDepth--;
	while (topLevel && topLevel->depth>topLevelDepth) {
		TopLevel *u = topLevel->under; free(topLevel); topLevel = u;
	}
].
relabel if typedef: immediate [
	TopLevel *t; for (t=topLevel; t; t=t->under) {
		if (strcmp(t->tag,Tcl_GetString(bufferContents()))==0) {
			PS->override = true;
			PS->reserved = 0;
			PS->symbol = TYPENAMEsymbol;
			PS->nameclass = 0;
			break
		}
	}
]
make typedef: immediate [
	TopLevel *t; t = malloc(sizeof(TopLevel));
	t->depth = topLevelDepth; t->tag = bufferString(lastlexeme.name,0);
	t->under = topLevel; topLeve = t;
]

An immediate foreign text cannot have explicit output or input variables. This is because variables are propagated by the attribute evalator after the parse is completed, but immediate texts are evaluated before the parse is completed. Immediate texts which need to communicate need to establish some protocol with global variables.

(The language string cannot end in 'immediate' unless there is another 'immediate' after it. ximmediat e[...] is an immediate text in language x; ximmediat e immediate[...] is an immediate text in language ximmediate.)


identify-a-foreign-rule
	{((proto|meta|hyper)),((?~HN=.*)),<rulekind>:\s*(?~la=(\([A-Z0-9 \t\n]*\))?[A-Z0-9 \t\n]*[a-z() \t\n]*\[)}
	{($HN),(),<[set curr $N; concat foreignrule]>$la}
foreign-text-variables
	{<foreignrule>(\((?~output=([A-Z0-9 \t\n]*))\))?(?~input=[A-Z0-9 \t\n]*)(?~language=[a-z() \t\n]*)}
	{([regsub -all \\s+ $output {}]),([regsub -all \\s+ $input {}]),([regsub -all \\s+ $language {}]),(),(0),<foreigncode>}
foreign-code-begin
	{(0)<foreigncode>\[}
	{(1)<foreigncode>}
foreign-code-begin
	{((?~HN=.*)),((?~alts=.*)),((?~output=.*)),((?~input=.*)),((language=.*)),((?~code=.*)),(0),<foreigncode>}
	{(foreign rule),<[report E missing foreign-code '\[...\]': $HN; concat endrule]>}
foreign-code-end
	{((?~alts=.*)),((?~output=.*)),((?~input=.*)),((language=[^%]*)),((?~code=.*)),(1),<foreigncode>\]}
	{([
		if {![string length $output]} {set output -}
		if {![string length $input]} {set input -}
		if {![string length $language]} {set language -}
		if {[string match *immediate $language]} {
			if {![string equal $output$input --]} {
				report E \[$line\] immediate foreign text cannot have variables: ($output)$input
			}
		}
		sym union $alts [sym encode string [list $output $input $language $code]]
	]),<foreigntextor>}
foreign-code-nest
	{((?~code=.*)),((?~N=.*)),<foreigncode>\[}
	{($code\[),([expr {$N+1}]),<foreigncode>}
foreign-code-unnest
	{((?~code=.*)),((?~N=.*)),<foreigncode>\]}
	{($code\]),([expr {$N-1}]),<foreigncode>}
foreign-text-OR-or-END
	{<foreigntextor>[;|]}
	{<foreignrule>}
foreign-text-completed
	{((?~HN=.*)),((?~alts=.*)),<foreigntextor>}
	{(foreign rule),<[
		hr foreigns: +:= $HN
		hr foreign: $HN +:= $alts
		concat endrule
	]>}

		
 
section    top

sym each hn [sym intersect [hr foreigns:] [hr::nonterminals]] {
	report E production cannot be both a hyperrule and a foreign-rule: $hn
	hr::delete foreign: $hn
}
sym each hn [hr foreigns:] {hr $hn := "#NS"}

		
   
   

Grammar Properties

   
top

34 :: The notations used herein are the usual notations from formal linguistics.

A context-free grammar G=(N, Σ, P, S) where N is the set of nonterminal, Σ is the set of terminal symbols, P is the set of productions, and S is the start symbol.
The transitive-reflexive closure is X* and the transitive closure is X+ .
The vocabulary VN∪Σ
P is finite subset, PN×V* written in the form A→α where AN and α∈V* .
The derive relations is αAβ⇒αδβ where A→δ is in P .
The language is L (G)={w∈Σ* : S*w} .
The empty string is ε .
a, b, c, ⋯ denote terminals of Σ .
A, B, C, ⋯ denote nonterminals of N .
X, Y, Z denote symbols of V .
u, v, ⋯, z denote strings of Σ* .
α, β, γ, ⋯ denote strings of V* .
The set concatenation operator is S1 ||S2={w1 w2 : w1S1w2S2} .

This is at odds with the vW2 notation which uses upper case letters as metanotion, lower case strings as nonterminals or (ending in 'symbol') terminals, and concatenation with an explicit ','.

 
section    top

proc grammarProperties {grammar} {
	log Begin grammar properties.
	<Compute grammar properties>
	log Order the vocabulary from terminal to the root production.
	<Order the vocabulary from terminal to the root production>
	log Remove useless productions.
	<Remove useless productions>
	log Remove inaccessible productions.
	<Remove inaccessible productions>
	log Finiteness analysis.
	<Finiteness analysis>
	log Cycle analysis.
	<Cycle analysis>
	log Format the properties of the grammar for reporting.
	<Format the properties of the grammar for reporting>
	$grammar order: := $order
	log Completed grammar properties.
}

		
 
section    top

${grammar}::property weight: {grammar set} {
	if {[sym card $set]==1 && [sym length $set]==1} {return 0}
	sym each alt $set {
		set w 0
		sym enum sym $alt {
			incr w [$grammar weight: [sym first $sym]]
		}
		if {$w==0} {
			;
		} elseif {![info exists W]} {
			set W $w
		} elseif {$W>$w} {
			set W $w
		}
	}
	if {![info exists W]} {set W 0}
	return $W
}

${grammar}::property nullable: {grammar set} {
	if {[sym card $set]==1 && [sym length $set]==1} {error "undefined nullable: $set"}
	set nu 0
	sym each alt $set while {!$nu} {
		set nu 1
		sym enum sym $alt while {$nu} {
			set nu [$grammar nullable: [sym first $sym]]
		}
	}
	return $nu
}

${grammar}::property empty: {grammar set} {
	expr {[$grammar weight: $set]==0}
}

		
 
section    top

37. Order the vocabulary from terminal to the root production :: The nodes of the produce graph are the vocabulary symbols V and there is an edge (A, X) if A→αXβ . Strongly connected components (scc) of produce are then mutually recursive productions. This allows the grammar to be decomposed into nested subgrammars, with fewer simultaneously nonterminals to deal with at each analysis step. For example with

SAB
ABb
AC
BAa
BD
CDd
Ce
DCc
Df
First the nested grammar with N={C, D} and Σ={c, d, e, f} can be analyzed, then the grammar with N={A, B} and Σ={C, D, a, b} , and finally the grammar with N={S} and Σ={A, B} .

This code does the following steps:

(1) Creates the produce graph.
(2) Factors it by strongly connected regions; mutually recursive productions form an scc, and a single node in the factorred graph.
(3) Topological sorts the graph. (Because the order for sym tsort cannot be specified, the graph is first inverted so that tsort sorts from leaves to roots instead roots to leaves.) The sort produces the total order X1, X2, ⋯, Xn such that if Xj*Xi then ij .
(4) Each element of the factorred graph orderring replaces all the symbols in corresponding scc of the produce graph.

Whether the symbol is a terminal or nonterminals is added to the order as a convenience to later passes that need to know.


set produce {}
sym each pn [${grammar}::nonterminals] {
	sym each alt [${grammar}::if $pn := {}] {
		sym enum member $alt {
			set produce [sym union $produce $pn,$member]
		}
	}
}
set order {}
sym enum forder [sym tsort [sym factor [sym invert $produce]]] {
	set scc [sym member $produce [sym scc $produce $forder]]
	lappend order $scc [sym contains [${grammar}::nonterminals] $scc]
}

		
 
section    top

set neworder {}; $grammar useless: := {}
foreach {scc N} $order {
	sym each sym $scc {
		set useful($sym) [expr {!$N}]
	}
}
foreach {scc N} $order {
	set isUseful 0
	sym each pn $scc while {!$isUseful} {
		if {$useful($pn)} {
			set isUseful 1
		} else {
			sym each alt [$grammar $pn] while {!$isUseful} {
				set isUseful 1
				sym enum member $alt while {$isUseful} {
					set isUseful $useful($member)
				}
			}
		}
	}
	<Prune useless productions of a nonterminal>
}
set order $neworder

		
 
section    top

sym each pn $scc {
	set useful($pn) $isUseful
}
sym each pn $scc {
	if {$isUseful && $N} {
		set newprod {}
		sym each alt [$grammar $pn] {
			set keep 1
			sym enum member $alt while {$keep} {
				set keep $useful($member)
			}
			if {$keep} {set newprod [sym union $newprod $alt]}
		}
		$grammar $pn := $newprod
	} elseif {$N} {
		report I useless production: $pn
		${grammar}::delete $pn
	}
}
if {$isUseful} {
	lappend neworder $scc $N
} else {
	$grammar useless: +:=  $scc
}

		
 
section    top

set accessible {}
$grammar inaccessible: := {}
foreach {scc N} $order {
	sym each sym $scc {
		if {!$N} {
			set accessible [sym union $accessible $sym]
		}
	}
}
vprioq new work -id %start 0 %start
while {[vprioq dequeue work v est; set est]} {
	if {![sym contains [${grammar}::nonterminals] $v]} continue
	set accessible [sym union $accessible $v]
	sym each w [sym from $produce $v] {
		vprioq enqueue work -id $w 0 $w
	}
}
set neworder {}
foreach {scc N} $order {
	if {[sym contains $accessible $scc]} {
		lappend neworder $scc $N
	}
}
set order $neworder
sym each sym [sym diff [${grammar}::nonterminals] $accessible] {
	$grammar inaccessible: +:= $sym
	report I inaccessible production: $sym
}

		
 
section    top

foreach {scc N} $order {
	set sccfinite 1
	sym each v $scc while {$sccfinite} {
		sym each w [sym from $produce $v] while {$sccfinite} {
			if {[sym scc $produce $v]==[sym scc $produce $w]} {
				set sccfinite 0
			} elseif {[info exists finite($w)] && !$finite($w)} {
				set sccfinite 0
			}
		}
	}
	sym each v $scc {
		set finite($v) $sccfinite
	}
}
set U {}
foreach {v sccfinite} [array get finite] {
	if {$sccfinite} {set U [sym union $U $v]}
}
$grammar finite: := $U

		
 
section    top

42. Cycle analysis :: Lt Bradshaw: 'Take him down to hindquarters for cycle-analysis.' Young Guy, Motor Detective

This section examines derivations of productions to discover

(1) Whether the symbol is nullable, if any derivation X+ε exists.
(2) Whether the symbol is empty, if all derivations are X+ε .
(3) The WEIGHT of a symbol.
(4) Whether this is a type 2 grammar, if there is any derivation X+wXy∧⊤w|&gt;0∧⊤y|&gt;0 .

From Ancona, Dodero, Gianuzzi, and Morgavi Efficient Construction of LR(k) States and Tables TOPLAS January 1991, WEIGHT (X)=minw| where X*ww≠ε , WEIGHT (X)=0 if all possible derivations are empty.

The basic notion for each nested grammar find the minimal derivations A→αBζ→αβCδζ*η where η=(ΣA∪{A}) , and ΣA are the terminals of the nested grammar. There is a trap here for grammars such as

ABb|c
BAa|Bd|e
that it has the derivations
Ac
ABbAab
ABbeb
ABbBdbAadb
but also
ABbBdb*Bd+b
and never derive a string with just A or a terminal. (In terms of the produce graph, it is necessary to discover all possible minimal cycles, all cycles without subcycles, from and to A in the scc with A and B .)

This is achieved by annnotating each symbol in a derivation with the set of productions used to derive that symbol. If while trying to derive from a nonterminal, a production was already used to get it, using the same production again would be a subcycle.

A→(B, {ABb})b→(B, {ABb, BBd})db
so that attempting to use BBd again would be detected and is rejected. The only derivations permitted from this are
(B, {ABb, BBd})dbedb
or
(B, {ABb, BBd})db→(A, {ABb, BBd, BAa})adb

If any nonrecursive derivation is nonnull, the weight of the recursive derivations cannot be less than that. (Actually unless they are cyclic, they will have to be greater.) So recursive derivations (those who have symbols in the same produce scc), only need to be considerred if all nonrecursive derivations are empty, or it still needs to be determined if the grammar is type 2 or 3. And since the weight is the minimum length, once the grammar type is determined, heavy derivations can be abandonned.

Each derivation is an labeld string.

String label 'from': the nonterminal symbol derived from.
Symbol i label 'prods': set of productions that derived this symbol.


set type [${grammar}::if type: := 3]
if {$type>=3} {
	$grammar type: 3
} else {
	$grammar type: 2
}
$grammar anyright: := 0
foreach {scc N} $order {
	<Initialise the derivations for a nested grammar>
	while {[vprioq dequeue s derivation est; set est]} {
		<Characterise the current derivation>
		if {$rcr} {
			if {[info exists W1($v)] && $w>=$W1($v) && [$grammar type:]==2} {
				<Abandon a recursive derivation that contributes no new information>
			} elseif {$j<0} {
				<Catalog a immediately recursive derivation>
			} else {
				<Expand a recursive derivation which is not immediately recursive>
			}
		} else {
			<Weigh a nonrecursive derivation>
		}
	}
	<Decide the weight after considerring all derivations>
}

		
 
section    top

set comp [sym scc $produce [sym choose $scc]]
vprioq new s
sym each v $scc {
	set R($v) {}
	if {$N} {
		set WE($v) 0
		$grammar nullable: $v := 0
		sym each alt [$grammar $v] {
			set derived [sym concat $v $alt]
			set alt [sym label $alt string from $v]
			for {set i 0} {$i<[sym length $alt]} {incr i} {set alt [sym label $alt $i prods $derived]}
			vprioq enqueue s -id $alt $alt
		}
	} else {
		set WE($v) 1
		set W1($v) 1
		$grammar nullable: $v := 0
	}
}

		
 
section    top

44. Characterise the current derivation :: Weigh and analyze the state of current derivation. w is the weight of nonullable terminal symbols, with respect to the nested grammar. Real terminals are always nonnullable, but these pseudoterminals might be nullable. wm is the minimum weight of all the nullable terminals. If w is 0, all terminals are nullable and the derivation is also.

rcr will be the number of times a recursive symbol appears in the derivation, and j if nonnegative is where the first recursive symbol other than the derived symbol appears. If rcr remains zero, the derivation is all terminals and completed. If j is negative but rcr is not, it is a recursive in the derived from symbol only and complete. Otherwise, some recursive nonterminal will be expanded and replaced with each of its productions.


set v [sym label $derivation string from]
set w 0; unset -nocomplain mw
set rcr 0; set j -1; set nelem [sym length $derivation]
for {set i 0 } {$i<$nelem} {incr i} {
	set sym [sym unlabel [sym index $derivation $i]]
	if {[sym scc $produce $sym]==$comp} {
		if {$i>0 && $i+1<$nelem} {$grammar type: := 2}
		if {$i>0 && $i+1==$nelem} {
			$grammar anyright: := 1
		}
		if {![string equal $sym $v]} {
			set j $i
		}
		incr rcr
		if {$rcr>1} {$grammar type: := 2}
	} else {
		if {![$grammar nullable: $sym]} {
			incr w $WE($sym)
		} elseif {$WE($sym)>0 && (![info exists mw] || $WE($sym)<$wm)} {
			set wm $WE($sym)
		}
	}
}

		
 
section    top

;

		
 
section    top

lappend R($v) $w

		
 
section    top

set sym [sym index $derivation $j]
set oldrules [sym label $derivation $j prods]
sym each alt [$grammar $sym] {
	set rule [sym concat $sym $alt]
	if {[sym contains $oldrules $rule]} continue
	set newrules [sym union $oldrules $rule]
	for {set k 0} {$k<[sym length $alt]} {incr k} {set alt [sym label $alt $k prods $newrules]}
	set newderivation [sym label [
		sym concat [
			sym range $derivation 0 [expr {$j-1}]
		] $alt [
			sym range $derivation [expr {$j+1}] end
		]
	] string from $v]
	vprioq enqueue s -id $newderivation [sym length $newderivation] $newderivation
}

		
 
section    top

if {$w==0} {
	$grammar nullable: $v := 1
	if {[info exists wm]} {set w $wm}
}
if {$w>0 && (![info exists W1($v)] || $w<$W1($v))} {
	set W1($v) $w
}

		
 
section    top

sym each v $scc {
	if {![info exists W1($v)]} {
		set W1($v) 0
		$grammar nullable: $v := 1
		foreach w $R($v) {
			if {$w>0 && ($W1($v)==0 || $w<$W1($v))} {
				set W1($v) $w
			}
		}
	}
	set WE($v) $W1($v)
	array unset R $v
	$grammar weight: $v := $WE($v)
}

		
 
section    top

if {![sym contains [${grammar}::nonterminals] %start] || ![${grammar}::if %start]} {
	report E grammar is useless: [$grammar name:]
	$grammar type: := -1
}
sym each w [${grammar}::terminals] {
	$grammar properties: $w := terminal
}

set finites [$grammar finite:]
sym each pn [${grammar}::nonterminals] {
	try {
		if {[string equal $pn %start]} {
			$grammar properties: $pn := "[${grammar}::if properties: $pn := {}]type-[$grammar type:] "
		}
		if {[$grammar weight: $pn]==0} {
			$grammar properties: $pn := "[${grammar}::if properties: $pn := {}]empty"
		} else {
			if {[$grammar nullable: $pn]} {
				$grammar properties: $pn := "[${grammar}::if properties: $pn := {}]nullable "
			}
			$grammar properties: $pn := "[${grammar}::if properties: $pn := {}]weight=[$grammar weight: $pn]"
		}
		if {[sym contains $finites $pn]} {
			$grammar properties: $pn := "[${grammar}::if properties: $pn := {}] finite"
		} else {
			$grammar properties: $pn := "[${grammar}::if properties: $pn := {}] infinite"
		}
	} {reset error} {
		$grammar properties: $pn := "[${grammar}::if properties: $pn := {}] (!! $rs !!)"
	}
}
sym each sym [$grammar useless:] {
	$grammar properties: $sym := "[${grammar}::if properties: $sym := {}]useless"
}
sym each sym [$grammar inaccessible:] {
	$grammar properties: $sym := "[${grammar}::if properties: $sym := {}]inaccessible"
}

		
   
   

Extract One Level Grammar from Two Level Grammar

   
top

<Encode hypernotion for vw1 parsing>
<substituteMetanotions>
<Earley parser>
<Recursively replace hypernotions in rewrite rules with signatures>

proc vW1 {} {
	log Begin vw1 extraction
	set definingATTRIBUTE [expr {![mr::if ATTRIBUTE]}]
	if {$definingATTRIBUTE} {mr ATTRIBUTE := /ATTRIBUTE/}
	log Catalog metanotions.
	<Catalog metanotions>
	log Partition metanotions into context-free and context-sensitive.
	<Partition metanotions into context-free and context-sensitive>
	log Define ATTRIBUTE metanotion.
	<Define ATTRIBUTE metanotion>
	log Eliminate recursive context-free metanotions.
	<Eliminate recursive context-free metanotions>
	log Eliminate finite context-free metanotions.
	<Eliminate finite context-free metanotions>
	log Find signatures of all hypernotions.
	<Find signatures of all hypernotions>
	log Find signatures of symbol table predicates.
	set predicates {}
	set immediates {}
	<Find signatures of symbol table predicates>
	log Rewrite the hyperrules as a context-free grammar + attributes.
	<Rewrite the hyperrules as a context-free grammar + attributes>
	<Replace hypernotions in rewrite rules with signatures>
	log Extract symbol definitions from the grammar.
	<Extract symbol definitions from the grammar>
	grammarProperties hr
	log Delete predicates from the context free grammar.
	<Delete predicates from the context free grammar>
	sym each sym [hr::terminals] {
		if {![string match *symbol $sym] && ![string match *character $sym] && ![string equal %end $sym]} {
			report I non-symbol is a terminal: $sym
		}
	}
	log Complete vw1 extraction.
}

		
 
section    top

set MT {}
set S 0
set D {}
sym each mn [mr::nonterminals] {
	set def [mr $mn]
	set s 0; set d 1
	foreach m [split $mn ""] {
		set d [expr {$d && [string match {[0-9]} $m]}]
		if {![vassoc get MT -exact est $s:$m ns; set est]} {
			incr S; set ns $S
			vassoc put MT $s:$m $ns
		}
		set s $ns
	}
	vassoc put MT $s:. [list 1 $mn]
	if {!$d} {lappend D $mn $s}
}
foreach {mn s} $D {
	foreach m {0 1 2 3 4 5 6 7 8 9} {
		if {![vassoc get MT -exact est $s:$m ns; set est]} {
			incr S; set ns $S
			vassoc put MT $s:$m $ns
		}
		if {[vassoc get MT -exact est $ns:. r; set est] && ![string equal $r [list 0 $mn$m]]} {
			report E conflicting metanotion definitions: $mn$m
		}
		vassoc put MT $ns:. [list 0 $mn$m]
	}
}

sym each mn [mr::nonterminals] {
	sym each alt [mr $mn] {
		ensymbolHypernotion $alt
	}
}
sym each pn [hr::nonterminals] {
	ensymbolHypernotion $pn
	sym each alt [hr $pn] {
		sym enum member $alt {
			ensymbolHypernotion $member
		}
	}
}
set digitalExtensor {}
foreach {tsub u} [mr::filter sub: *] {
	setp {sub t} $tsub
	mr $t := $u
	set digitalExtensor [sym union $digitalExtensor $t]
}

		
 
section    top

proc ensymbolHypernotion {hypernotion {terminalise 0}} {
	upvar 1 MT MT
	if {[sym empty $hypernotion] || [string length $hypernotion]==0} {
		set string ,#NS
		set hypernotion ""
	} else {
		set string ""
	}
	while {[string length $hypernotion]} {
		if {[regexp {^([()a-z%])(.*)$} $hypernotion - smallmark rest]} {
			set hypernotion $rest
			append string ,$smallmark
		} elseif {[regexp {^(/[A-Z0-9]+/)(.*)$} $hypernotion - terminalisedmark rest]} {
			set hypernotion $rest
			append string ,$terminalisedmark
		} elseif {[regexp {^([A-Z0-9]+)(.*)$} $hypernotion - largemarks rest]} {
			set hypernotion $rest
			vprioq new parse 0 [list 0 $largemarks ""]
			set OUT {}
			while {[vprioq dequeue parse task est; set est]} {
				setp {s in out} $task
				if {[string length $in]==0 && $s==0} {
					lappend OUT $out
				} else {
					if {[vassoc get MT -exact est $s:. ct; set est]} {
						setp {catalogged t} $ct
						if {!$catalogged} {
							set u [string range $t 0 end-1]
							mr base: $u +:= $t
							mr sub: $t := $u
							vassoc put MT $s:. [list 1 $t]
						}
						if {$terminalise} {set t /$t/}
						vprioq enqueue parse [list 0 $in $out,$t]
					}
					if {[string length $in]} {
						if {[vassoc get MT -exact est $s:[string index $in 0] ns; set est]} {
							vprioq enqueue parse [list $ns [string range $in 1 end] $out]
						}
					}
				}
			}
			switch [llength $OUT] {
				0 {
					report E unrecognised metanotions: $largemarks in $hypernotion
					set OUT [list ,$largemarks]
				}
				1 {
					;
				}
				default {
					set AMB {}
					foreach out $OUT {
						lappend AMB [string map {/ {}} [string range $out 1 end]]
					}
					report E ambiguous metanotions: [join $AMB |] in $hypernotion
				}
			}
			append string [lindex $OUT 0]
		} else {
			error "bad character in hypernotion: $hypernotion"
		}
	}
	string range $string 1 end
}

		
 
section    top

proc substituteMetanotions {hn pn} {
	upvar 1 eq eq
	upvar 1 MT MT
	set hnsubst ""
	sym enum piece [ensymbolHypernotion $hn] {
		if {[info exists eq($piece)]} {
			sym enum eqnotion $eq($piece) {
				append hnsubst $eqnotion
			}
		} else {
			append hnsubst $piece
		}
	}
	sym encode string $hnsubst
}

		
 
section    top

55. Partition metanotions into context-free and context-sensitive :: Metanotions are classified as either context-free or context-sensitive. Context-free metanotions are just used to create new productions, such as NOTION in

NOTION sequence: NOTION; NOTION sequence, NOTION.
Context-free metanotions can be dealt with by expanding references to them. This is in fact, what actually happens.

The remaining context-sensitive metanotions, called attributes, will remain as variables defined and verified by the context sensitive constraints.

Context-free metanotions are those that generate either a finite number of protonotions, like

STYLE:: bold; brief; style i; style ii; style iii.
or when they appear in a hyperalternative, they also appear in the hyperrule name, such as NOTION above. What appears to be context-free can also be declared instead to be context-sensitive with
attribute=METANOTION.
(This might be useful if the metanotion has a finite set, but it is so large that it would expand into an unworkable number of rules.)

Context-sensitive metanotions are those that are not context-free, either because they are declared so, or because they appear in a hyperalternative but not the rule name without having a finite protonotion expansion.


mr %start := [mr::nonterminals]
sym each mn [mr::nonterminals] {
	set alts1 {}
	sym each alt [mr $mn] {
		set alts1 [sym union $alts1 [ensymbolHypernotion $alt]]
	}
	mr $mn := $alts1
}
grammarProperties mr
mr::delete %start

set possiblecfmetanotions {}
set definitecsmetanotions [mr::if attributes: := {}]
sym each mn $digitalExtensor {
	if {[sym contains $definitecsmetanotions [string range $mn 0 end-1]]} {
		set definitecsmetanotions [sym union $definitecsmetanotions $mn]
	}
}
set finites [sym diff [sym intersect [mr::if finite: := {}] [mr::nonterminals]] $definitecsmetanotions]
set infinites [sym diff [mr::nonterminals] $finites]
sym each pn [hr::nonterminals] {
	set leftMNs {}
	set rightMNs {}
	sym enum p [ensymbolHypernotion $pn] {
		if {[string match {[A-Z0-9]*} $p]} {set leftMNs [sym union $leftMNs $p]}
	}
	sym each alt [hr $pn] {
		sym enum member $alt {
			sym enum p [ensymbolHypernotion $member] {
				if {[string match {[A-Z0-9]*} $p]} {set rightMNs [sym union $rightMNs $p]}
			}
		}
	}
	set possiblecfmetanotions [sym union $possiblecfmetanotions $leftMNs]
	set definitecsmetanotions [sym union $definitecsmetanotions [sym diff $rightMNs $leftMNs]]
}
set additions {}
sym each mn $definitecsmetanotions {
	set additions [sym union $additions [mr::if sub: $mn := {}]]
	set additions [sym union $additions [mr::if base: $mn := {}]]
}
set definitecsmetanotions [sym union $definitecsmetanotions $additions]
set cfmetanotions [sym diff $possiblecfmetanotions $definitecsmetanotions %start]
set csmetanotions [sym diff [mr::nonterminals] $cfmetanotions $finites]
sym each p $csmetanotions {
	mr properties: $p := "[mr::if properties: $p := {}] context-sensitive"
}
sym each p $cfmetanotions {
	mr properties: $p := "[mr::if properties: $p := {}] context-free"
}

		
 
section    top

attribute-property
	{(attribute),<propertyrule>}
	{(),<attributeproperty>}
attribute-property
	{((?~L=.*)),<attributeproperty>(?~R=[0-9A-Z]+)}
	{($L$R),<attributeproperty>}
attribute-property
	{((?~MN=.*)),<attributeproperty>[,;|]}
	{(),<[mr attributes: +:= $MN; concat attributeproperty]>}
attribute-property
	{((?~MN=.*)),<attributeproperty>}
	{(attribute property),<[mr attributes: +:= $MN; concat endrule]>}

		
 
section    top

set aleph 1
sym each attribute [mr::if attributes: := {}] {
	if {![mr::if $attribute]} {
		mr $attribute := aleph%[string map {0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j} $aleph]
		incr aleph
	}
}

		
 
section    top

58. Define ATTRIBUTE metanotion :: To create a workably efficient parser, attribute values are not actually implemented as strings of protonotions. Rather they are structured according to their productions. In essence each alternative in the metarule is surrounded by unrepresentable small marks. This means for something like

NOTION:: ALPHA;NOTION ALPHA.
EMPTY:: .
ALPHA:: a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z.
NOTION sequence: NOTION; NOTION sequence, NOTION.
VARS:: VAR;VARS VAR.
VAR:: LETTERS var.
LETTERS:: LETTER; LETTERS LETTER.
LETTER:: letter ALPHA.
program: VARS define, VARS reference sequence.
NOTION and ALPHA are context-free because when they appear on the right hand side of a hyperrule, they are also on the left; EMPTY is context-free because it is finite. VARS and VAR are context-sensitive, and are actually interpretted as if written something like
VARS:: {VAR};{VARS VAR}.
VAR:: {NOTION var}.
However there are no small marks { or }, so ALPHA cannot include them, and this definition of NOTION could not match VARS-reference.

To help deal with this, if the metanotion 'ATTRIBUTE' is not defined in the grammar, it will be defined to be all the attribute metanotions. In this case, if the ALPHA is instead defined

ALPHA:: a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;ATTRIBUTE.
with the implied definition
ATTRIBUTE:: VARS;VAR.
then this definition of NOTION will match VARS-reference.


if {$definingATTRIBUTE} {
	set prods $csmetanotions
	sym each mn $csmetanotions {
		if {[mr::if base: $mn]} {
			set prods [sym diff $prods $mn]
		}
	}
	mr ATTRIBUTE := $prods
	set cfmetanotions [sym diff $cfmetanotions ATTRIBUTE]
	set csmetanotions [sym union $csmetanotions ATTRIBUTE]
}

		
 
section    top

59. Earley parser :: Converting a two-level grammar to a one-level grammar requires parsing hypernotions with the metagrammar. While it is possible to use the DFA or LR(k) parser generators (and that was the original intent), because some of these grammars will be incremental, the hypernotions will be relatively short, and the parser generator is expensive, an Earley parser will be used.

This is taken from Aho and Ullman The Theory of Parsing, Translation, and Compiling section 4.2.2, Algorithms 4.5 and 4.6.

For this parser, the start symbol is always "!".

There is problem with creating the parse tree in Tcl: it may be quite deep and recurse into a Tcl stack overflow. The output is a postfix traversal of the parse tree: each node of the parse tree is represented by two elements in the traversal, the node size (number of children) and symbol. The size of a terminal is always -1, a nonterminal nonnegative.


proc earley {grammar hn} {
	$grammar start: !
	string map {/ ""} [${grammar}::parse $hn]
}

		
 
section    top

60. Eliminate recursive context-free metanotions :: Eliminate hyperrules with a context-free metanotion on both sides. (This is aimed at recursive metanotions, but it might also take care of some finite ones; remaining finite metanotions are dealt with below.)

The metanotions are eliminated by eliminating references to hyperrules using them. Each hypernotion, on the right side of a hyperrule, that matches such has a new hyperrule added, with a uniform name substitution for all context-free metanotions. In this way references to the metanotions are expanded and the original rules no longer needed.

To do this matching, an Earley parser is created. The grammar is

start
Union of all hyperrule production names, where the name contains a context-free metanotion. The large and small marks of the hypernotion constituting the name are renderred into terminal and nonterminal symbols of this grammar.
nonterminals
start, %start, and all metanotions.
terminals
Small marks and terminalised metanotions, written '/metanotion/'. Because the actual metanotions can appear in the hypernotions, and not just the protonotions generated by the metanotion, it is necessary for a metanotion to recognise itself when it appears in a hypernotion. This is handled by replacing the metanotions in hyperrule members and names with terminalised metanotions denoted by slash quotes. The only exception are the context-free metanotions in hyperrule names. Metanotions in metarules are left as-is.
productions
%start ::= start
start ::= hyperrule-name | ...
metanotion ::= original-metarule-alternatives | /metanotion/

Each hyperrule member is checked with the parser. If it recognises, it will detect the member marks matched by context-free metanotions. Then a new hyperrule is added which is copy of the matched hyperrule, but with the matched marks substituted for the context-free metanotions.

Because context-sensitive metanotions are being treated like terminals in this section, it is possible to generate duplicate hyperrules such "VARS1-define" and "VARS2-define", which would both end up as the final context-free production "VARS-define". These duplications will be left till later when the identity of context-sensitive metanotions are made clear.


set prods {}
sym each pn [hr::nonterminals] {
	set hn [ensymbolHypernotion $pn 0]
	set parts [string map {, |} $hn]
	if {![sym null [sym intersect $cfmetanotions $parts]] && [sym null [sym intersect $csmetanotions $parts]]} {
		set prods [sym union $prods $hn]
	}
}
if {![sym null $prods]} {
	sym grammar [namespace current]::cfgrammar
	cfgrammar start: := !
	cfgrammar ! := $prods
	<Select a subset of metarules to eliminate the context-free metanotions>
	vprioq new work
	set handled [hr::nonterminals]
	sym each pn [hr::nonterminals] {vprioq enqueue work -id $pn $pn}
	while {[vprioq dequeue work pn est; set est]} {
		sym each alt [hr $pn] {
			sym enum member $alt {
				if {![sym contains $handled $member]} {
					set handled [sym union $handled $member]
					log "  parsing $pn: ...$member..."
					set memberstring [ensymbolHypernotion $member 1]
					set derivations [earley cfgrammar $memberstring]
					foreach derivation $derivations {
						set derivation [lrange $derivation 0 end-2]
						set stack {}
						foreach {n X} $derivation {
							if {$n<0} {
								lappend stack 1 $X $X
							} else {
								set p [expr {[llength $stack]-3*$n}]
								set substring ""
								foreach {terminal A a} [lrange $stack $p end] {
									append substring $a
								}
								set stack [lreplace $stack $p end 0 $X $substring]
							}
						}
						set cons ""
						array unset eq *
						foreach {terminal A a} $stack {
							append cons $A
							if {!$terminal} {set eq($A) $a}
						}
						set cons1 [substituteMetanotions $cons $pn]
						set consalts1 {}
						sym each consalt [hr $cons] {
							set consalt1 "#NS"
							sym enum consmember $consalt {
								set consalt1 [sym concat $consalt1 [substituteMetanotions $consmember $cons]]
							}
							set consalts1 [sym union $consalts1 $consalt1]
						}
						log "    adding: $cons1: $consalts1"
						hr $cons1 +:= $consalts1
						vprioq enqueue work -id $cons1 $cons1
					}
				}
			}
		}
	}
	rename cfgrammar ""
}
unset -nocomplain work

		
 
section    top

set mnsToBeAdded {}
set members {}
vprioq new work
sym each mn $cfmetanotions {
	vprioq enqueue work -id $mn $mn
	set members [sym union $members $mn]
}
while {[vprioq dequeue work mn est; set est]} {
	if {!$definingATTRIBUTE || ![string equal $mn ATTRIBUTE]} {
		sym each alt [mr::if $mn := {}] {
			sym enum member $alt {
				if {[string match {[A-Z0-9]*} $member]} {
					vprioq enqueue work -id $member $member
					set members [sym union $members $member]
				}
			}
		}
	}
	lappend mnsToBeAdded $mn [sym union /$mn/ [mr::if $mn := {}]]
}
sym each mn [sym diff [mr::nonterminals] $members] {
	lappend mnsToBeAdded $mn /$mn/
}
foreach {mn prods} $mnsToBeAdded {
	sym each mnx [mr::if base: $mn := {}] {
		set prods [sym union $prods /$mnx/]
		cfgrammar $mnx := $mn
	}
	if {[mr::if sub: $mn]} {
		set prods [mr sub: $mn]
	}
	cfgrammar $mn := $prods
}

		
 
section    top

62. Eliminate finite context-free metanotions :: Eliminate alternatives with a context-free metanotion on the right side only. These will always be finite. Because they are finite, all that is necessary is to keep recursively replacing finite metanotions with their alternatives. The final expansion will be free of metanotions, context-free or context-sensitive; it will just contain small marks. There is no possiblility of conflicting metanotions introduced by the expansion, but there is during the expansion. This conflict is hidden by different levels of proc recursion.

Due to the uniform replacement rule, every use of particular finite metanotion in an alternative has to be replaced with the same protonotion.

All signatures are also passed through finite expansion; this simplifies the work of the grammar writer with little other impact.


array set finiteCache {}
proc expandFinite {mn} {
	upvar 1 finiteCache finiteCache
	upvar 1 finites finites
	if {[mr::if sub: $mn]} {set mn [mr sub: $mn]}
	if {![info exists finiteCache($mn)]} {
		set finiteCache($mn) {}
		sym each alt [mr $mn] {
			set exp "#NS"
			sym enum notion $alt {
				if {[sym contains $finites $notion]} {
					set exp [sym concat $exp [expandFinite $notion]]
				} else {
					set exp [sym concat $exp $notion]
				}
			}
			set finiteCache($mn) [sym union $finiteCache($mn) $exp]
		}
	}
	return $finiteCache($mn)
}

proc replaceFinite {finites map hn alt} {
	upvar 1 MT MT
	if {[llength $finites]} {
		set finites [setp {finite pns} $finites]
		sym each pn $pns {
			replaceFinite $finites [concat $map $finite $pn] $hn $alt
		}
	} else {
		array set eq $map
		set hn1 [substituteMetanotions $hn $hn]
		if {[string equal $alt %signature]} {
			hr signatures: +:= $hn1
		} else {
			set alt1 "#NS"
			sym enum member $alt {
				set alt1 [sym concat $alt1 [substituteMetanotions $member $hn]]
			}
			hr $hn1 +:= $alt1
		}
	}
}

sym each hn [hr::nonterminals] {
	sym each alt [hr $hn] {
		set s [sym encode set [sym decode string [ensymbolHypernotion $hn]]]
		sym enum member $alt {
			set s [sym union $s [sym encode set [sym decode string [ensymbolHypernotion $member]]]]
		}
		set s [sym intersect $s $finites]
		if {![sym null $s]} {
			hr $hn -:= $alt
			set exps {}
			sym each mn $s {
				lappend exps $mn [expandFinite $mn]
			}
			replaceFinite $exps {} $hn $alt
		}
	}
}
set signatures [hr::if signatures: := {}]
hr signatures: := {}
sym each sn $signatures {
	set s [sym intersect [sym encode set [sym decode string [ensymbolHypernotion $sn]]] $finites]
	if {[sym null $s]} {
		hr signatures: +:= $signature
	} else {
		set exps {}
		sym each mn $s {
			lappend exps $mn [expandFinite $mn]
		}
		replaceFinite $exps {} $sn %signature
	}
}
unset -nocomplain finiteCache

		
 
section    top

63. Find signatures of all hypernotions :: Consider a grammar such as

VARS :: VAR; VARS, VAR.
VAR :: TYPE ALPHA.
TYPE :: int; bool; ref TYPE.
ALPHA :: a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z.
program: VARS defines, VARS references.
VARS VAR defines: VARS defines, VAR define.
VAR defines: VAR define.
TYPE ALPHA define: TYPE, letter ALPHA symbol, stop symbol.
...

A vW2 grammar does not require metanotion constructors like "TYPE-ALPHA" in "TYPE-ALPHA-define" to be explicitly marked out. But to avoid parsing protonotions in the generated parser, all these implicit constructors must be made explicit. A scanner is constructed from the metanotions (again), this time from context-sensitive metanotions. Each hypernotion is scanned and where matching metanotions to hypernotion substring are found, the hypernotion is editted and a constructor member added to the hyperrule.

The problem here is that we have a set of hypernotions to be matched and a set of metanotions that can (partially or completely) match them. However many different metanotions might match, some trivially (like ALPHA will match every single small mark), some ambiguously (like VAR and VARS will both match int-x). The trick is to find a good set of metanotions as start tokens that will match constructors the grammar intended and only those.

Resolving this without restricting the grammar is at best hard, and perhaps impossible. For this code, each hypernotion is required to appear at least once with just the metanotions and not with any metanotion alternatives. This signature hypernotion can appear as a rule name, or rule member, or in the signature property rule, such as

signature = VARS defines.

A grammar is constructed to recognise hypernotion signatures. The grammar contains all metarules. The start production starts with everything that was in the signature property (if anything). Then each hypernotion in the hyperrules is sorted by length and parsed against the grammar so far. If it is recognised, the parse tree is saved (so that metanotions are matched against their subststrings in the hypernotion). Otherwise the hypernotion is added to the grammar as a new start symbol.

The signature of each hypernotion is discoverred: the minimal hypernotion matching the given hypernotion within a parsed TERM list.


<Lexical attribute variables>
sym grammar [namespace current]::csgrammar
csgrammar start: := !
csgrammar ! := ATTRIBUTE,%,e,q,%,ATTRIBUTE|ATTRIBUTE,%,n,e,%,ATTRIBUTE
set mnmap {/ {}}
vprioq new work
sym each mn $csmetanotions {
	sym each alt [mr $mn] {
		set finite 1
		sym enum mem $alt while {$finite} {
			set finite [expr {[string match {[a-z()%]*} $mem] || [sym contains $finites $mem]}]
		}
		if {$finite} {
			vprioq enqueue work -id $mn:$alt [list $mn $alt]
		}
	}
}
set finiteexpansions {}
while {[vprioq dequeue work task est; set est]} {
	setp {mn expansion} $task
	set indices {}
	set newre ""
	set i 0
	sym enum mem $expansion {
		if {[string match {[A-Z0-9]*} $mem]} {
			lappend indices $i
			append newre $mem\\d?
		} else {
			append newre $mem
		}
		incr i
	}
	lappend finiteexpansions $newre $mn
	foreach i $indices {
		set mem [sym index $expansion $i]
		set l [sym range $expansion 0 $i-1]
		set r [sym range $expansion $i+1 end]
		sym each alt [mr $mem] {
			set newexpansion [sym concat $l $alt $r]
			vprioq enqueue work -id $mn:$newexpansion [list $mn $expansion]
		}
	}
}
vprioq new work
sym each mn $csmetanotions {
	vprioq enqueue work -id $mn $mn
}
while {[vprioq dequeue work mn est; set est]} {
	csgrammar $mn := {}
	if {![sym contains $digitalExtensor $mn]} {
		csgrammar $mn +:= /$mn/
	}
	sym each hn [mr $mn] {
		csgrammar $mn +:= $hn
		sym enum mem $hn {
			if {[string match {[A-Z0-9]*} $mem]} {
				vprioq enqueue work -id $mem $mem
			}
		}
	}
	sym each mnx [mr::if base: $mn := {}] {
		csgrammar $mn +:= /$mnx/
		csgrammar $mnx := $mn
		lappend mnmap $mnx $mn
	}
}
sym each hn [hr::if signatures: := {}] {
	foreach {re mn} $finiteexpansions {
		regsub -all $re $hn $mn hn
	}
	csgrammar ! +:= [ensymbolHypernotion $hn 0]
}
vprioq new work
sym each pn [hr::nonterminals] {
	set hn [ensymbolHypernotion $pn 1]
	vprioq enqueue work -id $hn [sym length $hn] [list $pn $hn]
	set reduced 0; set red $pn
	foreach {re mn} $finiteexpansions {
		if {[regsub -all $re $red $mn red]} {
			set reduced 1
		}
	}
	if {$reduced} {
		set hn [ensymbolHypernotion $red 1]
		vprioq enqueue work -id $hn [sym length $hn] [list $red $hn]
	}
	sym each alt [hr $pn] {
		sym enum member $alt {
			set hn [ensymbolHypernotion $member 1]
			vprioq enqueue work -id $hn [sym length $hn] [list $member $hn]
			set reduced 0; set red $member
			foreach {re mn} $finiteexpansions {
				if {[regsub -all $re $red $mn red]} {
					set reduced 1
				}
			}
			if {$reduced} {
				set hn [ensymbolHypernotion $red 1]
				vprioq enqueue work -id $hn [sym length $hn] [list $red $hn]
			}
		}
	}
}
<Add tree rewrite node names to the signature work list>
while {[vprioq dequeue work pnhn est; set est]} {
	setp {pn hn} $pnhn
	set parsePrograms [earley csgrammar $hn]
	if {[llength $parsePrograms]} {
		foreach parseProgram $parsePrograms {
			set parseProgram [lrange $parseProgram 0 end-2]
			lappend parseProgram 0 !
			set parseStack {}
			foreach {popcount X} $parseProgram {
				set type [string map $mnmap $X]
				if {[string equal $X !]} {set popcount [llength $parseStack]}
				if {$popcount<0} {
					if {[string match {[a-z0-9%]*} $X]} {
						lappend parseStack [list $X {}]
					} else {
						lappend parseStack [list $type [sym nest VAR,$X] $type]
					}
				} else {
					set functor {}
					set arguments "#NS"
					set subtypes {}
					foreach entry [lrange $parseStack end-[expr {$popcount-1}] end] {
						setp {subtype entry} $entry
						append functor $subtype
						if {[sym length $entry]} {
							set arguments [sym concat $arguments $entry]
							lappend subtypes $subtype
						}
					}
					set parseStack [lrange $parseStack 0 end-$popcount]
					if {[regexp {^[A-Z0-9_]+$} $functor] && [sym length $arguments]==1} {
						lappend parseStack [list $type $arguments $type]
						<Catalog attributes equated to another metanotion>
					} else {
						<Catalog attributes constructed from other metanotions>
						lappend parseStack [list $type [sym nest [sym concat TERM $functor $arguments]] $subtypes]
					}
				}
			}
			setp {type relation subtypes} [lindex $parseStack end]
			set signature($pn) $relation
			<Verify the signature of a symbol>
			log "  signature $pn = old [displayAttribute $signature($pn)]"
		}
	} elseif {[regexp {^[()a-z]*%(eq|ne)%[()a-z]*$} $hn]} {
		set signature($pn) [sym nest TERM,$hn]
		log "  signature $pn = PROTO [displayAttribute $signature($pn)]"
	} else {
		set arguments "#NS"
		set vars "#NS"
		set functor ""
		csgrammar ! +:= [string map $mnmap $hn]
		sym enum notion $hn {
			append functor [string map $mnmap $notion]
			if {[string match {[/A-Z0-9]*} $notion]} {
				set notion [string map {/ {}} $notion]
				set arguments [sym concat $arguments [sym nest VAR,$notion]]
				set vars [sym concat $vars $notion]
			}
		}
		set signature($pn) [sym nest [sym concat TERM $functor $arguments]]
		log "  signature $pn = new [displayAttribute $signature($pn)]"
		<Generated command: term_vars>
	}
}
rename csgrammar ""

<Verify lexical attributes are not analyzed>
		
 
section    top

signature-property
	{(signature),<propertyrule>}
	{(signatureproperty),<notion>}
hypernotion-signature
	{(.*),((?~notion=.*)),<signatureproperty>}
	{<[hr signatures: +:= $notion; concat signaturepropertynext]>}
signature-property-separator
	{<signaturepropertynext>[,;|]}
	{(signatureproperty),<notion>}
signature-property-end
	{<signaturepropertynext>}
	{(signature property),<endrule>}

		
 
section    top

65. Rewrite the hyperrules as a context-free grammar + attributes :: With the completion of the signature array, the two level grammar can be rewritten as a one level grammar together with metanotions in a Horn clause for each production alternative. A term for each member and production has been derived: the Horn clause for an alternative is a (conjunctive) symbol string of the production name term followed by the member terms. The production and member production names are the functors of their terms.

The Horn clauses are attached to each alternative for non-predicates. (The next section will disjunct predicates alternatives.) The next phase will finish the clauses to make the logic program to evaluate the attributes.

Along the way, the predicate protonotion==protonotion and protonotion/=protonotion are dealt with. If such predicates are true, they can be elided, and if false, they block the alternative.


set vw1 {
	ATTRIBUTE%eq%ATTRIBUTE #NS -
	ATTRIBUTE%ne%ATTRIBUTE #NS -
}
set VW1 ATTRIBUTE%eq%ATTRIBUTE|ATTRIBUTE%ne%ATTRIBUTE
sym each hn [hr::nonterminals] {
	set pn [sym index [sym unnest $signature($hn)] 1]
	if {[hr::if foreign: $hn]} {
		<Massage foreign text into a pseudo-TERM>
	} else {
		sym each alt [hr $hn] {
			set horn $signature($hn)
			set vwalt "#NS"
			set rejected 0
			sym enum member $alt {
				set member1 [sym index [sym unnest $signature($member)] 1]
				set goal $signature($member)
				set lgoal [sym decode string [sym unnest $goal]]
				if {[regexp {^([a-z()]*)%(eq|ne)%([a-z()]*)$} $member1 - a op b]} {
					set rejected [expr {[string equal $a $b]!=[string equal $op eq]}]
					if {$rejected} break
				} elseif {[string first %ne% $member1]>=0} {
					set vwalt [sym concat $vwalt $member1]
					setp {- - p q} $lgoal
					set horn [
						sym concat $horn [
							sym nest [
								sym encode string [list ANTIUNIFY $p $q]
							]
						]
					]
				} elseif {[string first %eq% $member1]>=0} {
					set goal [sym unnest $goal]
					lset lgoal 1 %eq%
					set vwalt [sym concat $vwalt $member1]
					set horn [sym concat $horn [sym nest [sym encode string $lgoal]]]
				} else {
					set vwalt [sym concat $vwalt $member1]
					set horn [sym concat $horn $goal]
				}
			}
			if {!$rejected} {
				lappend vw1 $pn $vwalt $horn
			}
			hr::delete cs: $hn $alt
		}
	}
	hr::delete $hn
}
foreach {pn alt rules} $vw1 {
	hr $pn +:= $alt
	if {![string equal $rules -]} {
		hr cs: $pn $alt +:= $rules
	}
}
foreach {sym sig} [array get signature *symbol] {
	hr cs: [sym index [sym unnest $sig] 1] "#NS" := $sig
}

		
 
section    top

set foreigns [hr foreign: $hn]
set predicates [sym union $predicates $pn]
set disj {}
foreach foreign [sym decode $foreigns] {
	setp {output input language code} $foreign
	if {[string match *immediate $language]} {
		set immediates [sym union $immediates $pn]
	} else {
		set goal FOREIGN,$language
		foreach val [list $output $input] prefix {+ -} {
			if {[string equal $val -]} continue
			sym enum symbol [ensymbolHypernotion $val] {
				set goal [sym concat $goal $prefix$symbol]
			}
		}
		set goal [sym nest [sym concat $goal [sym encode string [list $code]]]]
		set disj [sym union $disj [sym concat $signature($hn) $goal]]
	}
}
lappend vw1 $pn "#NS" $disj

		
 
section    top

67. Delete predicates from the context free grammar :: A predicate is an empty production with two or more alternatives, a foreign text, or a production which is generated by a predicate. Predicates have to be removed from the context free grammar or they will cause reduce-reduce conflicts.

The actual context free alternatives of the production are all deleted and replaced with a single empty string. All of the previously .cs strings distinguished by the alternative strings are disjuncted together into a single pn.#NS.cs set.

Predicates which are not immediate foreign text, which are empty productions, are removed from the alternatives, but will still appear in the semantics. LR(k) and DFA parser constructor do not have problems with empty productions, but their presence will add unnecessary and confusing entries to the parse tree.

Immediate foreign texts are left in place. The parser will evaluate them immediately on reduction and needs to be able to see them.


sym each pn [hr useless:] {hr::delete $pn}
sym each pn [hr inaccessible:] {hr::delete $pn}
set predicates [sym intersect $predicates [hr::nonterminals]]
vprioq new work
sym each pn [hr::nonterminals] {
	if {[hr empty: $pn] && [sym card [hr $pn]] > 1} {
		vprioq enqueue work -id $pn $pn
	}
}
while {[vprioq dequeue work pn est; set est]} {
	set predicates [sym union $predicates $pn]
	sym each alt [hr::if $pn := {}] {
		sym enum member $alt {
			vprioq enqueue work -id $member $member
		}
	}
}
sym each pn $predicates {
	set cs {}
	foreach {key rule} [hr::filter cs: $pn *] {
		setp {- - alt} $key
		set cs [sym union $cs $rule]
		hr::delete cs: $pn $alt
	}
	hr $pn := "#NS"
	hr cs: $pn "#NS" := $cs
	hr properties: $pn := "[hr::if properties: $pn := {}] predicate"
}
hr predicates: := $predicates
hr immediates: := $immediates
sym each pn [sym union [sym diff [hr::nonterminals] $predicates] $immediates] {
	sym each alt [hr $pn] {
		set changed 0
		set newalt #NS
		sym enum sym $alt {
			if {[sym contains $predicates $sym]} {
				set changed 1
			} else {
				set newalt [sym concat $newalt $sym]
			}
		}
		if {$changed} {
			if {[hr::if cs: $pn $alt]} {
				hr cs: $pn $newalt := [hr cs: $pn $alt]
				hr::delete cs: $pn $alt
			}
			hr $pn -:= $alt
			hr $pn +:= $newalt
		}
	}
}
sym each pn [sym union [sym diff [symr0::nonterminals] $predicates] $immediates] {
	sym each alt [symr0 $pn] {
		set changed 0
		set newalt #NS
		sym enum sym $alt {
			if {[sym contains $predicates $sym]} {
				set changed 1
			} else {
				set newalt [sym concat $newalt $sym]
			}
		}
		if {$changed} {
			if {[symr0::if cs: $pn $alt]} {
				symr0 cs: $pn $newalt := [symr0 cs: $pn $alt]
				symr0::delete cs: $pn $alt
			}
			symr0 $pn -:= $alt
			symr0 $pn +:= $newalt
		}
	}
}

		
   
   

Attributes

   
top

68. Derive the attribute evaluation machine :: The attribute rules are stored in the grammar. For the production production: member,member,..., the Horn clauses defining the attributes is stored as symbol string sets in the property cs: production.member,member,....

context-sensitive-value ::= [ rule|rule... ]
rule ::= head [ ,term ] ... | %query,query-name,term...
head ::= term
term ::= <TERM,functor [ ,term ] ...> | <VAR,variable> | <NIL> | <LIST,term...> | <ANTIUNIFY,term,term> | <FOREIGN,language, [ ,foreign-var ] ...,code> | <FAILURE,_RECOVERY,_LOC> | <SUCCESS,_RECOVERY> | <ITREE,hypernotion,list> | <OTREE,hypernotion,list>
functor ::= production | %eq%
foreign-var ::= +metanotion | -metanotion

Notation In Prolog notation, a rule which could be written 'abc(X) :- pqr(X).' is represented as a parsed, Cambridge notation symbols string '<TERM,abc,<VAR,X>>,<TERM,pqr,<VAR,X>>'. (except the commas and other punctuation between <...> are hex escaped #hh). In these notes, TERM and VAR are omitted, and lists are indicated by [a,b,...,z] instead of <LIST,a,b,...,z>.

Rather than using traditional attribute grammars, the context constraints is handled with logic programmming techniques. More specifically the attribute processing is the resolution of one query within a database of Horn clauses, a Prolog-ish program which operates on the concrete parse tree. However unlike Prolog the order of rules cannot be significant. Also, though the not-equal relation can be used, there is no cut.

The parse tree is tree is built of terms of lists: <p , [P1 , P2 , ... , Pm ,LOC]> is the parse tree node labelled with hypernotion p , daughter parse trees P1 , P2 , and so on to Pm , and the implementation defined source location LOC. The goal, then, is given the concrete parse tree of the entire input to return the the attributed tree

%query,%start,%start,<P_0,A_0>
The P_0 tree will be bound before starting the querying, the resulting bound value of A_0, if any, is the attributed tree.


proc attributeAnalysis {grammar} {
	log Add tree constructors to grammar attribute rules.
	<WAM constant accumulation buffers>
	<Add tree constructors to grammar attribute rules>
	<The attribute tree construction QUERY>
	<The database %eq% fact>
	set database [list $QUERY $EQFACT]
	log Construct the logic machine.
	<Construct the logic machine>
}
<Display attribute rule>
<Define a WAM constant>
<WAM third level indexing>

		
 
section    top

69. Using not-equals with partially defined terms :: Because the attributes can be partially defined, it is problematic what not-equals actually means. If two partially defined terms are checked for equality, if the defined parts are the same, the undefined parts are linked so that they must also be the same; equals handles past, present, and future equality. Not-equals also checks if the defined parts of both terms are equal, but what should it do about undefined parts? Does their potential equality mean a future equality, or must the decision wait until the terms are fully defined? The latter presents a difficult backtrack problem: potentially an entire parse will have to backtrack if the first predicate is not-equals. The former means adding time to the logic, and then the orderring of clauses becomes important. The former is adopted with this caveat.

Members within an alternative are not rearranged if not-equals is used in that alternative or any predicate it calls. Not-equals is verified by a failure to unify. Fully defined different terms will not unify; partially defined terms might unify at the time of the not-equals, even though subsequent predicates would unify them to different terms. It is the grammar writer's responsibility to use not-equals at a time when the terms are as fully defined as the grammar needs them to be.

In a real vW2 grammar, inequality checks are on protonotions only, which correspond to fully defined terms. The correct protonotions are automagically guessed. cow grammars cannot rely on this magic and be efficient.

 
section    top

set predicaterecoveries {}
$grammar action: %end %shift := [sym nest [sym concat shift_lexeme [compileWAMConstant %end 1]  0]]
foreach {key rules} [${grammar}::filter cs: * *] {
	setp {- pn alt} $key
	if {![sym contains [${grammar}::symbols] $pn] && ![sym contains [$grammar predicates:] $pn]} {
		${grammar}::delete cs: $pn $alt
		continue
	} elseif {[string match *symbol $pn]} {
		<Define symbol shift action>
		<A database symbol fact>
	} elseif {[sym contains [$grammar predicates:] $pn]} {
		<Define immediate foreign text reduce action>
		<A database predicate rule>
	} else {
		<Define production reduce action>
		<A database production rule>
	}
}

		
 
section    top

$grammar action: $pn $alt := [sym nest reduce_parse,[compileWAMConstant $pn 1],[sym length $alt]]

		
 
section    top

if {[sym empty $alt]} {
	foreach foreign [sym decode [hr::if foreign: $pn := {}]] {
		setp {language output input code} [sym decode string $foreign]
		if {[string match *immediate $language] && [string length $code]} {
			if {![string equal $output -]} {
				report E immediate foreign text has output: $pn: $output
			}
			if {![string equal $input -]} {
				report E immediate foreign text has input: $pn: $input
			}
			if {[string match *immediate $language]} {
				set language [string range $language 0 end-[string length immediate]]
			}
			symr0 action: $pn $alt := [sym nest [sym encode string foreign_immediate $language $code]]
		}
	}
}

		
 
section    top
 
section    top

set rule [sym choose $rules]
set head [sym unnest [sym first $rule]]
set symbol [sym index $head 1]
set attribute [sym range $head 2 end]
$grammar action: $pn %shift := [sym nest [sym concat shift_lexeme \
		[compileWAMConstant $symbol 1] \
		[expr {![sym empty $attribute] || [sym contains [hr::if namedsymbols: := {}] $symbol]}] \
]]

		
 
section    top
 
section    top

foreach {key rules} [${grammar}::filter cs: * *] {
	setp {- pn alt} $key
	if {![sym contains [${grammar}::nonterminals] $pn] && ![sym contains [$grammar predicates:] $pn]} continue
	sym each rule $rules {lappend database $rule}
}

<Add rewrite rules to the logic database>
<Add predicate recovery rules>
<Add symbol table queries>

<WAM code accumulation buffer>
<WAM query variable list>
<WAM code symbol table>
<Compile rules database into WAM>
<Index the WAM machine>
<Link WAM calls to entry points>
<Section the WAM machine>
<Generate the WAM machine>

		
 
section    top

foreach clause $database {
	set variableAssignments {}
	if {[sym length $clause]==1} {
		<Compile a FACT into WAM>
	} elseif {[string equal [sym first $clause] %query] && [sym length $clause]>=3} {
		<Compile a QUERY into WAM>
	} else {
		<Compile a RULE into WAM>
	}
	<Identify permanent variables in a WAM rule>
	set allocate [llength $permanent0]
	if {$allocate==0 && [llength $tail]<=1} {set allocate -1}
	set deallocate -1
	<Identify useless variables in a WAM rule>
	if {[sym length $head]} {
		<Compile a WAM rule head>
		set allocate -1
	}
	if {[sym length $tail]} {
		<Compile a WAM rule tail>
	} else {
		<Generated command: proceed>
	}
	if {[string equal [sym first $clause] %query] && [sym length $clause]>=3} {
		<Link the QUERY to rest of the world>
	}
}

		
 
section    top
			
lappend C fail

		
 
section    top
			
lappend C pass

		
 
section    top
			
lappend C proceed

		
 
section    top

81. A database production rule :: The vW1 separator writes the attribute evaluator for an alternative pi of the production p as

p: r1, ... , rn
<p,F1,...,Fm>
<r1,...,V_1k,...>,
... ,
<r_n,...,V_nl,...>
The concrete parse and attributed parse trees are created by adding variables and terms to analyze the concrete parse tree P and synthesise the attributed tree A.
<p,
[<r1,P1>,<r2,P2>,...,<rn,Pn>,LOC],
<p,F1,...,Fm,[A1,A2,...,An,LOC]>,
F1,...,F_m>,
<r1,P1,A1,...,V1k,...>,
...
<rn,Pn,An,...,Vnl,...>,
LOC is the source file location, encoded in an implementor specific fashion.

(It would seem that the individually alternatives could be indexed explicitly, something like

<pi+r1+r2+...+rn,
[<r1,P1>,<r2,P2>,...,<rn,Pn>,LOC],
<p,F1,...,Fm,[A1,A2,...,An,LOC]>,
F1,...,F_m>,
<r1,P1,A1,...,V1k,...>,
...
<rn,Pn,An,...,Vnl,...>,
and avoid having to choose which alternative. But then the choice gets pushed into the ri terms, where instead of single functor, all the possible parses of ri+t1+t2+ have to be considerred, with combinatorial explosion in the number of disjunctions. Instead of that indexing can be done implicitly as part of the logic machine optimisation.)

Predicates are deleted from the context free grammar, and so do not show up in the parse tree, the Pi and Ai parameters, but they are still present in the attribute analysis. Predicates have a different editted calling sequence: they receive the parse tree location and name of calling production instead of parse trees. These parameters are discussed next.

...
<rn,...,Vnl,...,caller,LOC>,
...

Symbol have no parse tree, but they still have LOC; the calling sequence is similar enough to productions that they need not be handled differently at this moment.


set n [sym length $alt]
set newrules {}
sym each rule $rules {
	set head [sym rest [sym unnest [sym first $rule]]]
	set Plist [sym nest [sym concat LIST [sym nest VAR,_LOC] NIL]]
	set Alist [sym nest [sym concat LIST [sym nest VAR,_LOC] NIL]]
	for {set i $n} {$i>=1} {incr i -1} {
		set Plist [sym nest [sym concat LIST [sym nest VAR,_P$i] $Plist]]
		set Alist [sym nest [sym concat LIST [sym nest VAR,_A$i] $Alist]]
	}
	<Generated command: term_production>
	if {[rew::if :[sym first $head]]} {
		<Redirect production tree construction to the rewriting rules>
	} else {
		set Alist [sym nest [sym concat TERM $head $Alist]]
		set newrule [sym nest [sym concat TERM [sym first $head] $Plist $Alist [sym rest $head]]]
	}
	set i 1
	sym enum term [sym rest $rule] {
		set term [sym unnest $term]
		if {[sym contains [$grammar predicates:] [sym index $term 1]]} {
			set term [sym concat \
					$term \
					[sym nest [sym encode string [list TERM "[sym index $term 1] in $pn"]]] \
					[sym nest VAR,_LOC]
			]
		} else {
			set term [sym concat \
					[sym range $term 0 1] \
					[sym nest VAR,_P$i] \
					[sym nest VAR,_A$i] \
					[sym range $term 2 end] \
			]
			incr i 
		}
		set newrule [sym concat $newrule [sym nest $term]]
	}
	set newrules [sym union $newrules $newrule]
}
$grammar cs: $pn $alt := $newrules

		
 
section    top
		
set newrules {}
set head "#NS"
sym each rule $rules {
	set head [sym unnest [sym first $rule]]
	set newrule [sym nest [sym concat \
			$head \
			[sym nest VAR,_RECOVERY] \
			[sym nest VAR,_LOC] \
	]]
	sym enum term [sym rest $rule] {
		set term [sym nest [sym concat [sym unnest $term] [sym nest VAR,_] [sym nest VAR,_]]]
		set newrule [sym concat $newrule $term]
	}
	set newrule [sym concat $newrule [sym nest SUCCESS,_RECOVERY]]
	set newrules [sym union $newrules $newrule]
}
$grammar cs: $pn $alt := $newrules
<Define predicate recovery term>

		
 
section    top

set database [concat $database $predicaterecoveries]

		
 
section    top

if {[sym length $head]>=2} {
	set n [expr {[sym length $head]-2}]
	set predicaterecovery [sym range $head 0 1]
	for {set i 2} {$i<$n} {incr i} {
		set predicaterecovery [sym concat $predicaterecovery [sym nest VAR,_]]
	}
	set predicaterecovery [sym concat $predicaterecovery [sym nest VAR,_RECOVERY]]
	set predicaterecovery [sym concat $predicaterecovery [sym nest VAR,_LOC]]
	set predicaterecovery [sym concat [sym nest $predicaterecovery] [sym nest FAILURE,_RECOVERY,_LOC]]
	lappend predicaterecoveries $predicaterecovery
}

		
 
section    top

set head [sym first $clause]
set tail [sym rest $clause]
set queryvar 0

		
 
section    top

set QUERY %query,%start,[sym nest TERM,%start,[sym nest VAR,P_0],[sym nest VAR,A_0]]

		
 
section    top

set head "#NS"
set queryname [sym index $clause 1]
set tail [sym range $clause 2 end]
if {[string equal $queryname %start]} {
	set querystart [llength $C]
	set Y(a.[llength $C]) %query
} else {
	<Record the subquery address>
}

		
 
section    top

if {[string equal $queryname %start]} {
	<Generated command: query_start>
} else {
	<Generated command: subquery_start>
	hr symbol-table: $SYMTAB := [
		assoc update [hr symbol-table: $SYMTAB] $objkey wamcode {
			set wamcode [list $wamcode $variableAssignments]
		}
	]
}
<Generated command: pass>

		
 
section    top
			
lappend C [list query_start C.$querystart $variableAssignments]

		
 
section    top
			
lappend C [list subquery_start $queryname C.$querystart $variableAssignments]

		
 
section    top

set EQFACT [sym nest TERM,%eq%,[sym nest VAR,ATTRIBUTE],[sym nest VAR,ATTRIBUTE],[sym nest VAR,_]]

		
 
section    top
		
set newrules {}
sym each rule $rules {
	set head [sym unnest [sym first $rule]]
	set symbol [sym index $head 1]
	set attribute [sym range $head 2 end]
	if  {[sym empty $attribute] && [sym contains [hr::if namedsymbols: := {}] $symbol]} {
		set attribute [sym nest VAR,_NAME]
	}
	set core [sym nest LIST,[sym nest VAR,_LOC],NIL]
	if {![sym empty $attribute]} {
		set core [sym nest LIST,$attribute,$core]
		gen term_production $symbol 2
	}
	set newrules [sym concat $newrules [
		sym nest [sym concat \
			TERM \
			$symbol \
			$core \
			[sym nest [sym concat TERM $symbol $attribute $core]] \
			$attribute \
		] \
	]]
}
$grammar cs: $pn $alt := $newrules

		
 
section    top

set head [sym first $clause]
set choices "#NS"
set queryvar 0

		
 
section    top

94 :: The memory used by the WAM is divided into eight zones: integers, registers, constant, heap, list, environment stack, trailback, and foreign. The operand in WAM instructions are prefixed with a letter and full stop which indicate which zone the operand is in.

J
Integer. No memory is actually allocated in this zone: the offset is the integer. It encodes negative and positive integers.
X and A
Global register. Offset i is the register A.i or X.i.
C
Constant arity. A constant c/0 or structure functor f/n is encoded in the C zone with offset i: the value at C.i is the arity, 0 or n, and F[i] is the string, c or f. The WAM does not itself interrogate the F vector, but this is needed to construct a readable representation from the heap. After machine intialisation F and the C zone are not modified by the machine.
Constant operands are written C.i.f [ /n ] . C.i is the tagged offset; the remainder is presented to document the code.
H
The heap. If the cell is the start of a structure, its tag has the modifier bit set.
list
The heap again. This is mapped into the H zone. Offsets to list cells are tagged to be in the list zone instead of heap zone (for structure pointers and ref cells) because of how tag bits are allocated.
S and Y
Environment stack. Y.i is a permanent register, relative to the E register (environment stack pointer).
trail
The trail stack.
foreign
Foreign. WAM does not create or interpret the foreign zone. This zone can be used to store foreign values and give them a tagged address.
L or R
WAM instruction are not stored in the memory and not in any zone. Instruction offsets are labelled L.o [ .f/n ] or R.o where o is the code offset. A procedure name may be included as commentary. Some addresses must be somehow tagged because they are stored in the environment stack; these are the offset labelled with R instead of L. How the addresses are encoded depends upon the implementation.
 
section    top

<Convert to a parse tree to a top down flattenned list>
set a 1
set o [expr {[lindex $flattenned 1]+1}]
array set map {}; array unset map *
array set pam {}; array unset pam *
array set def {}; array unset def *
set permanent {}
foreach p $permanent0 {
	setp {v client} $p
	lappend permanent $v
}
foreach {d n f} $flattenned {
	if {$d==0} {
		if {[info exists Y(e.$f/$n)]} {
			if {[info exists Y(t.$f/$n)]} {
				lset C $Y(t.$f/$n) [list retry_me_else [llength $C]]
			} else {
				lset C $Y(e.$f/$n) [list try_me_else [llength $C] $n]
			}
			set Y(t.$f/$n) [llength $C]
			<Generated command: trust_me>
		} else {
			set Y(e.$f/$n) [llength $C]
			set Y(a.[llength $C]) $f/$n
			<Generated command: nop>
		}
		if {![info exists Y(f.$f/$n)]} {set Y(f.$f/$n) {}}
		if {$allocate>=0} {
			<Generated command: allocate>
		}
	} elseif {$n==-1} {
		if {[string equal $f _] || [lsearch -exact $useless $f]>=0} {
			if {$d==1} {
				incr a
			} else {
				<Generated command: unify_void>
			}
		} elseif {[info exists map($f)]} {
			if {$d==1} {
				<Generated command: get_value>
				incr a
			} else {
				if {[info exists def($map($f))] && $def($map($f))==2} {
					<Generated command: unify_value>
				} else {
					set def($map($f)) 2
					if {[string match Y* $map($f)]} {
						error "unify_local_value $map($f) = $f"
					}
					<Generated command: unify_local_value>
				}
			}
		} else {
			set i [lsearch -exact $permanent $f]
			if {$i>=0} {
				incr i
				set map($f) Y.$i
			} else {
				set map($f) X.$o
				incr o
			}
			if {$d==1} {
				<Generated command: get_variable>
				incr a
			} else {
				set def($map($f)) 2
				set tr $map($f)
				<Generated command: unify_variable>
			}
		}
	} elseif {$n==0} {
		if {$d==1} {
			set r $a; incr a
			<Generated command: get_constant>
		} else {
			<Generated command: unify_constant>
		}
	} else {
		if {$d==1} {
			set r $a; incr a; set tr A.$r
		} else {
			set r $o; incr o; set tr X.$r
		}
		if {$d>1} {
			set def($tr) 2
			<Generated command: unify_variable>
		}
		if {[string equal $f LIST]} {
			<Generated command: get_list>
		} else {
			<Generated command: get_structure>
		}
	}
}

		
 
section    top
			
lappend C [list allocate $allocate]

		
 
section    top
			
lappend C [list get_constant [compileWAMConstant $f 0] A.$r]

		
 
section    top
			
lappend C [list get_list $tr]

		
 
section    top
			
lappend C [list get_structure [compileWAMConstant $f $n] $tr]

		
 
section    top
			
lappend C [list get_value $map($f) A.$a]

		
 
section    top
			
lappend C [list get_variable $map($f) A.$a]

		
 
section    top
			
lappend C nop

		
 
section    top
 
section    top
			
lappend C trust_me

		
 
section    top
 
section    top
			
lappend C [list unify_constant [compileWAMConstant $f 0]]

		
 
section    top
			
lappend C [list unify_local_value $map($f)]

		
 
section    top
			
lappend C [list unify_value $map($f)]

		
 
section    top
			
lappend C [list unify_variable $tr]

		
 
section    top
			
lappend C [list unify_void 1]

		
 
section    top

set flattenned {}
vprioq new work 0 [list 0 $head]
while {[vprioq dequeue work task est; set est]} {
	setp {d list} $task
	if {[sym nested $list]} {set list [sym unnest $list]}
	switch [sym first $list] {
		TERM {
			set functor [sym index $list 1]
			set terms [sym range $list 2 end]
			lappend flattenned $d [sym length $terms] $functor
			incr d
			while {![sym empty $terms]} {
				vprioq enqueuebefore work [list $d [sym last $terms]]
				set terms [sym front $terms]
			}
		}
		VAR {
			lappend flattenned $d -1 [sym last $list]
		}
		NIL {
			lappend flattenned $d 0 %nil
		}
		LIST {
			set list [sym rest $list]
			lappend flattenned $d 2 LIST
			incr d
			while {![sym empty $list]} {
				vprioq enqueuebefore work [list $d [sym last $list]]
				set list [sym front $list]
			}
		}
	}
}

		
 
section    top

array set map {}
array set pam {}
array set def {}
set m 0; set le0 [llength $permanent0]
set trim [expr {$deallocate>=0}]
sym enum term $tail {
	incr m
	<Convert to a parse tree to a bottom up flattenned list>
	set a 1
	set o [expr {[lindex $flattenned end-1]+1}]
	set R {}
	set permanent {}
	set unsafe {}
	set le 0
	foreach p $permanent0 {
		setp {v client} $p
		if {$m<=$client} {lappend permanent $v}
		if {$m<$client} {incr le} else {lappend unsafe $v}
	}
	if {$allocate>=0} {
		<Generated command: allocate>
		set allocate -1
	}
	foreach {d n f} $flattenned {
		if {$d==0 && $n>=0} {
			if {$m==[llength $tail]} {
				if {$deallocate>=0} {
					<Generated command: deallocate>
					set deallocate -1
				}
				lappend Y(f.$f/$n) [llength $C]
				<Generated command: call>
			} else {
				if {$trim && $le<$le0} {
					<Generated command: trim_environment>
				}
				lappend Y(f.$f/$n) [llength $C]
				<Generated command: call>
			}
		} elseif {$n==-2} {
			<Generated command: anti_get_value>
			if {$m==[llength $tail]} {
				if {$deallocate>=0} {
					<Generated command: deallocate>
					set deallocate -1
				}
				<Generated command: proceed>
			} elseif {$trim && $le<$le0} {
				<Generated command: trim_environment>
			}
		} elseif {$n==-3} {
			<Generated command: on_failure>
			if {$m==[llength $tail]} {
				if {$deallocate>=0} {
					<Generated command: deallocate>
					set deallocate -1
				}
				<Generated command: proceed>
			} elseif {$trim && $le<$le0} {
				<Generated command: trim_environment>
			}
		} elseif {$n==-4} {
			setp {language no ni varnames foreigntext} $f
			<Generated command: foreign_text>
			if {$m==[llength $tail]} {
				if {$deallocate>=0} {
					<Generated command: deallocate>
					set deallocate -1
				}
				<Generated command: proceed>
			} elseif {$trim && $le<$le0} {
				<Generated command: trim_environment>
			}
		} elseif {$n==-5} {
			set i [expr {$a-1}]
			<Generated command: if_bound>
		} elseif {$n==-6} {
			<Generated command: on_success>
			if {$m==[llength $tail]} {
				if {$deallocate>=0} {
					<Generated command: deallocate>
					set deallocate -1
				}
				<Generated command: proceed>
			} elseif {$trim && $le<$le0} {
				<Generated command: trim_environment>
			}
		} elseif {$n==-1} {
			if {[string equal $f _]} {
				if {$d==1} {
					<Generated command: put_void>
					incr a
				} else {
					lappend R _
				}
			} elseif {[info exists map($f)]} {
				if {$d==1} {
					set i [lsearch -exact $unsafe $f]
					if {$i>=0} {
						set unsafe [lreplace $unsafe $i $i]
						<Generated command: put_unsafe_value>
					} else {
						<Generated command: put_value>
					}
					set def(A.$a) 1
					set pam(A.$a) $f
					incr a
				} else {
					lappend R $map($f)
				}
			} else {
				set i [lsearch -exact $permanent $f]
				if {$i>=0} {
					incr i
					set map($f) Y.$i
					set pam(Y.$i) $f
				} else {
					set map($f) X.$o
					set pam(X.$o) $f
					incr o
				}
				if {$d==1} {
					<Generated command: put_variable>
					lappend variableAssignments $f $map($f)
					set def(A.$a) 2
					set pam(A.$a) $f
					if {[string match Y.* $map($f)]} {
						set def($map($f)) 1
					} else {
						set def($map($f)) 2
					}
					incr a
				} else {
					lappend R $map($f)
					set def($map($f)) 0
				}
			}
		} elseif {$n==0} {
			if {$d==1} {
				set r $a; incr a
				<Generated command: put_constant>
			} else {
				lappend R [compileWAMConstant $f 0]
			}
			
		} else {
			set i [expr {[llength $R]-1}]
			set j [expr {[llength $R]-$n}]
			set R0 [lrange $R $j $i]
			set R [lrange $R 0 [expr {$j-1}]]
			if {$d==1} {
				set r $a; incr a; set tr A.$r
			} else {
				set r $o; incr o; set tr X.$r
			}
			lappend R $tr
			set def($tr) 1
			set pam($tr) -
			if {[string equal $f LIST]} {
				<Generated command: put_list>
			} else {
				<Generated command: put_structure>
			}
			foreach r $R0 {
				if {[string equal $r _]} {
					<Generated command: set_void>
				} elseif {[string match C.* $r]} {
					<Generated command: set_constant>
				} elseif {$def($r)==2} {
					<Generated command: set_value>
				} elseif {$def($r)} {
					set def($r) 2
					if {[string match Y* $r]} {error "set_local_value $r"}
					<Generated command: set_local_value>
				} else {
					set def($r) 2
					<Generated command: set_variable>
					if {![string equal $pam($r) -]} {
						lappend variableAssignments $pam($r) $r
					}
				}
			}
		}
	}
	foreach {v x} [array get map] {
		if {![string match Y.* $x]} {
			unset map($v)
			if {[info exists pam($x)]} {
				unset pam($x)
				unset def($x)
			}
		}
	}
	set le0 $le
}
if {$deallocate>=0} {
	<Generated command: deallocate>
}

		
 
section    top
			
lappend C [list anti_get_value A.1 A.2]

		
 
section    top
			
lappend C [list call $f/$n $n 0]

		
 
section    top
			
lappend C deallocate

		
 
section    top
			
lappend C [list execute $f/$n 0 $n]

		
 
section    top

117. Generated command: foreign_text :: foreign_text language num-outputs num-inputs variable-names foreign-text

Evaluate the foreign text. The input variables are guarenteed to be bound; the output variables may or may not be bound. The implementation is expected to provide a language specific interface to extract input values from the heap, unify output variables to the heap, and to signal success or failure. cow cannot verify that interface, but perhaps the implementation of the foreign_text can. The implementation is expected to signal success to the WAM in the same manner as a get_value command.
The language is either '-' or small marks. It is passed through to the foreign_text command without interpretation. All symbol table generated foreign text have a language '%symbol': <%symbol table foreign text inserted in the grammar>. Other languages are between the foreign text implementation and grammar writer.
num-outputs ::= integer
Number of output variables, m. The ref cells or bound values are stored in A.1 through A.m before evaluating the foreign text.
num-inputs ::= integer
Number of input variables, n. Bound values are stored in A.m+1 through A.m+n before evaluating the foreign text.
variable-names ::= {metanotion...}
m+n variable names; they are not checked for replication. These names are available to the implementing the foreign text implementation.
foreign-text ::= language-specific-text
The language specific code to be evaluated.

if {[string equal $language %symbol]} {
	<ref tag='%symbol table foreign text inserted in the grammar'/>
} else {
	lappend C [list foreign_text $language $no $ni $varnames $foreigntext]
}

		
 
section    top
			
lappend C [list if_bound A.$i]

		
 
section    top
			
lappend C [list on_failure A.1 A.2]

		
 
section    top
			
lappend C [list on_failure A.1]

		
 
section    top
			
lappend C [list put_constant [compileWAMConstant $f 0] A.$r]

		
 
section    top
			
lappend C [list put_list $tr]

		
 
section    top
			
lappend C [list put_structure [compileWAMConstant $f $n] $tr]

		
 
section    top
			
lappend C [list put_unsafe_value $map($f) A.$a]

		
 
section    top
			
lappend C [list put_value $map($f) A.$a]

		
 
section    top
			
lappend C [list put_variable $map($f) A.$a]

		
 
section    top
			
lappend C [list put_void A.$a]

		
 
section    top
			
lappend C [list set_constant $r]

		
 
section    top
			
lalappend C [list set_local_value $r]

		
 
section    top
			
lappend C [list set_value $r]

		
 
section    top
			
lappend C [list set_variable $r]

		
 
section    top
			
lappend C [list set_void 1]

		
 
section    top
			
lappend C [list trim_environment $le]

		
 
section    top

set flattenned {}
vprioq new work 0 [list 0 0 $term]
while {[vprioq dequeue work task est; set est]} {
	setp {out d list} $task
	if {[sym nested $list]} {set list [sym unnest $list]}
	if {$out} {
		setp {d n f} $list
		lappend flattenned $d $n $f
	} else {
		switch [sym first $list] {
			TERM {
				set functor [sym index $list 1]
				set terms [sym range $list 2 end]
				vprioq enqueuebefore work [list 1 0 [list $d [sym length $terms] $functor]]
				incr d
				while {![sym empty $terms]} {
					vprioq enqueuebefore work [list 0 $d [sym last $terms]]
					set terms [sym front $terms]
				}
			}
			VAR {
				lappend flattenned $d -1 [sym last $list]
			}
			NIL {
				lappend flattenned $d 0 %nil
			}
			LIST {
				set list [sym rest $list]
				vprioq enqueuebefore work [list 1 0 [list $d 2 LIST]]
				incr d
				while {![sym empty $terms]} {
					vprioq enqueuebefore work [list 0 $d [sym last $terms]]
					set terms [sym front $terms]
				}
			}
			ANTIUNIFY {
				set list [sym rest $list]
				vprioq enqueuebefore work [list 1 0 [list $d -2 ANTIUNIFY]]
				incr d
				while {![sym empty $terms]} {
					vprioq enqueuebefore work [list 0 $d [sym last $terms]]
					set terms [sym front $terms]
				}
			}
			FAILURE {
				set list [sym rest $list]
				vprioq enqueuebefore work [list 1 0 [list $d -3 FAILURE]]
				incr d
				while {![sym empty $terms]} {
					vprioq enqueuebefore work [list 0 $d [sym last $terms]]
					set terms [sym front $terms]
				}
			}
			SUCCESS {
				set list [sym rest $list]
				vprioq enqueuebefore work [list 1 0 [list $d -6 SUCCESS]]
				incr d
				while {![sym empty $terms]} {
					vprioq enqueuebefore work [list 0 $d [sym last $terms]]
					set terms [sym front $terms]
				}
			}
			FOREIGN {
				set code [sym last $list]
				set language [sym index $list 1]
				set list [sym range $list 2 end-1]
				set vars {}
				set d0 $d; incr d
				set m 0
				set n 0
				sym enum var $list {
					if {[string match +* $var]} {
						incr m
						set var [string range $var 1 end]
						lappend vars $var
						lappend flattenned $d -1 $var
					}
				}
				foreach var $list {
					if {[string match -* $var]} {
						incr n
						set var [string range $var 1 end]
						lappend vars $var
						lappend flattenned $d -1 $var
						lappend flattenned $d -5 $var
					}
				}
				lappend flattenned $d0 -4 [list $language $m $n $vars $code]
			}
		}
	}
}

		
 
section    top

set MZ {
	{initialise_memory C.0 J.0}
	{initialise_memory C.1 J.0}
	{initialise_constant C.0 %nil 0}
	{initialise_constant C.1 %undefined 0}
}
set F {%nil %undefined}

		
 
section    top

proc compileWAMConstant {f arity} {
	upvar 1 F F MZ MZ Y Y
	if {![info exists Y(c.$f/$arity)]} {
		set Y(c.$f/$arity) [llength $F]
		<Generated command: initialise_memory>
		<Generated command: initialise_constant>
		lappend F $f
	}
	if {$arity} {
		return [list C.$Y(c.$f/$arity).$f/$arity]
	} else {
		return [list C.$Y(c.$f/$arity).$f]
	}
}

		
 
section    top
			
gen term_vars $pn/[sym length $vars] $vars

		
 
section    top
			
gen term_production [sym first $head] [sym length $head]

		
 
section    top
			
lappend MZ [list initialise_constant C.[llength $F] [list $f] $arity]

		
 
section    top
			
lappend MZ [list initialise_memory C.[llength $F] J.$arity]

		
 
section    top

set Q {}

		
 
section    top

142. WAM code accumulation buffer :: Instruction are accumulated in the list variable C, which just glumphs together instructions as they come. It would be nice to have some kind peephole optimiser or better register allocator; at some point I plan to do a backend as cow does the front end: rather than try to do that on the fly here, this whole issue will be deferred until then.

The offset 0 instruction fail becomes the target of any call to an undefined procedure. The offset 1 instruction pass is the return address (CP) of the query. When the query completes, this triggers the output of query results.

The WAM start is given in the query_start instruction.

Once an instruction is added to C, that position in the list will not be changed and can be used as a target for calls, executes, and other transfers of control. Sometimes instructions are patched as the control structure is established, but code is never moved once placed.


set C {fail pass}

		
 
section    top

array set Y {c.%nil/0 0 c.%undefined/0 0}

		
 
section    top

unset -nocomplain peuse
unset -nocomplain pecount
unset -nocomplain peclient
array set peuse {}
array set pecount {}
array set peclient {}
vprioq new work
set i 1
if {[sym length $head]} {vprioq enqueue work [list 1 $head]}
sym enum term $tail {
	vprioq enqueue work [list $i $term]
	incr i
}
while {[vprioq dequeue work task est; set est]} {
	setp {i list} $task
	if {[sym nested $list]} {set list [sym unnest $list]}
	switch [sym first $list] {
		VAR {
			set var [sym last $list]
			lappend peuse([list $var $i]) $i
			set pecount($var) 0
		}
		TERM {
			sym enum term [sym range $list 2 end] {vprioq enqueue work [list $i $term]}
		}
		LIST - ANTIUNIFY - FAILURE - SUCCESS {
			sym enum term [sym range $list 1 end] {vprioq enqueue work [list $i $term]}
		}
		FOREIGN {
			sym enum term [sym range $list 2 end-1] {
				set var [string range $term 1 end]
				lappend peuse([list $var $i]) $i
				set pecount($var) 0
			}
		}
	}
}
foreach v [array names peuse] {
	setp {v c} $v
	incr pecount($v)
	lappend peclient($v) $c
}
set permanent0 {}
foreach {v k} [array get pecount] {
	if {$k>1} {
		lappend permanent0 [list $v [lindex [lsort -integer $peclient($v)] end]]
	}
}
set permanent0 [lsort -integer -decreasing -index 1 $permanent0]


		
 
section    top

unset -nocomplain ulvar
array set ulvar {}
vprioq new work
if {[sym length $head]} {vprioq enqueue work $head}
sym enum term $tail {vprioq enqueue work $term}
while {[vprioq dequeue work list est; set est]} {
	if {[sym nested $list]} {set list [sym unnest $list]}
	switch [sym first $list] {
		VAR {
			set v [sym last $list]
			if {![info exists ulvar($v)]} {set ulvar($v) 0}
			incr ulvar($v)
		}
		TERM {
			sym enum term [sym range $list 2 end] {vprioq enqueue work $term}
		}
		LIST - ANTIUNIFY - FAILURE - SUCCESS {
			sym enum term [sym range $list 1 end] {vprioq enqueue work $term}
		}
		FOREIGN {
			sym enum term [sym range $list 2 end-1] {
				set v [string range $term 1 end]
				if {![info exists ulvar($v)]} {set ulvar($v) 0}
				incr ulvar($v)
			}
		}
	}
}
set useless {}
foreach {v k} [array get ulvar] {
	if {$k<2} {lappend useless $v}
}

		
 
section    top

foreach {procname aentry} [array get Y e.*] {
	set procname [lindex [split $procname .] 1]
	if {[string equal [lindex $C $aentry 0] nop]} {
		unset Y(a.$aentry); incr aentry; set Y(a.$aentry) $procname; set Y(e.$procname) $aentry
	}
	if {[string equal [lindex $C $aentry 0] try_me_else]} {
		<Index WAM/cow procedure>
	}
}

		
 
section    top

set arity [lindex $C $aentry 2]
if {![string length $arity]} {error null-arity}
array set procindex {V {} C {} L {} S {}}
for {set aclause $aentry} {$aclause!=0} {set aclause $anextclause} {
	if {[string equal [lindex $C $aclause 0] trust_me]} {
		set anextclause 0
	} else {
		set anextclause [lindex $C $aclause 1]
	}
	set asubclause $aclause; incr asubclause; set afirstget $asubclause
	while 1 {
		switch [lindex $C $afirstget 0] {
			allocate {
				incr afirstget
			}
			get_structure {
				lappend procindex(S) [lindex $C $afirstget 2] $asubclause
				break
			}
			get_constant {
				lappend procindex(C) [lindex $C $afirstget 2] $asubclause
				break
			}
			get_list {
				lappend procindex(L) $asubclause
				break
			}
			default {
				lappend procindex(V) $aclause
				break
			}
		}
	}
}
foreach x {C S} y {switch_on_constant switch_on_structure} {
	if {[llength $procindex($x)]==2} {
		set procindex($x) [thirdlevel [concat [lindex $procindex($x) 1] $procindex(V)]]
	} elseif {[llength $procindex($x)]==0} {
		set procindex($x) [thirdlevel $procindex(V)]
	} else {
		unset -nocomplain H
		foreach {symbol address} $procindex($x) {lappend H($symbol) $address}
		set switch {}
		foreach {symbol addresses} [array get H] {
			set addresses [thirdlevel [concat $addresses $procindex(V)]]
			lappend switch $symbol $addresses
		}
		set procindex($x) [llength $C]
		lappend C [list $y $switch]
	}
}
set procindex(L) [thirdlevel $procindex(L)]
set asegment [llength $C]
unset Y(a.$aentry); set Y(a.$asegment) $procname
set Y(e.$procname) $asegment
<Generated command: switch_on_term>

		
 
section    top

proc thirdlevel {addresses} {
	upvar 1 arity arity C C
	if {[llength $addresses]==1} {
		return $addresses
	} elseif {[llength $addresses]>1} {
		set aswitch [llength $C]
		for {set i 0; set n [llength $addresses]} {$i<$n} {incr i} {
			set a [lindex $addresses $i]
			if {$i==0} {
				<Generated command: try>
			} elseif {$i+1==$n} {
				<Generated command: trust>
			} else {
				<Generated command: retry>
			}
		}
		return $aswitch
	} else {
		return 0
	}
}

		
 
section    top
			
lappend C [list retry $a]

		
 
section    top
 
section    top
 
section    top
			
lappend C [list switch_on_term $aentry $procindex(C) $procindex(L) $procindex(S)]

		
 
section    top
			
lappend C [list trust $a]

		
 
section    top
			
lappend C [list try $a $arity]

		
 
section    top

foreach {e addr} [array get Y e.*] {
	set e [lindex [split $e .] 1]
	foreach ref $Y(f.$e) {
		lset C $ref 2 $addr
	}
}

		
 
section    top

156. Section the WAM machine :: Because of backtracking, WAM cannot always release stack frames when it returns from a procedure. This is incompatiable with the protocol stack of most other languages; hence the WAM maintains its own stack. Code addresses must be stored in thestack, which means somehow encoding and tagging the address, as described in <WAM constant accumulation buffers>.

Sectionning identifies those addresses that have to be stored in the stack and replacing them with tagged constants. Further the code of the machine is divided into sections. The first instruction is the target of a nonsequential transfer of control, such as a call or try_me or a proceed. It is followed by those instructions which execute sequentially. The section is terminated by an instruction which makes a transfer of control, or if the next instruction needs to be the start of a new section.

If the implementation language permits label variables, the label pointer can be stored in the stack (or stored in zone 7 and a tagged zone 7 address in the stack), the sections become labelled blocks, and the transfers become gotos. An alternate implementation is to make the sections into procedures and then pass the procedure addresses around. Another possibility is to use a switch instead of gotos.

	
set p 0
set sections {}
set section L.0
array set target {}
foreach instruction $C {
	switch [lindex $instruction 0] {
		switch_on_constant {
			foreach {c L} [lindex $instruction 1] {
				set target($L) 1
			}
		}
		call - execute {
			set target([lindex $instruction 2]) 1
		}
		try - retry - trust {
			set target([lindex $instruction 1]) 1
		}
	}
}
set o 0
foreach instruction $C {
	if {[info exists target($p)] && [llength $section]>1} {
		lappend section [list continue_at L.$p]
		lappend sections $section
		set section L.$p
	}
	incr o
	set p1 [expr {$p+1}]
	if {[info exists Y(a.$p)]} {
		lappend section [list entry_point $Y(a.$p)]
	}
	switch [lindex $instruction 0] {
		call {
			set instruction [
				concat call [list L.[lindex $instruction 2].[lindex $instruction 1]] [lrange $instruction 3 end] R.$p1
			]
			lappend section $instruction
			lappend sections $section
			set section L.$p1
		}
		retry - try {
			lset instruction 1 L.[lindex $instruction 1]
			lappend instruction R.$p1
			lappend section $instruction
			lappend sections $section
			set section L.$p1
		}
		execute {
			set instruction [
				concat execute [list L.[lindex $instruction 2].[lindex $instruction 1]] [lrange $instruction 3 end]
			]
			lappend section $instruction
			lappend sections $section
			set section L.$p1
		}
		trust {
			lset instruction 1 L.[lindex $instruction 1]
			lappend section $instruction
			lappend sections $section
			set section L.$p1
		}
		fail - pass - proceed {
			lappend section $instruction
			lappend sections $section
			set section L.$p1
		}
		switch_on_constant - switch_on_structure {
			set op [lindex $instruction 0]
			set hash {}
			foreach {c l} [lindex $instruction 1] {
				lappend hash $c L.$l
			}
			set instruction [list $op [list $hash]]
			lappend section $instruction
			lappend sections $section
			set section L.$p1
		}
		switch_on_term {
			setp {op v c l s} $instruction
			set instruction [list $op L.$v L.$c L.$l L.$v]
			lappend section $instruction
			lappend sections $section
			set section L.$p1
		}
		try_me_else - retry_me_else {
			lset instruction 1 R.[lindex $instruction 1]
			lappend section $instruction
		}
		default {
			lappend section $instruction
		}
	}
	set p $p1
}
set code {}
foreach section $sections {
	set section [setp name $section]
	lappend code [list BEGIN section $name]
	foreach instruction $section {
		lappend code $instruction
	}
	lappend code END
}

		
 
section    top


gen BEGIN attribute_wam [$grammar name:]
	gen BEGIN wam_initialisation
		gen initialise_register B S.0
		gen initialise_register BO S.0
		gen initialise_register CP L.1
		gen initialise_register E S.1
		gen initialise_register H H.0
		gen initialise_register HB H.0
		gen initialise_register mode READ
		gen initialise_register P L.0
		gen initialise_register S H.0
		gen initialise_register TR T.0
		foreach command $MZ {
			eval gen $command
		}
	gen END
	gen BEGIN eval_loop
		foreach instruction $code {gen $instruction}
	gen END
gen END

		
 
section    top
 
section    top
 
section    top
 
section    top
 
section    top
 
section    top
 
section    top
 
section    top
		
proc displayAttribute {parsed} {
	if {[sym nested $parsed]} {set parsed [sym unnest $parsed]}
#puts [string repeat . [info level]]$parsed
	switch [sym card $parsed] {
		0 {return "{}"}
		1 {;}
		default {
			set U ""; set sep ""
			sym each term $parsed {append U $sep[displayAttribute $term]; set sep "; "}
			return $U
		}
	}
	switch [sym first $parsed] {
		%query {
			return "[sym index $parsed 1] ?-[displayAttribute [sym range $parsed 2 end]]."
		}
		TERM {
			set args {}
			set r [sym index $parsed 1]
			sym enum arg [sym range $parsed 2 end] {
				append args ",[displayAttribute $arg]"
			}
			return <$r$args>
		}
		VAR {
			return [sym last $parsed]
		}
		LIST {
			set a [sym index $parsed 1]
			set b [sym unnest [sym index $parsed 2]]
			set a [displayAttribute $a]
			while 1 {
				switch [sym first $b] {
					LIST {
						set x [sym index $b 1]
						set b [sym unnest [sym index $b 2]]
						append a ,[displayAttribute $x]
					}
					NIL {
						set r \[$a\]
						break
					}
					default {
						set r \[$a|[displayAttribute $b]\]
						break
					}
				}
			}
			return $r
		}
		NIL {
			return {[]}
		}
		ANTIUNIFY {
			set a [sym index $parsed 1]
			set b [sym index $parsed 2]
			return "<%antiunify%,[displayAttribute $a],[displayAttribute $b]>"
		}
		FAILURE {
			return "%failure-recovery%"
		}
		SUCCESS {
			return "%success%"
		}
		FOREIGN {
			return "<%foreign%,[sym range $parsed 1 end]>"
		}
		default {
			set r ""; set sep1 ""; set sep2 " :- "
			sym enum predicate $parsed {
				if {[string equal $predicate $parsed]} {error HANG}
				append r "$sep1[displayAttribute $predicate]"
				set sep1 $sep2
				set sep2 ", "
			}
			return $r.
		}
	}
}

		
   
   

DFA Parser Generation

   
top

proc createDFA {grammar automata} {
	log Begin createDFA.
	set sn 0
	$grammar compiled: := 1
	set order [$grammar order:]
	$automata name: := [$grammar name:]
	$automata start: := %start
	$automata domains: := %start
	sym each pn [${grammar}::nonterminals] {$grammar ep: $pn := $pn}
	log Left recursion removal.
	<Left recursion removal>
	log Convert a right recursive grammar to a right linear grammar.
	<Convert a right recursive grammar to a right linear grammar>
	log Create the NFA from a factorred right linear grammar.
	<Create the NFA from a factorred right linear grammar>
	log Make the nondeterministic FA deterministic.
	<Make the nondeterministic FA deterministic>
	log Compute parse stack depth from edges if possible.
	<Compute parse stack depth from edges if possible>
	if {[$grammar compiled:]} {
		log Compile the DFA.
		compileAutomata $grammar $automata
	}
	log Completed createDFA.
}

		
 
section    top

167. Left recursion removal :: This section will not only replace left recursion with right or embedding recursion, will go all the way to create a Greibach Normal Form of the grammar. To recover the parse tree of the original grammar, reduction symbols are added to the productions.

Semantics of a production: <Define production reduce action>

The overall strategy is somewhat similar to cycle-analysis: work inward through the nesting grammars; maintain a set of derivations that are expanded and manipulated until all derivations are left recursion free. The left recursion removal algorithm is derived from Aho and Ullman algorithm 2.13.

Rewrite the grammar to order the nonterminals Ai , so that in each of their productions Ai→α , α begins with a terminal or Aj, j&gt;i . If an original production is AiAjα, j&lt;i , substitute for Aj its left recursion free alternatives.
Partition the productions AiAiα1|Aiα2|⋯|Aiαm12|⋯|βp where each βj begins with a terminal or Ak , k&gt;i
Replace with A→β12|⋯|βp1AR2AR|⋯|βpAR
And add the new production AR→α12|⋯|αm1AR2AR|⋯|αmAR


set leftRecursionRemoved 0
log Initialise left recursion removal.
<Initialise left recursion removal>
log Empty generation removal.
<Empty generation removal>
log Iterate to remove all left recursion.
<Iterate to remove all left recursion>
log Expand initial nonterminals into terminals.
<Expand initial nonterminals into terminals>
log Rewrite the grammar without left recursion.
<Rewrite the grammar without left recursion>
log Left recursion removal complete.
if {$leftRecursionRemoved} {
	report I "Left recursion removed grammar"
	sym each pn [${grammar}::nonterminals] {
		report I "  [string map {#2c , #2e . #7c |} $pn]:"
		sym each alt [$grammar $pn] {
			report I "    [string map {#2c , #2e . #7c |} $alt]"
		}
	}
}

		
 
section    top

168. Initialise left recursion removal :: Before deforming the grammar, ultimately ending up in a right linear form, the original grammar is marked with semantic symbols that will create the parse tree according to the original grammar. Semantics symbol are quasi-terminals which are always empty (weight is zero).

If the grammar has no right recursion, a reduction symbol is appended to each production. If the production is not recursive

A→αρA
the reduction will remain at the end, after rest of the production is recognised.

If the production is left recursive

AAαρA|βρA
it will be transformed into
A→βρA|βρAAR
AR→αρA|αρAAR
and the reduction symbol still occurs after the reduced to terminals have been read and does not interfere in the right recursive nature of AR . Because the reduction occurs before the state machine loops back, it turns out the parse stack and shift queue maximum sizes can be computed from the state machine. The number of nodes in the parse tree is linear in the size of input, but all other data structures will be constant size for all possible inputs.

If the production is right recursive, simply appending a reduce

A→αAρA|βρA
makes the right recursion embedding, and no longer a type 2 grammar. Instead a different kind of reduction symbol is prefixed to the production
A→piAAαA|βρA
Because this kind of push reduction occurs within the recursion, if the state machine loops, it will have to pile up a bunch of push symbols on the parse stack and only resolve them on exitting the loop; the parse stack is now linear in the size of input instead of fixed size. Also because the push occurs before any terminals, it will actually have to migrate right into the production in order to expose terminals; this means the shift queue size also becomes linear in the size of input instead of fixed size.

When right and left recursion are mixed, the semantics of stack and shift queue are alterred while there is push symbol on the stack. To get the correct semantics for left recursive productions within right recursion, and begin semantic symbol is pushed to indicate left recursion semantics, and that is popped when an end symbol is seen. This means the stack and shift queue will fluctuate between left and right recursive semantics based on the top symbol of the stack. Begin symbols have the same unfortunate effect on stack and queue sizes as pushes.

The productions are also numberred at this time to determine an orderring during left recursion removal.


vprioq new work
set i 1
set "LR(#NS)" 0
set anypush 0
foreach {scc N} $order {
	sym each v $scc {
		if {$N} {
			sym each alt [$grammar $v] {
				if {[${grammar}::if action: $v $alt]} {
					set n [sym length $alt]
					if {$n==0} {
						;
					} elseif {[sym contains $scc [sym last $alt]]} {
						set anypush 1
					} elseif {[$grammar anyright:]} {
						set anypush 1
					} else {
						set P($v) [sym union $P($v) [sym concat $alt [sym nest reduce,$action,$n]]]
					}
				}
			}
		} else {
			set LR($v) 0
		}
	}
}
foreach {scc N} $order {
	sym each v $scc {
		if {$N} {
			set LR($v) $i; incr i
			set P($v) {}
			sym each alt [$grammar $v] {
				if {[${grammar}::if action: $v $alt]} {
					set action [$grammar action: $v $alt]
					if {$anypush} {
						set action [sym nest [sym concat series $action [sym nest partial_shift]]]
					}
					set n [sym length $alt]
					if {$n==0} {
						$grammar nullable: [sym nest reduce,$action,0] := 1
						$grammar weight: [sym nest reduce,$action,$n] := 0
						set P($v) [sym union $P($v) [sym nest reduce,$action,0]]
					} elseif {[sym contains $scc [sym last $alt]]} {
						$grammar nullable: [sym nest push,$action,$n] := 1
						$grammar weight: [sym nest push,$action,$n] := 0
						set P($v) [sym union $P($v) [sym concat [sym nest push,$action,$n] $alt]]
					} elseif {[$grammar anyright:]} {
						$grammar nullable: [sym nest begin] := 1
						$grammar weight: [sym nest begin] := 0
						$grammar nullable: [sym nest end,$action] := 1
						$grammar weight: [sym nest end,$action] := 0
						set P($v) [sym union $P($v) [sym concat [sym nest begin] $alt [sym nest end,$action]]]
					} else {
						$grammar nullable: [sym nest reduce,$action,$n] := 1
						$grammar weight: [sym nest reduce,$action,$n] := 0
						set P($v) [sym union $P($v) [sym concat $alt [sym nest reduce,$action,$n]]]
					}
				} elseif {$anypush} {
					set partialshift [sym nest partial_shift]
					$grammar nullable: [sym nest reduce,$partialshift,$n] := 1
					$grammar weight: [sym nest reduce,$partialshift,$n] := 0
					set P($v) [sym union $P($v) [sym concat $alt [sym nest reduce,$partialshift,$n]]]
				} else {
					set P($v) [sym union $P($v) $alt]
				}
			}
		} else {
			set LR($v) 0
		}
	}
}
$automata anypartialps: := $anypush

		
 
section    top

169. Empty generation removal :: Remove empty productions A→ε from the grammar.

The empty elimination algorithm is derived from Aho and Ullman algorithm 2.10.

For a production
A→α1B1α2B2Bnαn+1
where each of the Bi is nullable, then add the productions
For a production A→α1X1α2X2Xnαn+1
where XiBi or Xi=ε for all combinations.
Eliminate empty productions A→ε .

Because empty production in cow can actually be reduction symbols that cannot be discarded, this is adapted to first replace Xi with BE instead of ε immediately. This will expose all empty production semantics as productions with only all semantic and BE symbols. Then all BE symbols everywhere are iteratively replaced with their semantics. This way all reductions to empty productions are preserved, but the empty productions themselves are gone.

The grammars are always augmented with a start production and an end symbol that does not occur anywhere else: all grammars are nonempty. If the unaugmented grammar is empty, the augmented grammar still has one terminal symbol, the end symbol.


foreach {v P1} [array get P] {
	set P($v) {}
	set NU($v) {}
	sym each a $P1 {
		set d "#NS"
		sym enum s $a {
			if {[sym contains [${grammar}::nonterminals] $s] && [$grammar nullable: $s]} {
				$grammar weight: $s%E := 0
				$grammar nullable: $s%E := 1
				if {[$grammar empty: $s]} {
					set s $s%E
				} else {
					set s [sym union $s $s%E]
				}
			}
			set d [sym concat $d $s]
		}
		sym each s $d {
			if {[$grammar empty: $s]} {
				set NU($v) [sym union $NU($v) $s]
			} else {
				set P($v) [sym union $P($v) $s]
			}
		}
	}
	if {[sym null NU($v)]} {set NU($v) "#NS"}
}
foreach v [array names P] {
	sym each a $P($v) {
		set task [sym label [sym label $a string from $v] string original $v]
		vprioq enqueue work $task
	}
	set P($v) {}
}
while {[vprioq dequeue work a est; set est]} {
	set v [sym label $a string from]
	set ov [sym label $a string original]
	set i [lsearch [sym decode string $a] *%E]
	if {$i<0} {
		set P($v) [sym union $P($v) $a]
		$grammar ep: $v +:= $ov
	} else {
		set s [sym index $a $i]; set nu [string range [sym unlabel $s] 0 end-2]
		set ov [sym union $ov $nu]
		set eused [sym label $a $i eused]
		if {[sym contains $eused $s]} {
			$grammar compiled: := 0
			report E ambiguous use of null production: $nu
		} else {
			set eused [sym union $eused $s]
			sym each nua $NU($nu) {
				for {set j 0} {$j<[sym length $nua]} {incr j} {set nua [sym label $nua $j eused $eused]}
				sym each expanded [sym concat [sym range $a 0 [expr {$i-1}]] $nua [sym range $a [expr {$i+1}] end]] {
					set expanded [sym label $expanded string from $v]
					set expanded [sym label $expanded string original $ov]
					vprioq enqueue work $expanded
				}
			}
		}
	}
}
foreach v [array names P] {
	if {[sym null $P($v)]} {
		set P($v) "#NS"
	}
}
unset -nocomplain NU

		
 
section    top

170. Iterate to remove all left recursion :: The first part is to iterate until all alternatives of this nonterminal begin with a terminal or a nonterminal that will be subsequently dealt with. This means any nonterminals dealt with are already left recursion free, and so their alternatives begin with a terminal or subsequent nonterminal; this means substitutions produce initial terminals or strictly monotonic initial nonterminals.

The second part partitions the production and produces the left recursion free form.

For mutually recursive productions such as

A1A2a|b
A2A1c|d
A1 is apparently left recursion free and unalterred. For A2 , first the lesser productions are substituted
A2A2ac|bc|d
which makes the left recursion explicit. It is then partitioned
A2A2α112 where α1ac , β1bc , and β2d
and rewritten
A2bc|d|bcA2R|dA2R
with the new production
A2Rac|acA2R

If AAβ∧β*ε then A+A , the production is cyclic (and ambiguous) and rejected.

Later on the first production will be transformed to

A1bca|da|bcA2Ra|dA2Ra|b


foreach {scc N} $order {
	sym each v $scc {
		if {$N} {
			set P1 $P($v)
			set P($v) {}
			while {![sym null $P1]} {
				set P2 {}
				sym each alt $P1 {
					<Split the alternative into initial semantics, first symbol, and the rest>
					if {$LR([sym unlabel $w])==0 || $LR([sym unlabel $w])>=$LR($v) || ![sym contains $scc $w]} {
						set P($v) [sym union $P($v) $alt]
					} else {
						<Expand the initial production>
					}
				}
				set P1 $P2
			}
			set alpha {}; set beta {}
			unset -nocomplain minW
			sym each alt $P($v) {
				<Split the alternative into initial semantics, first symbol, and the rest>
				if {$LR([sym unlabel $w])==$LR($v)} {
					set W [$grammar weight: $a]
					if {$W==0} {
						$grammar compiled: := 0
						report E cyclic production: $v
					} else {
						set alpha [sym union $alpha [sym concat $sems $a]]
					}
					if {$W && (![info exists minW] || $W<$minW)} {set minW $W}
				} else {
					set beta [sym union $beta $alt]
				}
			}
			if {[sym null $alpha]} {
				set P($v) $beta
			} else {
				set leftRecursionRemoved 1
				set P($v) [sym union $beta [sym concat $beta $v%R]]
				set P($v%R) [sym union $alpha [sym concat $alpha $v%R]]
				set LR($v%R) -1
				$grammar weight: $v%R := $minW
				$grammar nullable: $v%R := 0
				$grammar ep: $v%R := [$grammar ep: $v]
			}
		}
	}
}

		
 
section    top

set sems "#NS"
set w "#NS"
set a $alt
while {![sym empty $a]} {
	set w [sym first $a]; set a [sym rest $a]
	if {![sym contains [${grammar}::nonterminals] $w] && [$grammar empty: $w]} {
		set sems [sym concat $sems $w]
		set w "#NS"
	} else {
		break
	}
}

		
 
section    top

$grammar ep: $v +:= [$grammar ep: [sym unlabel $w]]
sym each wa $P([sym unlabel $w]) {
	set P2 [sym union $P2 [sym concat $sems $wa $a]]
}

		
 
section    top

foreach {v prod} [array get P] {
	set P1 $P($v)
	set P($v) {}
	while {![sym null $P1]} {
		set P2 {}
		sym each alt $P1 {
			<Split the alternative into initial semantics, first symbol, and the rest>
			if {$LR([sym unlabel $w])==0} {
				set P($v) [sym union $P($v) $alt]
			} else {
				<Expand the initial production>
			}
		}
		set P1 $P2
	}
}

		
 
section    top

foreach {v prod} [array get P] {
	$grammar $v := {}
	sym each alt $prod {$grammar $v +:= [sym unlabel $alt]}
	unset P($v)
}

		
 
section    top

175. Convert a right recursive grammar to a right linear grammar :: (At the moment, a right linear production is allowed any number of terminals before the nonterminal.) A right recursive production has the forms

Aw which right linear.
or AwB which right linear.
or AwBα , α≠ε
With the last let B→β12|⋯|βn . Create the composite nonterminal symbol [Bα] and change the production to
Aw [Bα]
and add the production
[Bα]→β1α|β2α|⋯|βnα
if it is not already defined. If B is directly recursive, each recursive production will be
B→βiα=γiBα=γi [Bα]
And similarily for indirect recursion, and eventually new no productions will be added.

What is happenning here, for something like AaBA , where B is acting like a subroutine within A , but DFA does not have a stack for this subroutine call/exit. So in a sense, the subroutine stack is turned into a stack of production names inside the composite symbol; that is the composite symbol name encodes the stack that would otherwise be needed.

It is at this point that shift semantic symbols are added. By making shifts explicit, it is possible to distinguish recognising input symbols and shifting (or pushing) them onto the parse stack. The terminal symbols that pile up in front of shifts can be considerred as lookahead. In principle, the lookahead size is unbounded. With shift explicit, initial semantic symbols can now migrate right of the first symbol as long as they remain right of the first shift. During NFA to DFA conversion, reduction and shift symbols may end up migrating further right past more terminals.


$grammar weight: [sym nest accept,0] := 0
$grammar nullable: [sym nest accept,0] := 1
$grammar %final := {}
$grammar weight: %final := 0
$grammar nullable: %final := 1
sym each v [${grammar}::nonterminals] {
	set A {}
	sym each alt [$grammar $v] {
		set a "#NS"
		sym enum sym $alt {
			set a [sym concat $a $sym]
			if {![sym contains [${grammar}::nonterminals] $sym] && ![$grammar empty: $sym]} {
				if {[${grammar}::if action: $sym %shift]} {
					set action [$grammar action: $sym %shift]
					if {$anypush} {
						set action [sym nest [sym concat series $action [sym nest partial_shift]]]
					}
					set a [sym concat $a [sym nest shift,$action]]
					$grammar weight: [sym nest shift,$action] := 0
					$grammar nullable: [sym nest shift,$action] := 1
				} elseif {$anypush} {
					set action [sym nest partial_shift]
					set a [sym concat $a [sym nest shift,$action]]
					$grammar weight: [sym nest shift,$action] := 0
					$grammar nullable: [sym nest shift,$action] := 1
				}
			}
		}
		set A [sym union $A $a]
	}
	$grammar $v := $A
}
set rightLinearChanges 0
set rlN {}
vassoc new rl splay
vprioq new work
sym each alt [$grammar %start] {
	set task [list %start $alt]
	vprioq enqueue work -id $task 0 $task
}
while {[vprioq dequeue work task est; set est]} {
	setp {pn alt} $task
	set terminalPrefix #NS
	set a $alt
	while {![sym empty $a] && ![sym contains [${grammar}::nonterminals] [sym first $a]]} {
		set terminalPrefix [sym concat $terminalPrefix [sym first $a]]
		set a [sym rest $a]
	}
	if {[sym length $a]>1} {
		set rightLinearChanges 1
		set compositeSymbol rl%[sym nest $a]
		if {[${grammar}::if complex: $compositeSymbol]} {
			set compositeSymbol [$grammar complex: $compositeSymbol]
		} else {
			$grammar complex: $compositeSymbol := rl%$sn
			set compositeSymbol rl%$sn; incr sn
			$grammar ep: $compositeSymbol := [$grammar ep: $pn]
		}
		set firstNonterminalOfComposite [sym first $a]
		set prodOfFirstOfComposite [$grammar $firstNonterminalOfComposite]
		$grammar ep: $compositeSymbol +:= [$grammar ep: $firstNonterminalOfComposite] 
		sym each altOfFirstOfComposite [sym concat $prodOfFirstOfComposite [sym rest $a]] {
			set task [list $compositeSymbol $altOfFirstOfComposite]
			vprioq enqueue work -id $task 0 $task
		}
		set alt [sym concat $terminalPrefix $compositeSymbol]
	}
	set rlN [sym union $rlN $pn]
	vassoc update rl $pn alts {set alts [sym union $alts $alt]}
}
foreach {pn alts} $rl {
	$grammar $pn := $alts
}
if {$rightLinearChanges} {
	report I "Right linear grammar"
	sym each pn [${grammar}::nonterminals] {
		report I "  [string map {#2c , #2e . #7c |} $pn]"
		sym each alt [$grammar $pn] {
			report I "    [string map {#2c , #2e . #7c |} $alt]"
		}
	}
}
unset rlN
unset rl

		
 
section    top

176. Right factor the grammar :: Convert each production to the forms

Aaσ
or AaσB
So they are truly right linear. The implicit lookahead in a string of terminals is now made explicit by wrapping terminals in an <is,terminal,lookahead-count>.

Productions with common tails are somewhat factorred into the same new production. This factorring is not necessary for a correct construction of the DFA, but it is useful for an efficient construction. The grammar transformations will often make many productions such that can be described as a concatenation of various symbols with the same semantics with various right ends that do the choice between right recursion and continuation. This section will discover all those right ends that can be easily collected.

Differring semantic strings between the left terminal and right production will inhibit this factorring. These are nondeterministic choices, which will dealt with in the NFA to DFA conversion.


vprioq new work -id %start 0 [list %start 0]
$automata maxla: := 1
while {[vprioq dequeue work task est; set est]} {
	setp {v la} $task
	if {$la+1>[$automata maxla:]} {$automata maxla: := [expr {$la+1}]}
	unset -nocomplain RF
	set RP {}
	sym each alt [$grammar $v] {
		set rf {}
		set sems {}
		set i 0
		set la1 $la
		if {![sym contains [${grammar}::nonterminals] [sym last $alt]]} {
			set alt [sym concat $alt [sym nest accept,0] %final]
		}
		sym enum sym $alt {
			if {![sym contains [${grammar}::nonterminals] $sym] && ![$grammar empty: $sym]} {
				if {[llength $rf]} {
					set tail [sym range $alt $i end]
					set rp rp%[sym nest $tail]
					if {[${grammar}::if complex: $rp]} {
						set rp [$grammar complex: $rp]
					} else {
						report I "rp%$sn = [string map {#2c , #2e . #7c |} $rp]"
						$grammar complex: $rp := rp%$sn
						set rp rp%$sn; incr sn
						$grammar ep: $rp := [$grammar ep: $v]
						$grammar $rp := $tail
						$grammar weight: $rp := 1
						$grammar nullable: $rp := 0
						vprioq enqueue work -id $rp 0 [list $rp $la]
					}
					report I "$v: ...$rp"
					lappend rf $rp
					break
				} else {
					set rf [concat [sym nest is,$sym,$la] $sems]
					incr la 1
				}
			} else {
				if {[sym contains [${grammar}::nonterminals] $sym]} {
					vprioq enqueue work -id $sym 0 [list $sym 0]
				}
				if {[string match <shift#*> $sym]} {incr la -1}
				if {[llength $rf]} {
					lappend rf $sym
				} else {
					lappend sems $sym
				}
			}
			incr i
		}
		lappend RF([concat $la [lrange $rf 0 end-1]]) [lindex $rf end]
	}
	set RP {}
	foreach {lp rps} [array get RF] {
		set lp [setp la $lp]
		if {[llength $rps]==2} {
			set rp [lindex $rps 0]
		} else {
			set rp rf%[sym nest [sym encode $rps]]
			if {[${grammar}::if complex: $rp]} {
				set rp [$grammar complex: $rp]
			} else {
				report I "rf%$sn = [string map {#2c , #2e . #7c |} $rp]"
				$grammar complex: $rp := rf%$sn
				set rp rf%$sn; incr sn
				$grammar ep: $rp := [$grammar ep: $v]
				set rpa {}
				foreach w $rps {
					catch {
						set rpa [sym union $rpa [$grammar $w]]
					}
				}
				$grammar $rp := $rpa
				$grammar weight: $rp := 1
				$grammar nullable: $rp := 0
				vprioq enqueue work -id $rp 0 [list $rp $la]
			}
			report I "$v: ...$rp"
		}
		lappend lp $rp
		lappend RP $lp
	}
	$grammar $v := [sym encode $RP]
}
report I "Right factorred grammar"
sym each pn [${grammar}::nonterminals] {
	report I "  [string map {#2c , #2e . #7c |} $pn]"
	sym each alt [$grammar $pn] {
		report I "    [string map {#2c , #2e . #7c |} $alt]"
	}
}

		
 
section    top

$automata %final := {}
$automata maxla: := 0
vprioq new work -id %start 0 [list 0 %start]
$automata ep: %start +:= (scanner)
$automata ep: %final +:= (scanner)
while {[vprioq dequeue work task est; set est]} {
	setp {la0 prod} $task
	$automata ep: $prod := [$grammar ep: $prod]
	sym each alt [$grammar $prod] {
		set transition #NS
		set next 0
		set la $la0
		while {![sym empty $alt]} {
			set member [sym first $alt]
			if {[sym contains [${grammar}::nonterminals] $member]} {
				vprioq enqueue work -id $member 0 [list $la $member]
				set next 1
			} elseif {![string match <*> $member]} {
				set member [sym nest is,$member,$la]
				if {![sym empty $transition]} {
					set rp [sym nest $alt]
					if {[${grammar}::if complex: rp%$rp]} {
						set member [$grammar complex: rp%$rp]
					} else {
						$grammar complex: rp%$rp := rp%$sn
						set member rp%$sn
						incr sn
						$grammar $member := $alt
						$grammar ep: $member := [$grammar ep: $prod]
					}
					vprioq enqueue work -id $member 0 [list $la $member]
					set alt #NS
					set next 1
				}
				incr la
				if {[${automata}::if maxla: := 0]<$la} {
					$automata maxla: := $la
				}
			} elseif {[string match <shift#*> $member]} {
				incr la -1
			}
			set transition [sym concat $transition $member]
			set alt [sym rest $alt]
		}
		if {!$next} {set transition [sym concat $transition %final]}
		$automata $prod +:= $transition
	}
}
report I "NFA"
foreach state [lsort -dictionary [sym decode [${automata}::nonterminals]]] {
	report I "  [string map {#2c , #2e . #7c |} $state]"
	sym each transition [$automata $state] {
		report I "    [string map {#2c , #2e . #7c |} $transition]"
	}
}

		
 
section    top

178. Make the nondeterministic FA deterministic :: This is taken from Hopcroft and Ullman Introduction to Automata Theory, Languages, and Computation section 2.2. Given a transition to a nonsingleton set of states δ (s, a)={t1, t2, ⋯, tn} create a new state and single transition to that state δ (s, a)=[t1, t2, ⋯, tn] . The successors of the new state are all of the sucessors of the states t1 , t2, , etc. with the same symb edge labels. The common prefix of all the sem edge labels into t1 , t2, , etc. are retained on the edge into [t1, t2, ⋯, tn] , but distinct suffixes are prepended to edges out of the new state. The new state is then checked if it is deterministic.

This algorithm deals with a bare NFA and not a Mealy machine with semantics on the edges. When two transitions on the same symbol leave a state in the NFA with different semantics, the semantics must be deferred until which transition was supposed to be taken can be determined. If this deferment is too complicated, the machinery to handle it can be as bad as just running the NFA instead. In such cases the grammar is rejected and must be amended. With this in mind, the above algorithm is expanded:


set anydet 0
vprioq new work 0 [list %start [$automata %start] [$automata ep: %start]]
while {[vprioq dequeue work task est; set est]} {
	setp {s trs ep} $task
	if {[info exists dfa($s)]} {
		set dfa($s) [sym union $dfa($s) [sym nest $trs]]
	} else {
		$automata ep: $s +:= $ep
		set dfa($s) [sym nest $trs]
		set X {}
		sym each tr $trs {
			vassoc update X [sym first $tr] t {
				set t [sym union $t [sym rest $tr]]
			}
		}
		$automata $s := {}
		foreach {a trs} $X {
			if {[sym card $trs]==1} {
				$automata $s +:= [sym concat $a $trs]
				set t [sym last $trs]
				vprioq enqueue work [list $t [$automata $t] [$automata ep: $t]]
			} else {
				set anydet 1
				set commonsems [sym prefix $trs]
				set nl [sym length $commonsems]; set ns -1
				set tails {}
				set compound {}
				set es {}
				sym each tr $trs {
					set t [sym last $tr]
					set tr [sym range $tr $nl end-1]
					set ns1 0
					sym enum member $tr {
						if {[string match <shift#*> $member]} {
							incr ns1 1
						} elseif {![string match <*> $member] && ![sym contains [${automata}::nonterminals] $member]} {
							incr ns1 -1
						}
					}
					if {$ns<0} {
						set ns $ns1
						for {} {$ns1>0} {incr ns1 -1} {
							set commonsems [sym concat $commonsems <buffer>]
						}
					} elseif {$ns!=$ns1} {
						report E nondeterministic grammar: conflicting number of shifts: [$automata ep: $s]
						$grammar compiled: := 0
					}
					sym each othertr [$automata $t] {
						set othertr [sym concat [sym first $othertr] $tr [sym rest $othertr]]
						set tails [sym union $tails $othertr]
					}
					set compound [sym union $compound $t]
					set es [sym union $es [$automata ep: $t]]
				}
				set compound df%[sym nest $compound]
				$automata $s +:= [sym concat $a $commonsems $compound]
				vprioq enqueue work [list $compound $tails $es]
			}
		}
	}
}
sym each s [sym diff [${automata}::nonterminals] [sym encode [array names dfa]]] {
	${automata}::delete $s
}
set register 0
foreach {s trsset} [array get dfa] {
	if {[sym card $trsset]<=1} continue
	set trsset [sym unnest trsset]
	set trs [sym choose $trsset]
	set A {}
	set nondeterministic 0
	sym each tr $trs {set A [sym union $A [sym first $tr]]}
	if {[sym card $trsset]==2} {
		sym each a $A {set occursCount($a) 0; set follower($a) {}}
		sym each tr $trs {
			set a [sym first $tr]
			incr occursCount($a)
			set follower($a) [sym union $follower($a) [sym last $tr]]
			set join($a) {}
		}
		set looping {}
		sym each a $A {
			if {$occursCount($a)>1} {set looping [sym union $looping $a]}
		}
		if {[sym card $looping]==0} {
			error {nonconvergent but deterministic NFA loop}
		} elseif {[sym card $looping]>1} {
			set nondeterministic 1
		}
		sym each trs $trsset while {!$nondeterministic} {
			sym each a $A {set occursCount1($a) 0; set follower1($a) {}}
			sym each tr $trs {
				set a [sym first $tr]
				incr occursCount1($a)
				set follower1($a) [sym union $follower1($a) [sym last $tr]]
				set join($a) [sym union $join($a) [sym range $tr 1 end-1]
			}
			sym each a $A {
				if {$occursCount($a)!=$occursCount1($a)} {
					set nondeterministic 1
				}
				if {![sym eq $follower($a) $follower1($a)]} {
					set nondeterministic 1
				}
			}
			unset -nocomplain occursCount1
			unset -nocomplain follower1
		}
		unset -nocomplain occursCount
	} else {
		set nondeterministic 1
	}
	if {$nondeterministic} {
		set AE {}
		sym each a $A {set AE [sym union $AE [sym index [sym unnest $a] 1]]}
		report E nondeterministic grammar: shifting $AE: [$automata ep: $s]
		$grammar compiled: := 0
		continue
	}
	$automata $s := {}
	$automata incr%$s := {}
	$automata ep: incr%$s := [$automata ep: $s]
	sym each a $A {
		set suffix [sym suffix $join($a)]
		set join1 {}
		sym each sem $join($a) {set join1 [sym union $join1 [sym range $sem 0 end-[sym length $suffix]]]}
		set prefix [sym prefix $join1]
		set middle #NS
		sym each sem $join($a) while {[sym empty $middle]} {
			set middle [sym range $a [sym length $prefix] end-[sym length $suffix]]
		}
		if {[sym eq $a $looping]} {
			$automata $s +:= [sym concat $a [sym nest init,$register,0] incr%$s]
			$automata incr%$s +:= [sym concat $a [sym nest incr,$register,0] incr%$s]
		} else {
			set decr decr%[sym nest $s,$a]
			$automata $s +:= [sym concat $a $prefix $suffix $follower($a)]
			$automata incr%$s +:= [sym concat $a [sym nest incr,$register,0] $prefix $decr]
			$automata $decr := [sym union [
				sym concat [sym nest po,$register] [sym nest decr,$register] $middle $decr
			] [
				sym concat [sym nest zn,$register] $suffix $follower($a)
			]]
			$automata ep: $decr := [$automata ep: $s]
		}
	}
	incr register
	unset -nocomplain join
	unset -nocomplain follower
}
if {$anydet} {
	report I "DFA"
	foreach state [lsort -dictionary [sym decode [${automata}::nonterminals]]] {
		report I "  [string map {#2c , #2e . #7c |} $state]"
		sym each transition [$automata $state] {
			report I "    [string map {#2c , #2e . #7c |} $transition]"
		}
	}
}
unset -nocomplain dfa

		
 
section    top

if {!$anypush} {
	set maxps 0
	set ps(%start) 0
	vprioq new work -id  %start 0 %start
	$automata maxps: := 0
	while {$maxps>=0 && [vprioq dequeue work s est; set est]} {
		sym each alt [$automata $s] while {$maxps>=0} {
			set t [sym last $alt]
			set ps1 $ps($s)
			sym enum sem [sym front $alt] {
				setp {action x n} [sym decode string [sym unnest $sem]]
				switch $action {
					shift {
						incr ps1
						if {$ps1>$maxps} {set maxps $ps1}
					}
					reduce {
						incr ps1 -$n
						incr ps1 1
						if {$ps1>$maxps} {set maxps $ps1}
					}
					accept {
						if {$ps1!=1} {
							report I accept PS size not one: $s
							set maxps -1
						}
					}
				}
				if {[info exists ps($t)]} {
					if {$ps1!=$ps($t)} {
						report I differring PS sizes: $s
						set maxps -1
					}
				} else {
					set ps($t) $ps1
				}
				vprioq enqueue work -id $t 0 $t
			}
		}
	}
	if {$maxps>=0} {$automata maxps: := $maxps}
}

		
   
   

LR(k) Parser Generation

   
top

<Normal items>
<Normal items closure>
<Normal state goto>
<Transition from normal to lookahead state>
<Lookahead items>
<Lookahead items closure>
<Lookahead state goto>
	
proc createLR(k) {grammar automata} {
	log Begin createLR(k).
	<LR(k) properties>
	set k [$grammar k:]
	set maxk 0
	$automata ep: 0 := %start|[sym first [$grammar %start]]
	<Number the productions>
	log Derive LR(k) state machine.
	<Derive the LR(k) state machine>
	$grammar k: := $k
	log Report the LR(k) state machine.
	<Report the LR(k) state machine>
	if {$islrk} {
		log Make a graph of LR(k) state machine.
		<Make a graph of the LR(k) state machine>
		array set R {}
		<Debugging: trace changes to the L and R arrays>
		log Identify cycle entry points.
		<Identify cycle entry points>
		log Split edges into CEs.
		<Split edges into CEs>
		log Find reduction back tracks.
		<Find reduction back tracks>
		log Link reductions to next state.
		<Link reductions to next state>
		log Clean up reduction matching.
		<Clean up reduction matching>
		log Create the EFA from the state graph.
		<Create the EFA from the state graph>
		log Collapse e-moves in EFA.
		<Collapse e-moves in EFA>
		log Compile the LR(k) parser.
		$grammar compiled: := 1
		$automata name: := [$grammar name:]
		$automata start: := 0
		$automata domains: := [sym union 0 $CE ]
		$automata maxla: := [expr {$maxk+1}]
		$automata symbols: := [${grammar}::terminals]
		compileAutomata $grammar $automata
	} else {
		$grammar compiled: := 0
	}
	log End createLR(k).
}

		
 
section    top

set pn 1
sym each p [${grammar}::nonterminals] {
	sym each alt [$grammar $p] {
		if {[string equal $p %start]} {
			set pn1 0
		} else {
			set pn1 $pn; incr pn
		}
		set production($pn1.sym) $p
		set production($pn1.len) [sym length $alt]
		set production($pn1.alt) $alt
		lappend production($p) $pn1
	}
}

		
 
section    top

proc encodeNormalItem {pn beta gamma} {
	upvar 1 production production
	sym concat $pn [expr {$production($pn.len)-[sym length $beta]}] $gamma
}

proc decodeNormalItems {items} {
	upvar 1 production production
	set S {}
	sym each item $items {
		set pn [sym index $item 0]
		set nalpha [sym index $item 1]
		set alpha [sym range $production($pn.alt) 0 [expr {$nalpha-1}]]
		set beta [sym range $production($pn.alt) $nalpha end]
		set gamma [sym range $item 2 end]
		lappend S [list $pn $alpha $beta $gamma]
	}
	return $S
}

proc formatNormal {pn alpha beta gamma} {
	upvar 1 production production
	string map {"#NS" {}} "\[$pn $production($pn.sym): $alpha . $beta; $gamma\]"
}

		
 
section    top

proc NCLOSURE {sn state} {
	upvar 1 N N
	upvar 1 production production
	upvar 1 grammar grammar
	set reduced -1
	set adequate 1
	set shifts 0
	log "  NCLOSURE $sn"
	vprioq new work
	foreach item [decodeNormalItems $state] {
		vprioq enqueue work -id $item 0 $item
		setp {pn alpha beta gamma} $item
		log "    [formatNormal $pn $alpha $beta $gamma]"
	}
	set closure {}
	while {[vprioq dequeue work item est; set est]} {
		setp {pn alpha beta gamma} $item
		if {[sym length $beta]==0} {
			if {$reduced<0 && $shifts==0} {
				set reduced $pn
			} elseif {$reduced!=$pn} {
				set adequate 0
			} else {
				set reduced $pn
			}
		} else {
			if {$reduced>=0} {set adequate 0}
			incr shifts
		}
		set closure [sym union $closure [encodeNormalItem $pn $beta $gamma]]
		if {![sym empty $beta] && [sym contains [${grammar}::nonterminals] [sym first $beta]]} {
			set delta [$grammar DUP: [sym rest $beta] $gamma]
			foreach pn $production([sym first $beta]) {
				sym each d $delta {
					set item [list $pn "#NS" $production($pn.alt) $d]
					vprioq enqueue work -id $item 0 $item
				}
			}
		}
	}
	log "    adequate=$adequate"
	foreach item [decodeNormalItems $closure] {
		setp {pn alpha beta gamma} $item
		log "      [formatNormal $pn $alpha $beta $gamma]"
	}
	list $adequate $closure
}

		
 
section    top

proc NGOTO {sn state} {
	upvar 1 N N
	upvar 1 production production
	upvar 1 automata automata
	log "  NGOTO $sn"
	foreach item [decodeNormalItems $state] {
		setp {pn alpha beta gamma} $item
		if {![sym empty $alpha]} {
			$automata ep: $sn +:= $production($pn.sym)
		}
		if {[sym length $beta]} {
			if {![info exists S([sym first $beta])]} {set S([sym first $beta]) {}}
			set S([sym first $beta]) [sym union $S([sym first $beta]) [encodeNormalItem $pn [sym rest $beta] $gamma]]
		} else {
			set S(<reduce>) $pn
		}
	}
	foreach first [lsort [array names S]] {
		if {[string equal $first <reduce>]} {
			log "    reduce $production($S($first).sym): $production($S($first).alt)"
		} else {
			log "    goto $first"
			foreach item [decodeNormalItems $S($first)] {
				setp {pn alpha beta gamma} $item
				log "      [formatNormal $pn $alpha $beta $gamma]"
			}
		}
	}
	array get S
}

		
 
section    top

proc TRANSITION {sn state} {
	upvar 1 N N
	upvar 1 production production
	upvar 1 grammar grammar
	set S {}
	log "  TRANSITION $sn"
	foreach item [decodeNormalItems $state] {
		setp {pn alpha beta delta} $item
		log "    [formatNormal $pn $alpha $beta $delta]"
		if {[sym empty $beta]} {
			set act $pn
			set xi $delta
		} else {
			set act shift
			set xi [$grammar DUPEF: $beta $delta]
		}
		sym each x $xi {
			set S [sym union $S [encodeLookaheadItem $pn $alpha "#NS" $x $act]]
		}
	}
	foreach item [decodeLookaheadItems $S] {
		setp {pn alpha u gamma act} $item
		log "      [formatLookahead $pn $alpha $u $gamma $act]"
	}
	LCLOSURE $sn $S
}

		
 
section    top

proc encodeLookaheadItem {pn alpha u delta act} {
	sym concat $act $pn [sym length $alpha] [sym length $u] $u $delta
}

proc decodeLookaheadItems {items} {
	upvar 1 production production
	set S {}
	sym each item $items {
		set act [sym index $item 0]
		set pn [sym index $item 1]
		set nalpha [sym index $item 2]
		set nu [sym index $item 3]
		set alpha [sym range $production($pn.alt) 0 [expr {$nalpha-1}]]
		set u [sym range $item 4 [expr {$nu+3}]]
		set delta [sym range $item [expr {$nu+4}] end]
		lappend S [list $pn $alpha $u $delta $act]
	}
	return $S
}

proc formatLookahead {pn alpha u delta act} {
	upvar 1 production production
	set result "$pn $production($pn.sym): $alpha . $u . $delta; "
	if {[string equal $act shift]} {
		append result "shift"
	} else {
		append result "reduce $pn $production($pn.sym): $production($pn.alt)"
	}
	string map {"#NS" {}} \[$result\]
}

		
 
section    top

conflicts-property
	{(conflicts),<propertyrule>}
	{(conflictspropertyfirst),<notion>}
conflicts-property
	{((proto|meta|hyper)),((?~N=.*)),<conflictspropertyfirst>:}
	{($N),($N),(conflictsproperty),<notion>}
conflicts-set
	{((?~S=.*)),((proto|meta|hyper)),((?~N=.*)),<conflictsproperty>}
	{([sym union $S $N]),<conflictspropertynext>}
conflicts-property-next
	{<conflictspropertynext>[,;|]}
	{(conflictsproperty),<notion>}
finished-conflicts
	{((?~A=.*)),((?~N=.*)),<startpropertynext>}
	{(conflicts property),<[
		set C [hr conflicts:]
		lappend C $A $N
		hr conflicts: := $C
		concat endrule
	]>}
conflicts-error
	{<conflictspropertyfirst>}
	{(conflicts property),<[
		error E \[$line\] missing conflict resolver
		concat endrule
	]>}

		
 
section    top

proc LCLOSURE {sn state} {
	upvar 1 production production
	upvar 1 grammar grammar
	vprioq new work
	set adequate 1
	set complete 0
	set k [$grammar k:]
	log "  LCLOSURE $sn"
	foreach item [decodeLookaheadItems $state] {
		vprioq enqueue work -id $item 0 $item
		setp {pn alpha u gamma act} $item
		log "    [formatLookahead $pn $alpha $u $gamma $act]"
	}
	set closure {}
	while {[vprioq dequeue work item est; set est]} {
		setp {pn alpha u gamma act} $item
		if {![info exists seen($u)]} {
			set seen($u) $act
		} elseif {![string equal $seen($u) $act]} {
			set adequate 0
		}
		set closure [sym union $closure [encodeLookaheadItem $pn $alpha $u $gamma $act]]
		set i [sym length $u]
		if {$i<$k} {
			if {![sym empty $gamma] && [sym contains [${grammar}::nonterminals] [sym first $gamma]]} {
				foreach B $production([sym first $gamma]) {
					if {$production($B.len)==0} continue
					sym each beta [$grammar DUP: $production($B.alt) [sym rest $gamma] [expr {$k-$i}]] {
						if {[$grammar empty: $beta]} continue
						set item [list $pn $alpha $u $beta $act]
						vprioq enqueue work -id $item 0 $item
					}
				}
			}
		} else {
			set complete 1
		}
	}
	if {$complete && !$adequate} {
		set conflicts {}
		foreach item [decodeLookaheadItems $closure] {
			setp {pn alpha u delta act} $item
			if {[string equal $act shift]} {
				set conflicts [sym union $conflicts [sym first $u]]
			} else {
				set conflicts [sym union $conflicts $production($pn.sym)]
			}
		}
		foreach {action syms} [hr conflicts:] {
			if {[sym contains $syms $conflicts]} {
				set closure1 {}
				foreach item [decodeLookaheadItems $closure] {
					setp {pn alpha u gamma act} $item
					if {[string match *symbol $action] && [string equal $act shift]} {
						set closure1 [sym union $closure1 [encodeLookaheadItem $pn $alpha $u $gamma $act]]
					} elseif {[string equal $production($pn.sym) $action]} {
						set closure1 [encodeLookaheadItem $pn $alpha $u $gamma $act]
						break
					}
				}
				set closure $closure1
				set adequate 1
				break
			}
		}
	}
	log "    adequate=$adequate complete=$complete"
	foreach item [decodeLookaheadItems $closure] {
		setp {pn alpha u gamma act} $item
		log "      [formatLookahead $pn $alpha $u $gamma $act]"
	}
	list $adequate $complete $closure
}

		
 
section    top

proc LGOTO {sn state} {
	upvar 1 grammar grammar
	upvar 1 production production
	upvar 1 automata automata
	log "  LGOTO $sn"
	foreach item [decodeLookaheadItems $state] {
		setp {pn alpha u delta act} $item
		$automata ep: $sn +:= $production($pn.sym)
		if {[sym empty $delta]} {
			set a %end
		} else {
			set a [sym first $delta]
		}
		if {![sym contains [${grammar}::nonterminals] $a]} {
			if {![info exists S($a)]} {set S($a) {}}
			set S($a) [sym union $S($a) [encodeLookaheadItem $pn $alpha [sym concat $u $a] [sym rest $delta] $act]]
		}
	}
	foreach first [lsort [array names S]] {
		log "    goto $first"
		foreach item [decodeLookaheadItems $S($first)] {
			setp {pn alpha u gamma act} $item
			log "      [formatLookahead $pn $alpha $u $gamma $act]"
		}
	}
	array get S
}

		
 
section    top

set sn 0
set islrk 1
setp {adequate startstate} [
	NCLOSURE $sn [
		encodeNormalItem 0 $production(0.alt) "#NS"
	]
]
vprioq new states -id [list $startstate -] 0 [list N$adequate $startstate]
while {[vprioq dequeue states task est; set est]} {
	setp {gender state origin} $task
	set STATE($state) $sn
	$grammar complex: $sn := $task
	$automata ep: $sn := {}
	switch $gender {
		N1 {
			foreach {X next} [NGOTO $sn $state] {
				if {[string equal $X <reduce>]} {
					set c($sn) $next
				} else {
					setp {adequate next} [NCLOSURE $sn $next]
					vprioq enqueue states -id [list $next -] 0 [list N$adequate $next]
					set c($sn) shift
					set eg($sn.$X) $next
				}
			}
		}
		N0 {
			setp {adequate complete lookahead} [TRANSITION $sn $state]
			set c($sn) look,0
			foreach {X next} [NGOTO $sn $state] {
				if {![string equal $X <reduce>] && [sym contains [${grammar}::nonterminals] $X]} {
					setp {adequate next} [NCLOSURE $sn $next]
					vprioq enqueue states -id [list $next -] 0 [list N$adequate $next]
					set eg($sn.$X) $next
				}
			}
			foreach {a next} [LGOTO $sn $lookahead] {
				setp {adequate complete next} [LCLOSURE $sn $next]
				vprioq enqueue states -id [list $next $origin] 0 [list L$adequate$complete $next $state]
				set eg($sn.$a) $next
			}
		}
		L00 {
			setp {pn alpha u delta act} [lindex [decodeLookaheadItems $state] 0]
			set off [sym length $u]
			if {$off>$maxk} {set maxk $off}
			incr off -1
			set c($sn) look,$off
			foreach {a next} [LGOTO $sn $state] {
				setp {adequate complete next} [LCLOSURE $sn $next]
				vprioq enqueue states -id [list $next $origin] 0 [list L$adequate$complete $next $origin]
				set eg($sn.$a) $next
			}
		}

		L01 {
			set c($sn) error
			set islrk 0
			set em "not LR($k): State $sn"
			foreach item [decodeLookaheadItems $state] {
				setp {pn alpha u delta act} $item
				append em "; $production($pn.sym): $alpha . [
						sym range $production($pn.sym) [sym length $alpha] end
				]"
			}
			report E [string map {#NS {}} $em]
		}
		L10 - L11 {
			setp {pn alpha u delta act} [lindex  [decodeLookaheadItems $state] 0]
			set c($sn) $act
			if {[string equal $act shift]} {
				foreach {a next} [NGOTO $sn $origin] {
					if {![string equal $a <reduce>] && ![sym contains [${grammar}::nonterminals] $a]} {
						setp {adequate next} [NCLOSURE $sn $next]
						vprioq enqueue states -id [list $next -] 0 [list N$adequate $next]
						set eg($sn.$a) $next
					}
				}
			}
		}
	}
	incr sn
}
foreach {x y} [array get eg] {set eg($x) $STATE($y)}
foreach {x y} [${grammar}::filter complex: *] {
	set x [lindex $x 1]
	setp {gender state origin} $y
	set gen($x) [string index $gender 0]
	if {[info exists STATE($origin)]} {
		$grammar complex: $x := [list $gender $state $STATE($origin)]
	}
}
unset -nocomplain STATE

		
 
section    top

report I "LR($k) State Machine"
for {set s 0} {$s<$sn} {incr s} {
	setp {gender state origin} [$grammar complex: $s]
	report I "  $gen($s)$s"
	if {[info exists production($c($s).sym)]} {
		report I "    reduce $production($c($s).sym): $production($c($s).alt)"
	} else {
		report I "    $c($s)"
	}
	switch $gen($s) {
		N {
			foreach item [decodeNormalItems $state] {
				setp {pn alpha beta gamma} $item
				report I "      [formatNormal $pn $alpha $beta $gamma]"
			}
		}
		L {
			foreach item [decodeLookaheadItems $state] {
				setp {pn alpha u delta act} $item
				report I "      [formatLookahead $pn $alpha $u $delta $act]"
			}
		}
	}
	foreach x [lsort [array names eg $s.*]] {
		report I "    [lindex [split $x .] 1]: $gen($eg($x))$eg($x)"
	}
}

		
 
section    top

192. LR(k) properties :: This is based on DUP function of the text, but coded to improve performance. Because the actual uses of DUP are with formula FW(DUP(x)y), the concatenation of the suffix y and the truncation to the maximum weight are handled within this code, rather than as a separate (and more expensive) sym concat and FW call.

For each string in the original set string, traverse it, and expand into a set of strings s as nullable symbols are reached; forking with the nullable symbol included and excluded. The current weight of each constructed string is kept in w; this stops appending to the string once it reaches the maximum size. A is the list of active strings.

Once all the new strings are discoverred, the suffix is added to anything not yet at the maximum weight.

FW is collects symbols of each string in a set up to the maximum weight.


${grammar}::property DUP: {grammar set suffix {k {}}} {
	if {![string length $k]} {set k [$grammar k:]}
	set a 0
	sym each string $set {
		set s($a) "#NS"
		set w($a) 0
		set A $a
		incr a
		sym enum sym $string while {[llength $A]} {
			set wsym [$grammar weight: [sym first $sym]]
			if {$wsym==0} continue
			set A1 {}
			if {[$grammar nullable: [sym first $sym]]} {
				foreach i $A {
					set s($a) $s($i)
					set w($a) $w($i)
					lappend A1 $a
					incr a
				}
			}
			foreach i $A {
				set s($i) [sym concat $s($i) $sym]
				incr w($i) $wsym
				if {$w($i)<$k} {lappend A1 $i}
			}
			set A $A1
		}
	}
	set S {}
	foreach {i syms} [array get s] {
		sym enum sym $suffix {
			if {$w($i)>=$k} break
			set syms [sym concat $syms $sym]
			incr w($i) [$grammar weight: [sym first $sym]]
		}
		set S [sym union $S $syms]
	}
	return $S
}

		
 
section    top

193. LR(k) properties :: This is based on DUPEF function of the text. To keep the interface similar to DUP, it also accepts a suffix and truncates the returned set, but in a nonoptimised fashion.

The implementation is taken mostly from the text, but with includes the code to actually get the first terminal from a nonterminal. This extraction tends to be glossed over as magic in explanations of LR(k). What happens here is that it iterates over a set of strings, their lengths bounded with FW, until no new elements are added: if a string beginning with a nonterminal is in the set, strings with that replaced with each alternative truncated by FW are added to the set. If the production is left recursive acyclic AAα, ⊤α|&gt;0 , that would add strings FW (Aα) , FW (Aαα) , FW (Aααα) , ..., FW (Aαm) . Since alpha is nonempty, this will reach the weight limitation until FW (Aβ)=FW (Aβαn) .


${grammar}::property FW: {grammar set {k {}}} {
	set S {}
	if {![string length $k]} {set k [$grammar k:]}
	sym each alt $set {
		set w 0; set s "#NS"
		sym enum sym $alt {
			incr w [$grammar weight: $sym]
			set s [sym concat $s $sym]
			if {$w>=$k} break
		}
		set S [sym union $S $s]
	}
	return $S
}

${grammar}::property DUPEF: {grammar set suffix {k {}}} {
	if {![string length $k]} {set k [$grammar k:]}
	vprioq new V
	sym each string $set {
		set string [$grammar FW: $string $k]
		vprioq enqueue V -id $string 0 $string
	}
	set W {}
	while {[vprioq dequeue V string est; set est]} {
		if {[sym contains [${grammar}::nonterminals] [sym first $string]]} {
			sym each alt [$grammar [sym first $string]] {
				if {![$grammar empty: $alt]} {
					sym each eff [
							$grammar FW: [
								sym concat [
									sym first $alt
								] [
									$grammar DUP: [sym rest $alt] [sym rest $string] $k
								]
							] $k
					] {
						vprioq enqueue V -id $eff 0 $eff
					}
				}
			}
		} else {
			set W [sym union $W $string]
		}
	}
	set S [$grammar FW: [list [sym concat $W $suffix]] $k]
	return $S
}

		
 
section    top

194. Make a graph of the LR(k) state machine :: The idea of compiling LR(k) tables into procedures is originally from Pennello Very Fast LR Parsing Sigplan Notices July 1986, followed by Roberts Recursive Ascent: An LR Analog to Recursive Descent Sigplan Notices August 1988. This code goes beyond these in attempting to avoid procedure calls if possible, and use in-procedure gotos and loops as far as possible.

Each constructed unique NCLOSURE of normal items and each unique LCLOSURE of lookahead items constructed from LGOTO (but not the LCLOSURE of the TRANSITION) is an LR(k) state. The sets of items are mapped into nonnegative integers; the initial state is always numberred 0. Once the states are all identified and the parsing action c (s) and extended goto eg (s, X) functions are constructed, the LR(k) tables are written as a graph G=(V, E, L) with the sym graph subcommands. Because all edges into one specific state are gotos on the same symbol, the graph does not need multiple edges from two states with different symbols. The vertices V is the set of states together with a distinct final state . The edges E={(s, t) : (∃X) (eg (s, X)=t)} together with one edge from the accepting state to . The edges and nodes will be labelled with strings L (v) or L (u, v) . Labels are indicated as v=[L (v)] and (u, v)=[L (u, v)] .

The symbols used in labels are grammar terminal and nonterminals, and a variety of semantic symbols. The purpose of most of the semantic symbols will be explained later. The first two data structures used in the compiled parser are the input queue IQ and parse stack PS. IQ holds all the input symbols from the input; PS is a stack of symbols and parse trees consisting of children which are symbols or parse trees labelled with a production number. At most the first k+1 symbols of IQ are examined before the first symbol is shifted. The variable q is the index of which IQ element is examined; it is initialised to 0.

Semantic symbols.
Terminal a .
This edge is followed if IQ[q]=a.
Nonterminal A .
LR(k) shift of a nonterminal after a reduction to A.
Shift σa or <shift,nested-action>.
Shift the first symbol of IQ and push it on PS. Set q to zero.
Look ahead λ or <look,i>.
Increment q by one.
Accept ωv or <accept,v>.
Accept the input.
Reduction ρA or <reduce,nested-action,n>.
The production number identifies AX1X2Xn which in turn includes the action action: A X1X2Xn . This production has been identified. Abstractly the top n elements of PS are popped into a new tree labelled with production-name and pushed back on PS. The parse will back up n states and then go to the state with the edge labelled A. However the reduction action will implement this actual manipulations. Set q to zero.
Identify reduction ψAn or <reduction,symbol,n<.
Identifies the production being reduced to as the first n symbols of A.
Initialise ιsv or <init,s,v>.
Initialise the counter s to v.
Increment s or <incr,s>.
Increment the counter s.
Decrement s or <decr,s>.
Decrement the counter s.
Nonnegative jump κs+ or <po,s>.
This edge is followed if counter s is positive.
Negative jump κs or <zn,s>.
This edge is followed if counter s is nonpositive.
Counter switch κsv or <eq,s,v>.
This edge is followed if counter s=v.
Call pise or <call,s,e>.
Call state e, setting EN to s.
Entry switch tau^e or <en,e>.
True if EN equals e.
Match reduction μAn or μv or <as,symbol,n> (<as,v>).
The edge is followed if the production being reduced is the first n symbols of A.

The graph is constructed

If c (s)= reduce AX1X2Xn unless this is production 0, add the node s=[υAψAn]
Otherwise add the node s=[ε] .
If c (s)= reduce 0, add the edge (s, ⊥)=[υSω0]

For each t=eg (s, X) add the edge

(s, t)=[Xσ] if X is a terminal and c (s)= shift
(s, t)=[Xλ] if X is a terminal and c (s)= look,i
otherwise (s, t)=[X] if X is a nonterminal.


vprioq new backtracking
set G (-1)
for {set s 0} {$s<$sn} {incr s} {
	set G [sym union $G $s]
	if {[info exists production($c($s).sym)]} {
		set pn $c($s)
		set A $production($pn.sym)
		set n $production($pn.len)
		set action [${grammar}::if action: $A $production($pn.alt) := "#NS"]
		if {$pn==0} {
			set G [sym union $G $s,-1]
			if {[sym empty $action]} {
				set L($s,-1) [sym nest accept,0]
			} else {
				set L($s,-1) [list [sym nest reduce,$action,$n] [sym nest accept,0]]
			}
		} else {
			if {[sym empty $action]} {
				set L($s) [sym nest reduction,$A,$n]
			} else {
				set L($s) [list [sym nest reduce,$action,$n] [sym nest reduction,$A,$n]]
			}
			vprioq enqueue backtracking [list $A $n 0 -1 $s]
		}
	} else {
		set L($s) {}
	}
	foreach {X t} [array get eg $s.*] {
		set X [lindex [split $X .] 1]
		set G [sym union $G $s,$t]
		if {[sym contains [${grammar}::nonterminals] $X]} {
			set L($s,$t) $X
		} elseif {[string match look* $c($s)]} {
			set L($s,$t) [list $X [sym nest $c($s)]]
		} elseif  {[${grammar}::if action: $X %shift]} {
			set L($s,$t) [list $X [sym nest shift,[$grammar action: $X %shift]]]
		} else {
			set L($s,$t) $X
		}
	}
}

		
 
section    top

195 :: In the following note that any grammar with empty productions and indirect recursion can be converted into a grammar for the same language without these. It simplifies the discussion to ignore this without significantly restricting the languages considerred.

(1) Lookahead invariance in a loop.
If a state with an item [A→α .β, δ] is in a cycle, this state has a successor [A→α .β, ηδ] , [A→α .β, ηηδ] , ..., [A→α .β, ηmδ] . For it to be a cycle, the states and items must be identical, hence FW (ηmδ)=FW (ηm)=FW (ηmζ) , for sufficiently large m .
(2) Production recursion and state cycles.
(2A) Cycles in the state graph occur for nonleft recursion. For left recursion, the first item for AAβ is [A→.Aβ, δ] , which shifts to [AA .β, δ] where β*ηAζ so this cannot lead to another state with the item [A→.Aβ, δ] and there is no cycle. A nonrecursive production does not occur in even the first item, and so there is nothing to cycle back to.
(2B) For nonleft recursion A→αAβ, ⊤α|&gt;0 , the first state with the production will have [A→.αAβ, δ] and it will take ⊤α| shifts into a different state with the item [A→α .Aβ, δ] . Its closure will include [A→.αAβ, FW (βδ)]
+[A→α .Aβ, FW (β2δ)]
*[A→α .Aβ, FW (βmδ)]
=[A→α .Aβ, FW (βm+nδ)]
=[A→α .Aβ, η] for sufficiently large m
At this point the state is its own successor and cycles.
(3) A production does not span multiple cycles.
If a state has [A→.χ, δ] , no path to a state with [A→χ ., η] will enter two distinct cyclic SCCs.
Cycles only occur from nonleft recursion (2B), so such a production would have to be A→αAβAγ . The first item would be [A→.αAβAγ, δ] . The first cycle would then include the state with
[A→α .AβAγ, η]
[A→.αAβAγ, FW (βη)]
with successor state includes
[A→.αAβAγ, FW (βmη)]
so that η=FW (βmη)=FW (βm)
The second cycle would then include the state with
[A→αAβ .Aγ, θ]
[A→.αAβAγ, FW (γθ)]
thence
[A→α .AβAγ, FW (γθ)]
[A→.αAβAγ, FW (βγθ)]
thence
[A→.αAβAγ, FW (βmγθ)]
so that θ=FW (βmγθ)=FW (βm)=η . This is a state in the first cycle, so the two cycles are in the same SCC.
(4) A production does not enter different cycles.
If a state has [A→.χ, δ] , if any path to [A→χ ., η] enters a cyclic SCC, all other paths that enter a cyclic SCC enter the same SCC.
The production would require two different productions to create two different cycles A→αAβ and A→γAδ . However the closure of [A→α .Aβ, η] in one cycle would introduce the other production [A→.γAδ, βη] , and the same state would end up in both cycles, so the cycles will coincide and are in the same SCC.
 
section    top
protocol
G—Parser graph.
L(v)—Vertex label.
L(x,y)—Edge label.
CE—All complex entry points.
SE(t)—s is a simple entry and SE(s)=u is its cycle name.
C(u)—All nodes within the simple cycle u.

array set SE {}
set CE {}
array set C {}
set H $G
while 1 {
	set nscc [sym scc $H]
	set cut {}
	for {set scc 1} {$scc<=$nscc} {incr scc} {
		if {[sym card [sym memb $H $scc]] > 1} {
			set simple 1
			sym each s [sym memb $H $scc] while {$simple} {
				set outcount 0
				sym each t [sym from $H $s] while {$outcount<=1} {
					if {[sym scc $H $t]==$scc} {
						incr outcount
					}
				}
				if {$outcount>1} {set simple 0}
			}
			if {$simple} {
				set u [sym choose [sym memb $H $scc]]
				set C($u) [sym memb $H $scc]
				sym each s [sym memb $H $scc] {
					set ep 0
					sym each t [sym to $H $s] while {!$ep} {
						if {[sym scc $H $t]!=$scc} {
							set ep 1
						}
					}
					if {$ep} {
						set SE($s) $u
					}
				}
			} else {
				sym each s [sym memb $H $scc] {
					set ep 0
					sym each t [sym to $H $s] while {!$ep} {
						if {[sym scc $H $t]!=$scc} {
							set ep 1
						}
					}
					if {$ep} {
						set CE [sym union $CE $s]
						sym each t [sym to $H $s] {
							if {[sym scc $H $t]==$scc} {
								set cut [sym union $cut $t,$s]
							}
						}
					}
				}
			}
		} else {
			set s [sym choose [sym memb $H $scc]]
			if {[sym card [sym edge $H $s,$s]] >= 1} {
				set C($s) $s
				set SE($s) $s
			}
		}
	}
	if {[sym null $cut]} break
	set H [sym diff $H $cut]
}
unset H
log CE = $CE
report I "CE = $CE"
foreach {t u} [array get SE] {
	log SE $t ($u) = $C($u)
	report I "SE $t ($u) = $C($u)"
}


		
 
section    top
protocol
G—Parser graph.
L(v)—Vertex label.
L(x,y)—Edge label.
CE—All complex entry points.
SE(t)—s is a simple entry and SE(s)=u is its cycle name.
C(u)—All nodes within the simple cycle u.
tail(x,y)—All reduction paths that end in y,x. These interfere with paths containing x,y.
R(path)—All reduction paths. This will be set to the interference order.
seStart(x)—When x is a reduction state that reduces over an SE, this is the added node to actually reduce from.
seTarget(r)—All SEs that can reduce to r.
seExitPaths—Paths that exit an SE.
callConversion(s,t)—The edge s,t has been replaced with a call to t on an edge to this state.

set C(-1) {}
set seExitPaths {}
array set seStart {}
array set tail {}
while {[vprioq dequeue backtracking task est; set est]} {
	set path [setp {A n k u} $task]
	set t [lindex $path end]
	set continuations {}
	<Backtrack one edge until ended>
	<Continue backtracking if not ended>
	if {![llength $continuations]} {
		<Move forward from end of backtrack path to the actual reduced-to state>
		<Catalog completed reduction backtrack with special cases for SEs>
	}
}
<Decide when to hybridise simple cycles if exitting to different CE domains>
set G [sym diff $G [join [array names callConversion] |]]
foreach x [array names callConversion] {unset L($x)}
log add SE incr/init.
<Add SE counter arithmetic>
log Order reductions against interferences.
<Order reductions against interferences>

		
 
section    top
protocol
input t—Current end of the backtrack.
input k—Number of shift edges backtracked over.
output continuations—Nodes that are backtracked into and the new shift count.

sym each s [sym to $G $t] {
	if {[lsearch $L($s,$t) <look#*> ]>=0} {
		lappend continuations $s $k
	} elseif {[lsearch $L($s,$t) {[_A-Z0-9a-z()%]*}]>=0 && $k<$n} {
		lappend continuations $s [expr {$k+1}]
	}
}

		
 
section    top

foreach {s k} $continuations {
	if {[sym contains $CE $t]} {
		set R([concat $path -1]) 0
		<Split reductions over complex entries>
		vprioq enqueue backtracking [list $A [expr {$n-$k}] 0 -1 $z $s]
	} elseif {[info exists SE($t)]} {
		vprioq enqueue backtracking [concat $A $n $k $SE($t) $path $s]
	} else {
		vprioq enqueue backtracking [concat $A $n $k $u $path $s]
	}
}

		
 
section    top

vprioq new forward -id $t 0 $path
set red 0
while {!$red && ![vprioq empty forward]} {
	vprioq dequeue forward fpath
	set x [lindex $fpath end]
	sym each r [sym from $G $x] while {!$red} {
		if {[string equal [lindex $L($x,$r) 0] $A]} {
			if {[info exists callConversion($x,$r)]} {set r $callConversion($x,$r)}
			set path [concat $fpath $r]
			set red 1
		} elseif {[lsearch $L($x,$r) <look#*>]>=0} {
			sym each y [sym to $G $r] {
				if {[lsearch $L($y,$r) <look#*>]>=0} {
					vprioq enqueue forward -id $y 0 [concat $path $r $y]
				}
			}
		}
	}
}
if {!$red && $n>0} {
	error "Missing reduced-to $A state on path: $path"
}

		
 
section    top

if {[sym contains $C($u) $t]} {
	set R([concat [lindex $path 0] -1]) 0
	<Split reductions into simple cycles>
	set seStart($x1) $z
} elseif {$u>=0} {
	lappend seExitPaths $u $path
	if {![info exists seTarget($r)]} {set seTarget($r) {}}
	set seTarget($r) [sym union $seTarget($r) $u]
	log over SE: u=$u PATH=$path
} elseif {[llength $path]>1} {
	set R($path) 0
	lappend tail([lindex $path end],[lindex $path end-1]) $path
	log acyclic: PATH=$path
}

		
 
section    top

set dep {}
foreach path [array names R] {
	set patheq [join $path =]
	set dep [sym union $dep $patheq]
	foreach a [lrange $path 0 end-2] b [lrange $path 1 end-1] {
		if {[info exists tail($a,$b)]} {
			foreach dependent $tail($a,$b) {
				if {[lindex $dependent end]!=[lindex $dependent end-2] && ![string equal $dependent $path]} {
					set dep [sym union $dep $patheq,[join $dependent =]]
				}
			}
		}
	}
}
vprioq new reductions
set removal {}
for {set scc 1} {$scc<=[sym scc $dep]} {incr scc} {
	if {[sym card [sym memb $dep $scc]] > 1} {
		unset -nocomplain P		
		sym each path [sym memb $dep $scc] {
			set q [split $path =]
			if {![info exists P([lindex $q 0])]} {set P([lindex $q 0]) {}}
			set P([lindex $q 0]) [sym union $P([lindex $q 0]) [lindex $q end]]
		}
		sym each path [sym memb $dep $scc] {
			set q [split $path =]
			if {[sym card $P([lindex $q 0])]<=1} {
				sym each neighbour [sym to $dep $path] {
					if {[sym scc $path]==[sym scc $neighbour]} {
						set removal [sym union $removal $path,$neighbour]
					}
				}
			}
		}
		unset -nocomplain P
	}
}
set dep [sym diff $dep $removal]
set dep [sym tsort $dep]
for {set i 1} {![sym empty $dep]} {set dep [sym rest $dep]; incr i} {
	set path [split [sym first $dep] =]
	if {[info exists R($path)]} {set R($path) $i}
	vprioq enqueue reductions $i $path
}

		
 
section    top

proc traceChanges {array subscript op} {
	catch {
		upvar 1 $array A
		switch $op {
			write {log "   " set $array $subscript = $A($subscript)}
			unset {log "   " del $array $subscript}
		}
	}
}
foreach subscript [lsort -dictionary [array names L]] {
	traceChanges L $subscript write
}
trace add variable L {write unset} [namespace current]::traceChanges
trace add variable R {write unset} [namespace current]::traceChanges

		
 
section    top

while {[vprioq dequeue reductions path est; set est]} {
	if {![info exists R($path)] || !$R($path) || [lindex $path end]<0} continue
	set s [lindex $path 0]
	setp {y A n} [sym decode string [sym unnest [lindex  $L($s) [lsearch $L($s) <reduction#*>]]]]
	<Link acyclic reductions to their destinations directly>
	<Link acyclic reductions to their destinations with switches>
}

		
 
section    top

array set callConversion {}
sym each t $CE {
	sym each s [sym to $G $t] {
		set ep $t
		<Convert shift edge into a call edge to a return state>
		log CE=$t CALL=$s RET=$y
	}
}

		
 
section    top

206. Split reductions over complex entries :: The first case covers reductions through complex entry points. The destination of the reduction cannot be handled statickally; instead it will be handled in the call/return of recursive ascent. ψAn and μBm match the discovery of a reduction to the shift on the production in the caller. (This is just restating how the usual recursive ascent works on reductions.)

If Px1xmtsr1rpRit∈CE∧(s, t)=[δ]∈Ei and xj∉EP , the reduction is out of a complex cycle. Let x1=[α=ηψAnζ] .

The edge (s, t) will have been remapped into the edge (s, y)=callConversion (s, t) ; (s, y)=[βpiteγ]∈Ei , add the node y=[ε]∈Vi+1 and the edge (s, y)=[δpite]∈Ei+1 .

After finding or adding node y , find or add another node z=[ψAnk]∈Vi+1 with the edge (y, z)=[μAn]∈Ei+1 , where k is the number of shift and nonterminal edges in x1xjts .

Add the edge (x1, ⊥)=[αω1]∈Ei+1 if it does not yet exist.

Ri+1Ri∪{x1xjt⊥, zysr1rp}−{P} .
Gi+1=(Vi+1, Ei+1, Li+1) .
Vi+1Vi∪{y, z} .
Ei+1Ei∪{(s, y), (y, z), (x1, ⊥)} .


<Add AS switch to return state>
set x1 [lindex $path 0]
<Add accept to a state>
log over CE: TOP=[lrange $path 0 end-1] : CE=$t : (RET=$z : CALL=$y) : BOTTOM=$s

		
 
section    top

207. Split reductions into simple cycles :: The second case covers a reduction through simple entry points back into the cycle. Because there is only one path from the reduction through the entry and back into the cycle of length n , the reduction can be determined, assuming it is into the loop. The cycle counter keeps tracks of how many times the loops was enterred on shifts and then exitted on reductions. Reductions outside the loop are not coverred here: they will be through a complex entry above or asynchronous below.

If Px1x2xmts1s2sprRit∈SE∧tCusjCu , and xj∉EP , the reduction is within a simple cycle. Let x1=[α=ηψAnζ]∈Vi and (sp, r)=[Aθ]∈Ei .

(Because of (3), if t and sp are in a cycle, they are in the same cycle and so too are all the sk .)

Add the node y=[ε]∈Vi+1 and edge (x1, y)=[∇u]∈Ei+1 . Add the node z=[α]∈Vi+1 and edge (y, z)=[κu]∈Ei+1 . Add the edge (y, r)=[κu+αθ]∈Ei+1 .

Let Q={x1w1wmRi} .
Let D={zyx1v1v2vp : x1v1v2vpRi, vjt} .
Let S={zv1v2vp : Six1w1wmtv1v2vpRi, SiP} .
Ri+1RiSDQ .
Gi+1=(Vi+1, Ei+1, Li+1) .
Vi+1Vi∪{y, z} .
Ei+1Ei∪{(x1, y), (y, z), (y, r)} .

(5) The destination in the cycle is unique.
If the cycle has two distinct destinations in the simple cycle of the same production length from the same state, they would have to be the same distance from the loop entry, which means on two different paths. But to get from the loop entry to two separate paths requires a node with two outgoing edges in the same cycle which is contrary to the definition of a simple cycle.

(One possible complication is whether the reduction back tracks through both a simple cycle and complex cycle, since these edit the reduction state in different ways. But by (3) and (4) above, this cannot happen. A reduction that is both acyclic and a simple cycle is handled with sets D and S which continue around the cycle (and states will split), or over the cycle (which will add a switch).)


set decrs {}
foreach b [lrange $path 0 end-2] {
	if {[info exists SE($b)]} {
		lappend decrs [sym nest decr,$u]
	}
}
set x1 [lindex $path 0]
set sp [lindex $path end-1]
set r [lindex $path end]
set y $sn; incr sn
$automata ep: $y := [$automata ep: $x1]
set L($y) {}
set L($x1,$y) $decrs
set z $sn; incr sn
$automata ep: $y := [$automata ep: $x1]
set L($z) $L($x1)
set L($y,$z) [sym nest zn,$u]
set L($y,$r) [concat [sym nest po,$u] $L($x1) [lrange $L($sp,$r) 1 end]]
set G [sym union $G $y $x1,$y $z $y,$z $y,$r]
log into SE: (CHOOSE=$z : DECR=$y) : PATH=$path

		
 
section    top

208. Link acyclic reductions to their destinations directly :: The third case covers a reduction that does not pass through any cycles. In this case, the reduction is directly linked to the destination with no further matching or other semantics. If there are different paths to different destinations, the forward paths must be split so that there is an single unconditional path. In the path contains a simple cycle, it possible to replicate the cycle while splitting the states, but instead a switch will be added, as decribed in the next section.

Switches can also be used even if there is no cycles involved to avoid a combinatoric explosion in the number of state. The current code ignores this risk; if it becomes problematic in actual grammars, this issue will be readdressed.

If Px1x2xmrRi, r≠⊥ , and xj∉EP, j&lt;m and (x2, x1)∈Ei , the reduction is acyclic. Let x1=[α=ηψAnζ]∈Vi and (xm, r)=[Aθ]∈Ei .

(xm can be in EP because the reduction will not be through this node across the cycle entry, but just up to edge