DNA.
wyrmcow
Version.
1.0.9
Namespace.
::wyrm::cow
Command.
::wyrm::cow
Language.
tcl
Manpage.
cow (1WY)
Testbase.
Test Script
Test Report
Import.
Package.
wyrmwif
wyrm-dpda
wyrm-expat
wyrm-uri
Export.
Implementation.
wyrmcow.tcl
Package.
wyrmcow.tcl

vW2 Compiler Compiler

Sections.
Parser-generator from vW2 grammars
vW2 Grammar
Grammar Properties
Extract One Level Grammar from Two Level Grammar
Attributes
DFA Parser Generation
LR(k) Parser Generation
Parser Automata
Lexical Processor
Symbol Table
Tree Rewriting
Implementations
Test Base
Make.
Package.
rule $lib/wyrmcow.tcl $source/wyrmcow.tcl "
  rm -f $lib/wyrmcow.tcl
  sed 's/@@VER@@/[control ver]/; /^#line/d' $source/wyrmcow.tcl >$lib/wyrmcow.tcl
"
   

Parser-generator from vW2 grammars

   
top

1 :: Generate parsers from vW2 grammars.

Copyright (C) 2003 SM Ryan

Wyrmwif Tcl extensions. For non-profit uses only, provided this copyright is preserved on all copies, this work may be freely copied, modified, redistributed, compiled, and incorporated in other works. This work is distributed with no warranty of any kind; no author or distributor accepts any responsibility for the consequences of using it, or for whether it serves any particular purpose or works at all, unless he or she says so in writing.

 
section    top

2 :: Does a cow have buddha nature?

MOOOOOOOOOOOOOOO.

 
section    top
3. cow (1WY) ::

NAME

cow — Parser-generator from vW2 grammar.

synopsis

wyrm::cow grammar...

description

Generated Code Commands

<Generate compiled parser code> <Generated command: accept_character> <Generated command: accept_production> <Generated command: allocate> <Generated command: anti_get_value> <Generated command: assign_integer> <Generated command: attribute_wam> <Generated command: call> <Generated command: call_domain> <Generated command: deallocate> <Generated command: decrement_integer> <Generated command: deferred_action> <Generated command: define_glyph> <Generated command: define_integer> <Generated command: define_terminal_class> <Generated command: discard_character> <Generated command: eval_loop> <Generated command: execute> <Generated command: explicit_end> <Generated command: fail> <Generated command: foreign_immediate> <Generated command: foreign_include> <Generated command: foreign_text> <Generated command: get_constant> <Generated command: get_list> <Generated command: get_structure> <Generated command: get_value> <Generated command: get_variable> <Generated command: goto_state> <Generated command: ibbuffer> <Generated command: ibclear> <Generated command: ibdiscard> <Generated command: if_bound> <Generated command: implicit_end> <Generated command: increment_integer> <Generated command: initialise_constant> <Generated command: initialise_memory> <Generated command: initialise_register> <Generated command: input_styles> <Generated command: iqreadahead> <Generated command: iq_maximum> <Generated command: name_class> <Generated command: nop> <Generated command: on_failure> <Generated command: on_success> <Generated command: parser> <Generated command: parse_domain> <Generated command: parse_error> <Generated command: parse_start> <Generated command: parse_state> <Generated command: partial_shift> <Generated command: pass> <Generated command: perfect_hash_entry> <Generated command: perfect_hash_modulus> <Generated command: perfect_hash_multiplier> <Generated command: proceed> <Generated command: psbegin> <Generated command: psclear> <Generated command: psend> <Generated command: pspush> <Generated command: ps_maximum> <Generated command: put_constant> <Generated command: put_list> <Generated command: put_structure> <Generated command: put_unsafe_value> <Generated command: put_value> <Generated command: put_variable> <Generated command: put_void> <Generated command: query_start> <Generated command: recognise_reservable> <Generated command: recognise_symbol> <Generated command: reduce_parse> <Generated command: report> <Generated command: requires> <Generated command: reserved_word_table> <Generated command: retry> <Generated command: retry_me_else> <Generated command: section> <Generated command: set_constant> <Generated command: set_local_value> <Generated command: set_value> <Generated command: set_variable> <Generated command: set_void> <Generated command: shift_lexeme> <Generated command: state_semantics> <Generated command: subquery_start> <Generated command: switch_on_constant> <Generated command: switch_on_structure> <Generated command: switch_on_term> <Generated command: symbol_enum> <Generated command: symbol_reserved> <Generated command: symbol_table_bottom_scope> <Generated command: symbol_table_class> <Generated command: symbol_table_constant> <Generated command: symbol_table_definition> <Generated command: symbol_table_empty_defs> <Generated command: symbol_table_identify> <Generated command: symbol_table_method> <Generated command: symbol_table_interference> <Generated command: symbol_table_new_definition> <Generated command: symbol_table_new_scope> <Generated command: symbol_table_new_table> <Generated command: symbol_table_not_constant> <Generated command: symbol_table_split_defs> <Generated command: term_production> <Generated command: term_vars> <Generated command: transition_character> <Generated command: transition_integer> <Generated command: transition_integer_switch> <Generated command: transition_iqis> <Generated command: transition_jump> <Generated command: translator_prelude> <Generated command: trim_environment> <Generated command: trust> <Generated command: trust_me> <Generated command: try> <Generated command: try_me_else> <Generated command: unify_constant> <Generated command: unify_local_value> <Generated command: unify_value> <Generated command: unify_variable> <Generated command: unify_void> <Generated command: wam_initialisation>
			
#	wyrm-cow.dna - Copyright (C) 2003 SM Ryan.  All rights reserved.

package provide wyrmcow @@VER@@

package require wyrmwif
package require wyrmassoc
package require wyrmexpat
package require wyrmsym
package require wyrmdpda
package require wyrmuri

namespace eval ::wyrm {
	namespace export cow

	proc cow args {
		if {[catch {cow::driver $args} rs]} {
			puts $rs\n\n$::errorInfo
			return "report C [list $rs]"
		} else {
			return $rs
		}
	}
	
	namespace eval cow {
		namespace import ::wyrm::*
		
		proc driver argv {
			variable gen
			parser $argv
			sym grammar [namespace current]::cfparser
			cfparser machine: := cf
			vW1
			attributeAnalysis hr
			<Symbol table class definitions>
			switch [hr type:] {
				2 {createLR(k) hr cfparser}
				3 {createDFA hr cfparser}
			}
			if {[hr compiled:]} {
				<Lexical processors>
			}
			<Report the vW1 grammar>
			if {[hr compiled:]} {
				implementParser cfparser
			}
			[hr implementor:]::implementparser $gen
		}
		<Generate compiled parser code>
		<vW2 grammar parser>
		<Grammar properties>
		<DFA parser generation>
		<LR(k) parser generation>
		<Parser automata>
		<Extract vw1 grammar from vw2>
		<Derive the attribute evaluation machine>
	}
	<cow implementations>
}

		
 
section    top
 
section    top
proc log—Log parser generator activities.
input args—Log message.
io namespace oldclicks—How long since the last message.
io namespace gen—Generated parser code.
			
variable oldclicks [clock clicks -milliseconds]
proc log {args} {
	variable gen
	variable oldclicks
	set newclicks [clock clicks -milliseconds]
	set wallclock [clock format [clock seconds] -format %H:%M:%S]
	set clicks [expr {$newclicks-$oldclicks}]
	if {$clicks<0} {set clicks [expr {~$clicks}]}
	set elapsed [format %d:%02d:%02d.%03d \
			[expr {$clicks/3600000}] \
			[expr {($clicks/60000)%60}] \
			[expr {($clicks/1000)%60}] \
			[expr {($clicks)%1000}] \
	]
	set message [join $args { }]
	set message $wallclock.$elapsed.$message
	set oldclicks $newclicks
puts $message
	append gen "#.$message\n"
}

		
 
section    top
proc report—Error and information message reporter.
input severity—C (critical), E (error), or I (information).
input args—Error message.
io namespace gen—Generated parser code.
			
proc report {severity args} {
	variable gen
	variable depth
	set message [join $args { }]
if {[string equal $severity E]} {
	puts "ERROR $message"
} else {
	puts $message
}
	<Generated command: report>
}

		
 
section    top
			
gen report $severity $message

		
 
section    top
proc gen—Generated code.
input args—Command name and arguments.
io namespace gen—Generated parser code.
io namespace depth—Generated parser code depth.
			
proc gen {args} {
	variable gen
	variable depth
	if {[llength $args]==1 && [llength [lindex $args 0]] > 1} {set args [lindex $args 0]}
	if {[string equal [lindex $args 0] END]} {
		incr depth -2
		append gen "[string repeat { } $depth]\}\n"
	} else {
		append gen [string repeat { } $depth]
		set begin [string equal [lindex $args 0] BEGIN]
		if {$begin} {set args [lrange $args 1 end]}
		set args [setp op $args]
		append gen $op
		foreach arg $args {append gen " [list $arg]"}
		if {$begin} {append gen " \{"; incr depth 2}
		append gen \n
	}
}

		
 
section    top

report I "vW1 Attribute Grammar"
sym each mn [mr::nonterminals] {
	report I "${mn}:: [string map {, {}} [mr $mn]]"
	catch {report I "  [mr properties: $mn]"}
}
sym each pn [hr::nonterminals] {
	report I "$pn:"
	catch {report I "  [hr properties: $pn]"}
	catch {report I "  shift: [hr action: $pn %shift]"} rs
	sym each alt [hr::if $pn := {}] {
		report I "    $alt"
		catch {
			foreach cs [hr cs: $pn $alt] {
				report I "      [displayAttribute $cs]"
			}
		} rs
		catch {
			report I "      reduce: [hr action: $pn $alt]"
		} rs
	}
	catch {report I "  ::= [hr foreign: $pn]"}
}
sym each sym [hr::terminals] {
	if {![string match *symbol $sym]} continue
	if {[catch {report I "$sym = [hr properties: $sym]"}]} {
		report I "$sym = terminal"
	}
	catch {report I "  shift: [hr action: $sym %shift]"} rs
	catch {report I "  characters: [hr character: $sym]"} rs
	catch {
		foreach cs [hr cs: $sym %shift] {
			report I "  [displayAttribute $cs]"
		}
	} rs
	catch {report I "  ::= [hr foreign: $sym]"}
}

		
 
section    top
			
report I "Lexical Attribute Grammar [symr::if name: := scanner] for [symr::if clients: := unknown]"
sym each pn [symr::nonterminals] {
	report I "$pn:"
	catch {report I "  [symr properties: $pn]"}
	catch {report I "  shift: [hr action: $pn %shift]"} rs
	sym each alt [symr::if $pn := {}] {
		report I "    $alt"
		catch {
			foreach cs [symr cs: $pn $alt] {
				report I "      [displayAttribute $cs]"
			}
		} rs
		catch {
			report I "      reduce: [hr action: $pn $alt]"
		} rs
	}
	catch {report I "  ::= [symr foreign: $pn]"}
}
sym each sym [symr::terminals] {
	if {![string match *symbol $sym]} continue
	if {[catch {report I "$sym = [symr properties: $sym]"}]} {
		report I "$sym = terminal"
	}
	catch {report I "  shift: [hr action: $sym %shift]"} rs
	catch {
		foreach cs [symr cs: $sym %shift] {
			report I "  [displayAttribute $cs]"
		}
	} rs
	catch {report I "  ::= [symr foreign: $sym]"}
}

		
   
   

vW2 Grammar

   
top
 
section    top

12 :: Grammar properties are

foreign: production-name
Foreign text alternatives: a union of four element strings: output variables, input variables, language, and foreign code.
nullable: production-name
Whether the production is nullable.
weight: production-name
Production weight (minimum nonempty number of terminal that can be generated.
action: production-name [ alternative ]
Reduce action.
action: production-name [ %shift ]
Shift action.
symbol-name.character
Glyph translations.
cs: production-name alternative
Context sensitive constraints (attributes).
properties: production-name
Description of production properties.
anyright:
If any productions are right (or embedding) recursive.
attributes:
Metanotions declared to be attributes.
compiled:
If a parser was constructed.
everywhere: symbol-name
Everywhere reserved symbols.
finite:
Set of productions with only a finite set of generated terminal.
glyph: glyph-name
Character with this name.
k:
Maximum lookahead.
maxexcept:
Maximum string length of all except...character.
name:
Grammar name.
order:
Produced from orderring.
predicates:
Predicate productions.
reserved: symbol
Reserved symbols in the lexicon.
signatures:
Declared hypernotion signatures.
start:
Start production.
type:
Grammar type in the Chomsky hierarchy, 2 or 3.
ep: production-name
Original production names for created productions. This will be used to create automata ep: properties.
class: symbol|character symbol-set := class-numbers
List each class with its symbols, indexed by its members.
class-number: symbol|character := last-class
Last class number for symbol and character classes.
members: symbol|character class-numbers := symbol-set
List each class with its symbols, indexed by its class numbers.
reserved: reserved-table-index := {spelling reserved-symbol...}
Each unique reserved symbol list.

 
section    top

proc truncate {string} {
	while 1 {
		set i [string first \n $string]
		if {$i==0} {
			set string [string range $string 1 end]
		} else {
			break
		}
	}
	if {$i>0} {incr i -1; set string [string range $string 0 $i]...}
	if {[string length $string]>40} {
		set string [string range $string 0 39]...
	}
	return $string
}

proc parser vw2grammar {
	sym grammar [namespace current]::hr
		hr k: := 2
		hr type: := 3
		hr name: := hyperrules
		hr start: := %start
		hr conflicts: := {}
		hr foreigns: := {}
		hr symtab: := {}
		hr implementor: := ::wyrm::model
	sym grammar [namespace current]::mr
		mr k: := 2
		mr type: := 3
		mr name: := metarules
	sym grammar [namespace current]::rew
	oav bundle [namespace current]::outp
	log Begin parse input.
	<Initialise glyph names before parsing the grammar>
	set grammarhtml {}
	array set header {1 "" 2 "" 3 "" 4 "" 5 "" 6 "" n 0}
	set chunknumber 0
	set ps {}
	set state rule
	set curr %start
	set first 1
	set input ""
	foreach chunk $vw2grammar {
		incr chunknumber
		set token [wyrm::expat::create]
		wyrm::expat::resume $token [
			exec $::env(SHELL) -c {tidy --indent 0 --output-xml 1 --quiet 1; exit 0} << $chunk 2> /dev/null
		]
		set ts 0
		foreach piece [wyrm::expat::end $token] {
			set what [assoc get $piece %what]
			vassoc put piece %chunk $chunknumber
			switch $what {
				start {
					set name [assoc get $piece %name]
					assoc get piece -exact est class c
					switch -glob [lindex $ts end]/$name {
						0/body {
							lappend ts body
						}
						0/* {
							;
						}
						*/cow* {
							lappend ts $name
						}
						{*/h[1-6]} {
							set i [string index $name end]
							set header($i) ""
							set headr(n) $i
							lappend grammarhtml $piece
							lappend ts $name
						}
						default {
							lappend grammarhtml $piece
							lappend ts $name
						}
					}
				}
				end {
					switch -glob [lindex $ts end] {
						0 {
							;
						}
						cow* {
							set ts [lrange $ts 0 end-1]
						}
						{*/h[1-6]} {
							set i [string index $name end]
							set header($i) ""
							set headr(n) [expr {$i-1}]
							lappend grammarhtml $piece
							set ts [lrange $ts 0 end-1]
						}
						default {
							lappend grammarhtml $piece
							set ts [lrange $ts 0 end-1]
						}
					}
				}
				content {
					set data [assoc get piece %data]
					switch -glob [lindex $ts end] {
						0 {
							;
						}
						{*/h[1-6]} {
							set i [string index $name end]
							append header($i) $data
							lappend grammarhtml $piece
						}
						code - cowhn - cowsep {
							setp {ps state input} [dpda $ps $state $input[assoc get piece %data] {
								foreign-chunk
									{((?~code=.*)),((?~N=.*)),<foreigncode>(?~ch=[^][]+)}
									{($code$ch),($N),<foreigncode>}
								skip-newline
									{<(?~state=.*)>[\n\r]}
									{<[incr line; set state]>}
								skip-blanks
									{<(?~state=.*)>[ \t]+}
									{<$state>}
								skip-comments
									{<(?~state=.*)>{[^{}]*}}
									{<$state>}
								end-of-input
									{<rule>$}
									{<quit>}
								start-a-rule
									{<rule>}
									{(rulekind),<notion>}
								
								<Tree rewriting rules>
								<Metarules>
								<Foreign rules>
								<Hyperrules>
								<Property rules>

								unknown-text
									{(.*),(#NS),<rulekind>(?~rest=[^.]*)[.]}
									{<[report E \[$line\] could not identify the text: [truncate $rest]; concat rule]>}
								unknown-rule
									{(.*),((?~N=.*)),<rulekind>(?~rest=[^.]*)[.]}
									{<[report E \[$line\] could not identify the rule: "[truncate [concat $N ## $rest]]"; concat rule]>}
								find-full-stop-at-end-of-rule
									{(.*),<endrule>[.]}
									{<rule>}
								error-at-end-rule
									{(metarule),<endrule>(?~rest=[^.]+)[.]}
									{<[report E \[$curr\] expected '.' to end metarule: [truncate $rest]; concat rule]>}
								error-at-end-rule
									{((?~what=.*)),<endrule>(?~rest=[^.]+)[.]}
									{<[report E \[$curr\] expected '.' to end $what: [truncate $rest]; concat rule]>}
								error-at-end-rule
									{(metarule),<endrule>(?~rest=.*)}
									{<[report E \[$curr\] expected '.' to end metarule: [truncate $rest]; concat quit]>}
								error-at-end-rule
									{((?~what=.*)),<endrule>(?~rest=.*)}
									{<[report E \[$curr\] expected '.' to end $what: [truncate $rest]; concat quit]>}
									
								<Parse the next hypernotion, metanotion, or symbol>

								skip-comments
									{<(?~state=.*)>\{[^{}]*\}}
									{<$state>}
							}]
						}
						default {
							lappend grammarhtml $piece
						}
					}
				}
			}
		}
	}
	<Ensure foreign and hyperrules are disjoint>
	<Symbol table postparse processing>
	<Make aleph definitions of undefined attributes>
	log Complete parse input.
	foreach {character name} [array get glyphName] {
		hr glyph: $name := $character
	}
}

		
 
section    top

identify-a-property-rule
	{(proto),((?~N=.*)),<rulekind>=}
	{($N),<propertyrule>}

<Attribute property rule>
<Conflicts property rule>
<Glyph property rule>
<Grammar type property rule>
<Implementor property rule>
<Include property rule>
<Lookahead property rule>
<Name property rule>
<Reserved symbol property rule>
<Signature property rule>
<Start property rule>
<Stylesheet property rule>
<Symbol table>

unidentified-property
	{((?~P=.*)),<propertyrule>[^.]*}
	{(property),<[report E \[$line\] unknown property: $P; concat endrule]>}

		
 
section    top

start-property
	{(start),<propertyrule>}
	{(startproperty),<notion>}
start-property
	{((proto|meta|hyper)),((?~N=.*)),<startproperty>}
	{(start property),<[set first 0; hr %start := $N,%end; concat endrule]>}

		
 
section    top

k-property
	{(k),<propertyrule>}
	{(),<kproperty>}
k-property
	{((?~K=.*)),<kproperty>(?~k=\d+)}
	{($K$k),<kproperty>}
k-property
	{((?~K=.*)),<kproperty>}
	{(k property),<[hr k: := $K; concat endrule]>}

		
 
section    top

name-property
	{(name),<propertyrule>}
	{(nameproperty),<notion>}
name-property
	{((proto|meta|hyper)),((?~N=.*)),<nameproperty>}
	{(name property),<[hr name: := $N; mr name: := $N_metarules; concat endrule]>}

		
 
section    top

grammar-type-property
	{(type),<propertyrule>}
	{(),<typeproperty>}
grammar-type-property
	{((?~K=.*)),<typeproperty>(?~k=\d+)}
	{($K$k),<typeproperty>}
grammar-type-property
	{((?~K=2|3)),<typeproperty>}
	{(type property),<[hr type: := $K; concat endrule]>}
grammar-type-property
	{((?~K=.*)),<typeproperty>}
	{(type property),<[report E \[$line\] grammar type not 2 or 3: $K; concat endrule]>}

		
 
section    top

include-property
	{(include),<propertyrule>'(?~uri=([^']|'')*)'}
	{($uri),(),<endincludeuri>}
end-include-property
	{((?~garbage=.*)),<endincludeuri>(?~text=[^.])}
	{($garbage$text),<endincludeuri>}	
end-include-property
	{((?~uri=.*)),((?~garbage=.*)),<endincludeuri>[.]}
	{<rule>[
		if {[string length $garbage]} {report E unexpected text after include uri: [truncate $garbage]}
		uri get [string map {'' '} $uri]
	]}

include-property
	{(include),<propertyrule>\[}
	{(%include),(),(1),<foreigncode>}
included-foreign-text
	{(%include),((?~code=.*)),(1),<foreigncode>\]}
	{(include property),<[
		gen foreign_include $code
		concat endrule
	]>}

include-property
	{(include),<propertyrule>}
	{(include property),<[
		report E \[$line\] url or foreign code expected after include
		concat endrule
	]>}

		
 
section    top
 
section    top

implementor-property
	{(implementor),<propertyrule>'(?~N=([^']|'')+)'}
	{(implementor property),<[hr implementor: := [string map {'' '} $N]; concat endrule]>}

		
 
section    top

identify-a-metarule
	{(meta),((?~N=.*)),<rulekind>:\s*:}
	{($N),<[set curr $N; concat metarule]>}
identify-a-bad-metarule
	{(hyper|proto),(#NS),<rulekind>:\s*:[^.]*}
	{(metarule),<[
		report E \[$line\] missing metanotion before '::'
		concat endrule
	]>}
metarule
	{<metarule>}
	{(),(mrmember)<notion>}
metarule-member
	{((?~rule=.*)),((proto|meta|hyper)),((?~HN=.*)),<mrmember>}
	{([sym union $rule $HN]),<mrmemberseparator>}
metarule-alternative
	{<mrmemberseparator>[;|]}
	{(mrmember),<notion>}
metarule-completed
	{((?~MN=.*)),((?~rule=.*)),<mrmemberseparator>}
	{(metarule),<[mr $MN +:= $rule; concat endrule]>}

		
 
section    top

identify-a-bad-hyperrule
	{(hyper|proto),(#NS),<rulekind>:\s*[^.]*}
	{(hyperrule),<[
		report E \[$line\] missing hypernotion before ':'
		concat endrule
	]>}
identify-a-hyperrule
	{((proto|meta|hyper)),((?~N=.*)),<rulekind>:}
	{($N),<[set curr $N; concat hyperrule]>}
hyperrule
	{<hyperrule>}
	{(),(),(hrmember)<notion>}
hyperrule-equal-member
	{((proto|meta|hyper)),((?~HN=.*)),<hrmember>=\s*=}
	{($HN%eq%),(hrposteqmember),<notion>}
hyperrule-equal-member
	{((proto|meta|hyper)),((?~HN=.*)),<hrmember>=}
	{($HN%eq%),(hrposteqmember),<notion>}
hyperrule-not-equal-member
	{((proto|meta|hyper)),((?~HN=.*)),<hrmember>/\s*=}
	{($HN%ne%),(hrposteqmember),<notion>}
hyperrule-not-equal-member
	{((proto|meta|hyper)),((?~HN=.*)),<hrmember>­}
	{($HN%ne%),(hrposteqmember),<notion>}
hyperrule-post-equal-member
	{((?~HN1=.*)),((proto|meta|hyper)),((?~HN2=.*)),<hrposteqmember>}
	{(hyper),($HN1$HN2),<hrmember>}
hyperrule-member
	{((?~alternative=.*)),((proto|meta|hyper)),((?~HN=.*)),<hrmember>}
	{([
		if {[sym null $alternative]} {set HN} else {sym concat $alternative $HN}
	]),<hrmemberseparator>}
hyperrule-member
	{<hrmemberseparator>,}
	{(hrmember),<notion>}
hyperrule-alternative
	{((?~rule=.*)),((?~alternative=.*)),<hrmemberseparator>[;|]}
	{([sym union $rule $alternative]),(),(hrmember),<notion>}
hyperrule-completed
	{((?~HN=[^%].*)),((?~rule=.*)),((?~alternative=.*)),<hrmemberseparator>}
	{([if {[string match *symbol $HN]} {concat symbol rule} else {concat hyperrule}]),<[
		hr $HN +:= [sym union $rule $alternative]
		if {$first && ![string match *symbol $HN]} {hr %start := $HN,%end; set first 0}
		concat endrule
	]>}

		
 
section    top

24 :: A notion is here used to refer to a sequence of small and/or large marks, and things that become small marks like literals. The calling sequence for this dpda state is

rule-comment
rule-match
(next-state),<notion>
...
rule-comment
(proto|meta|hyper),(large-and-small-marks)<next-state>
rule-substitution

'proto' indicates only small marks were parsed (or no marks were parsed), 'meta' only large marks, 'meta' only large marks, and 'hyper' for a mixture. The symbol is the concatenation of the marks, or #NS if no marks.

 
section    top

start-a-protonotion
	{<notion>(?~N=[a-z()]+)}
	{(proto),($N),<notioni>}
start-a-protonotion
	{<notion>(?~N=[#'])}
	{(proto),(),<notioni>$N}
start-a-hypernotion
	{<notion>(?~N=\[)}
	{(hyper),(),<notioni>$N}
start-a-metanotion
	{<notion>(?~N=[A-Z0-9]+)}
	{(meta),($N),<notioni>}
empty-notion
	{((?~state=.*)),<notion>}
	{(proto),(#NS),<$state>}
<Literal glyphs in a hypernotion>
<Decimal numbers in a hypernotion>
continue-a-protonotion
	{(proto),((?~PN=.*)),<notioni>(?~N=[a-z()]+)}
	{(proto),($PN$N),<notioni>}
continue-a-metanotion
	{(meta),((?~MN=.*)),<notioni>(?~N=[A-Z0-9]+)}
	{(meta),($MN$N),<notioni>}
continue-a-hypernotion
	{(.*),((?~HN=.*)),<notioni>(?~N=[a-z()A-Z0-9]+)}
	{(hyper),($HN$N),<notioni>}
end-a-notion
	{((?~state=.*)),((?~kind=.*)),((?~N=.*)),<notioni>}
	{($kind),([sym encode string $N]),<$state>}

		
 
section    top

variable defaultGlyphName
array set defaultGlyphName {
	<Default glyph translations>
}

		
 
section    top

\n newline		\r return		\t tab
" " space		! exclaim		"\"" quote		# hash
$ dollar		% percent		& ampersand		'' apostrophe
( leftparen		) rightparen	* asterisk		+ plus
, comma			- dash			. fullstop		/ slash
0 zero			1 one			2 two			3 three
4 four			5 five			6 six			7 seven
8 eight			9 nine			: colon			; semicolon
< lessthan		= equals		> greaterthan	? query
@ at			A largea		B largeb		C largec
D larged		E largee		F largef		G largeg
H largeh		I largei		J largej		K largek
L largel		M largem		N largen		O largeo
P largep		Q largeq		R larger		S larges
T larget		U largeu		V largev		W largew
X largex		Y largey		Z largez		[ leftbracket
\\ backslash	] rightbracket	^ circumflex	_ underline
@ at			a lettera		b letterb		c letterc
d letterd		e lettere		f letterf		g letterg
h letterh		i letteri		j letterj		k letterk
l letterl		m letterm		n lettern		o lettero
p letterp		q letterq		r letterr		s letters
t lettert		u letteru		v letterv		w letterw
x letterx		y lettery		z letterz		\{ leftbrace
| verticalbar	\} rightbrace	~ tilde

		
 
section    top

variable defaultGlyphName
array set glyphName [array get defaultGlyphName]

		
 
section    top

glyph-property
	{(glyph),<propertyrule>'(?~char=''|[^'])'}
	{($char),(glyphproperty),<notion>}
glyph-property
	{(glyph),<propertyrule>}
	{<[report E \[$line\] missing literal after 'glyph=' property; concat quit]>}
glyph-property
	{(.*),(proto),(#NS),<glyphproperty>}
	{(glyph property),<[report E \[$line\] missing glyphname; concat endrule]>}
glyph-property
	{((?~char=.*)),(proto),((?~N=.*)),<glyphproperty>}
	{(glyph property),<[set glyphName($char) $N; concat endrule]>}

		
 
section    top

literal-glyphs
	{<notioni>'(?~string=([^']|'')*)'}
	{<notioni>[string map [array get glyphName] $string]}
literal-glyphs-error
	{<notioni>'}
	{<[report E \[$curr\] missing close quote ('); concat notioni]>}

		
 
section    top

NUMBER
	{<notioni>#(?~num=[0-9 \t\n]+)}
	{<notioni>number([string map {
		0 zero 1 one 2 two 3 three 4 four
		5 five 6 six 7 seven 8 eight 9 nine
		" " "" "\t" "" "\n" ""
	} $num])}
missing-NUMBER
	{<notioni>#}
	{<[report E \[$curr\] missing digits after "'#'"; concat notioni]>}

		
 
section    top

32. Foreign rules :: Foreign text is a way to enter programming code written in another language into the text of a grammar. Attributes can be made available for input and output to the text. As far as the context free grammar is concerned, each foreign rule is interpretted as an empty production; the foreign text is then evaluated during a reduction.

The foreign text must contain a balanced number of brackets; no escapes are available, nor are any kinds of string or comments interpretted to hide unbalanced brackets. If the foreign code must used unbalanced brackets, it must do so outside the text of the grammar.

foreign-rule ::= hypernotion : foreign-texts.

foreign-texts ::= foreign-text | foreign-texts; foreign-text

foreign-text ::= foreign-output foreign-input language [foreign-code] | language immediate [foreign-code]

foreign-output ::= [ (metanotion...) ]

foreign-input ::= [ metanotion... ]

language ::= [ protonotion ]

The language cannot end in 'immediate'

foreign-code ::= [ foreign-chunk... ]

foreign-chunk ::= any-characters-except-[-and-] | [foreign-code]

Normally in the cf parser, foreign texts are evaluated with the attributes after parsing. If the small marks 'immediate' are included after the language, means the foreign text is evaluated immediately on reduction within the parser itself. In a scanner, all foreign texts are immediate, whether marked so or not. Because immediate texts are evaluated during the parse, they can alter the lexical interpretation and other aspects of parser so that it can accept languages that it could not otherwise.

For example, C has well known ambiguity with typedef names. A C grammar can use immediate foreign texts in the parser and scanners with a rudimentary symbol table to remove this ambiguity.

NEST block:
	left brace symbol, push new typedef level,
		NEST declarations into NEST1, NEST1 statements,
		right brace symbol, pop off typedef level.
NEST typedef: typedef symbol, TAG symbol, make typedef.

IDENTIFIER symbol: TAG symbol, relabel if typedef.
TAG symbol: letter, letter or digit sequence option.

include = [
	typedef struct TopLevel TopLevel;
	struct TopLevel {int depth; char *tag; TopLevel *under;};
	TopLevel *topLevel = 0; int topLevelDepth = 0;
].
push new typedef level: immediate [
	topLevelDepth++;
].
pop new typedef level: immediate [
	topLevelDepth--;
	while (topLevel && topLevel->depth>topLevelDepth) {
		TopLevel *u = topLevel->under; free(topLevel); topLevel = u;
	}
].
relabel if typedef: immediate [
	TopLevel *t; for (t=topLevel; t; t=t->under) {
		if (strcmp(t->tag,Tcl_GetString(bufferContents()))==0) {
			PS->override = true;
			PS->reserved = 0;
			PS->symbol = TYPENAMEsymbol;
			PS->nameclass = 0;
			break
		}
	}
]
make typedef: immediate [
	TopLevel *t; t = malloc(sizeof(TopLevel));
	t->depth = topLevelDepth; t->tag = bufferString(lastlexeme.name,0);
	t->under = topLevel; topLeve = t;
]

An immediate foreign text cannot have explicit output or input variables. This is because variables are propagated by the attribute evalator after the parse is completed, but immediate texts are evaluated before the parse is completed. Immediate texts which need to communicate need to establish some protocol with global variables.

(The language string cannot end in 'immediate' unless there is another 'immediate' after it. ximmediat e[...] is an immediate text in language x; ximmediat e immediate[...] is an immediate text in language ximmediate.)


identify-a-foreign-rule
	{((proto|meta|hyper)),((?~HN=.*)),<rulekind>:\s*(?~la=(\([A-Z0-9 \t\n]*\))?[A-Z0-9 \t\n]*[a-z() \t\n]*\[)}
	{($HN),(),<[set curr $N; concat foreignrule]>$la}
foreign-text-variables
	{<foreignrule>(\((?~output=([A-Z0-9 \t\n]*))\))?(?~input=[A-Z0-9 \t\n]*)(?~language=[a-z() \t\n]*)}
	{([regsub -all \\s+ $output {}]),([regsub -all \\s+ $input {}]),([regsub -all \\s+ $language {}]),(),(0),<foreigncode>}
foreign-code-begin
	{(0)<foreigncode>\[}
	{(1)<foreigncode>}
foreign-code-begin
	{((?~HN=.*)),((?~alts=.*)),((?~output=.*)),((?~input=.*)),((language=.*)),((?~code=.*)),(0),<foreigncode>}
	{(foreign rule),<[report E missing foreign-code '\[...\]': $HN; concat endrule]>}
foreign-code-end
	{((?~alts=.*)),((?~output=.*)),((?~input=.*)),((language=[^%]*)),((?~code=.*)),(1),<foreigncode>\]}
	{([
		if {![string length $output]} {set output -}
		if {![string length $input]} {set input -}
		if {![string length $language]} {set language -}
		if {[string match *immediate $language]} {
			if {![string equal $output$input --]} {
				report E \[$line\] immediate foreign text cannot have variables: ($output)$input
			}
		}
		sym union $alts [sym encode string [list $output $input $language $code]]
	]),<foreigntextor>}
foreign-code-nest
	{((?~code=.*)),((?~N=.*)),<foreigncode>\[}
	{($code\[),([expr {$N+1}]),<foreigncode>}
foreign-code-unnest
	{((?~code=.*)),((?~N=.*)),<foreigncode>\]}
	{($code\]),([expr {$N-1}]),<foreigncode>}
foreign-text-OR-or-END
	{<foreigntextor>[;|]}
	{<foreignrule>}
foreign-text-completed
	{((?~HN=.*)),((?~alts=.*)),<foreigntextor>}
	{(foreign rule),<[
		hr foreigns: +:= $HN
		hr foreign: $HN +:= $alts
		concat endrule
	]>}

		
 
section    top

sym each hn [sym intersect [hr foreigns:] [hr::nonterminals]] {
	report E production cannot be both a hyperrule and a foreign-rule: $hn
	hr::delete foreign: $hn
}
sym each hn [hr foreigns:] {hr $hn := "#NS"}

		
   
   

Grammar Properties

   
top

34 :: The notations used herein are the usual notations from formal linguistics.

A context-free grammar G=(N, Σ, P, S) where N is the set of nonterminal, Σ is the set of terminal symbols, P is the set of productions, and S is the start symbol.
The transitive-reflexive closure is X* and the transitive closure is X+ .
The vocabulary VN∪Σ
P is finite subset, PN×V* written in the form A→α where AN and α∈V* .
The derive relations is αAβ⇒αδβ where A→δ is in P .
The language is L (G)={w∈Σ* : S*w} .
The empty string is ε .
a, b, c, ⋯ denote terminals of Σ .
A, B, C, ⋯ denote nonterminals of N .
X, Y, Z denote symbols of V .
u, v, ⋯, z denote strings of Σ* .
α, β, γ, ⋯ denote strings of V* .
The set concatenation operator is S1 ||S2={w1 w2 : w1S1w2S2} .

This is at odds with the vW2 notation which uses upper case letters as metanotion, lower case strings as nonterminals or (ending in 'symbol') terminals, and concatenation with an explicit ','.

 
section    top

proc grammarProperties {grammar} {
	log Begin grammar properties.
	<Compute grammar properties>
	log Order the vocabulary from terminal to the root production.
	<Order the vocabulary from terminal to the root production>
	log Remove useless productions.
	<Remove useless productions>
	log Remove inaccessible productions.
	<Remove inaccessible productions>
	log Finiteness analysis.
	<Finiteness analysis>
	log Cycle analysis.
	<Cycle analysis>
	log Format the properties of the grammar for reporting.
	<Format the properties of the grammar for reporting>
	$grammar order: := $order
	log Completed grammar properties.
}

		
 
section    top

${grammar}::property weight: {grammar set} {
	if {[sym card $set]==1 && [sym length $set]==1} {return 0}
	sym each alt $set {
		set w 0
		sym enum sym $alt {
			incr w [$grammar weight: [sym first $sym]]
		}
		if {$w==0} {
			;
		} elseif {![info exists W]} {
			set W $w
		} elseif {$W>$w} {
			set W $w
		}
	}
	if {![info exists W]} {set W 0}
	return $W
}

${grammar}::property nullable: {grammar set} {
	if {[sym card $set]==1 && [sym length $set]==1} {error "undefined nullable: $set"}
	set nu 0
	sym each alt $set while {!$nu} {
		set nu 1
		sym enum sym $alt while {$nu} {
			set nu [$grammar nullable: [sym first $sym]]
		}
	}
	return $nu
}

${grammar}::property empty: {grammar set} {
	expr {[$grammar weight: $set]==0}
}

		
 
section    top

37. Order the vocabulary from terminal to the root production :: The nodes of the produce graph are the vocabulary symbols V and there is an edge (A, X) if A→αXβ . Strongly connected components (scc) of produce are then mutually recursive productions. This allows the grammar to be decomposed into nested subgrammars, with fewer simultaneously nonterminals to deal with at each analysis step. For example with

SAB
ABb
AC
BAa
BD
CDd
Ce
DCc
Df
First the nested grammar with N={C, D} and Σ={c, d, e, f} can be analyzed, then the grammar with N={A, B} and Σ={C, D, a, b} , and finally the grammar with N={S} and Σ={A, B} .

This code does the following steps:

(1) Creates the produce graph.
(2) Factors it by strongly connected regions; mutually recursive productions form an scc, and a single node in the factorred graph.
(3) Topological sorts the graph. (Because the order for sym tsort cannot be specified, the graph is first inverted so that tsort sorts from leaves to roots instead roots to leaves.) The sort produces the total order X1, X2, ⋯, Xn such that if Xj*Xi then ij .
(4) Each element of the factorred graph orderring replaces all the symbols in corresponding scc of the produce graph.

Whether the symbol is a terminal or nonterminals is added to the order as a convenience to later passes that need to know.


set produce {}
sym each pn [${grammar}::nonterminals] {
	sym each alt [${grammar}::if $pn := {}] {
		sym enum member $alt {
			set produce [sym union $produce $pn,$member]
		}
	}
}
set order {}
sym enum forder [sym tsort [sym factor [sym invert $produce]]] {
	set scc [sym member $produce [sym scc $produce $forder]]
	lappend order $scc [sym contains [${grammar}::nonterminals] $scc]
}

		
 
section    top

set neworder {}; $grammar useless: := {}
foreach {scc N} $order {
	sym each sym $scc {
		set useful($sym) [expr {!$N}]
	}
}
foreach {scc N} $order {
	set isUseful 0
	sym each pn $scc while {!$isUseful} {
		if {$useful($pn)} {
			set isUseful 1
		} else {
			sym each alt [$grammar $pn] while {!$isUseful} {
				set isUseful 1
				sym enum member $alt while {$isUseful} {
					set isUseful $useful($member)
				}
			}
		}
	}
	<Prune useless productions of a nonterminal>
}
set order $neworder

		
 
section    top

sym each pn $scc {
	set useful($pn) $isUseful
}
sym each pn $scc {
	if {$isUseful && $N} {
		set newprod {}
		sym each alt [$grammar $pn] {
			set keep 1
			sym enum member $alt while {$keep} {
				set keep $useful($member)
			}
			if {$keep} {set newprod [sym union $newprod $alt]}
		}
		$grammar $pn := $newprod
	} elseif {$N} {
		report I useless production: $pn
		${grammar}::delete $pn
	}
}
if {$isUseful} {
	lappend neworder $scc $N
} else {
	$grammar useless: +:=  $scc
}

		
 
section    top

set accessible {}
$grammar inaccessible: := {}
foreach {scc N} $order {
	sym each sym $scc {
		if {!$N} {
			set accessible [sym union $accessible $sym]
		}
	}
}
vprioq new work -id %start 0 %start
while {[vprioq dequeue work v est; set est]} {
	if {![sym contains [${grammar}::nonterminals] $v]} continue
	set accessible [sym union $accessible $v]
	sym each w [sym from $produce $v] {
		vprioq enqueue work -id $w 0 $w
	}
}
set neworder {}
foreach {scc N} $order {
	if {[sym contains $accessible $scc]} {
		lappend neworder $scc $N
	}
}
set order $neworder
sym each sym [sym diff [${grammar}::nonterminals] $accessible] {
	$grammar inaccessible: +:= $sym
	report I inaccessible production: $sym
}

		
 
section    top

foreach {scc N} $order {
	set sccfinite 1
	sym each v $scc while {$sccfinite} {
		sym each w [sym from $produce $v] while {$sccfinite} {
			if {[sym scc $produce $v]==[sym scc $produce $w]} {
				set sccfinite 0
			} elseif {[info exists finite($w)] && !$finite($w)} {
				set sccfinite 0
			}
		}
	}
	sym each v $scc {
		set finite($v) $sccfinite
	}
}
set U {}
foreach {v sccfinite} [array get finite] {
	if {$sccfinite} {set U [sym union $U $v]}
}
$grammar finite: := $U

		
 
section    top

42. Cycle analysis :: Lt Bradshaw: 'Take him down to hindquarters for cycle-analysis.' Young Guy, Motor Detective

This section examines derivations of productions to discover

(1) Whether the symbol is nullable, if any derivation X+ε exists.
(2) Whether the symbol is empty, if all derivations are X+ε .
(3) The WEIGHT of a symbol.
(4) Whether this is a type 2 grammar, if there is any derivation X+wXy∧⊤w|&gt;0∧⊤y|&gt;0 .

From Ancona, Dodero, Gianuzzi, and Morgavi Efficient Construction of LR(k) States and Tables TOPLAS January 1991, WEIGHT (X)=minw| where X*ww≠ε , WEIGHT (X)=0 if all possible derivations are empty.

The basic notion for each nested grammar find the minimal derivations A→αBζ→αβCδζ*η where η=(ΣA∪{A}) , and ΣA are the terminals of the nested grammar. There is a trap here for grammars such as

ABb|c
BAa|Bd|e
that it has the derivations
Ac
ABbAab
ABbeb
ABbBdbAadb
but also
ABbBdb*Bd+b
and never derive a string with just A or a terminal. (In terms of the produce graph, it is necessary to discover all possible minimal cycles, all cycles without subcycles, from and to A in the scc with A and B .)

This is achieved by annnotating each symbol in a derivation with the set of productions used to derive that symbol. If while trying to derive from a nonterminal, a production was already used to get it, using the same production again would be a subcycle.

A→(B, {ABb})b→(B, {ABb, BBd})db
so that attempting to use BBd again would be detected and is rejected. The only derivations permitted from this are
(B, {ABb, BBd})dbedb
or
(B, {ABb, BBd})db→(A, {ABb, BBd, BAa})adb

If any nonrecursive derivation is nonnull, the weight of the recursive derivations cannot be less than that. (Actually unless they are cyclic, they will have to be greater.) So recursive derivations (those who have symbols in the same produce scc), only need to be considerred if all nonrecursive derivations are empty, or it still needs to be determined if the grammar is type 2 or 3. And since the weight is the minimum length, once the grammar type is determined, heavy derivations can be abandonned.

Each derivation is an labeld string.

String label 'from': the nonterminal symbol derived from.
Symbol i label 'prods': set of productions that derived this symbol.


set type [${grammar}::if type: := 3]
if {$type>=3} {
	$grammar type: 3
} else {
	$grammar type: 2
}
$grammar anyright: := 0
foreach {scc N} $order {
	<Initialise the derivations for a nested grammar>
	while {[vprioq dequeue s derivation est; set est]} {
		&