wyrm::cow grammar...
The vW2 grammars accepted are a subset of two-level van Wijngaarden grammars, with additions for lexical analysis and parser generation controls.
grammar ::= [ rule ] ...
rule ::= property-rule | metarule | hyperrule | foreign-rule
Property rules allow characteristics of the grammar and the generated parser to be specified.
property-rule ::= start-property | lookahead-property | name-property | grammar-type-property | attribute-property | glyph-property | signature-property | enum-property | reserved-symbol
Force a metanotion to be regarded as an attribute.
attribute-property ::= attribute = metanotion [ |metanotion ] ...
The conflicts property forces the LR(k) generator to issue a shift or a reduction in particular shift/reduce and reduce/reduce conflicts. If a conflict does not have a matching conflict rule, the parser construction is aborted.
conflicts-property ::= conflicts = hypernotion : hypernotion [ | hypernotion ] ....
The glyph property allows glyphs to be renamed when literals are translated into small letters.
glyph-property ::= glyph = 'glyph' small-marks.
In the Chomsky hierarchy, a type 2 language is a context-free language, and a type 3 language is a regular language, one which has a grammar with no embedding recursive productions. The actual grammars can be type 1 (context sensitive) or type 0 (Turing machine equivalent), but the parsers created by cow are for either deterministic type 2 grammars (LR(k)) or type 3 grammars (DFA). The grammar analysis can discover whether the grammar is type 2 or 3, or the grammar can be explicitly declared type 2. If a grammar is explicitly declared type 3 but is actually type 2, this property rule is ignored.
grammar-type-property ::= type = 2. | type = 3.
Select the implementation language for the output of cow. The choices are model and wyrmwif.
implementor-property ::= implementor = model. | implementor = wyrmwif.
The include property inserts the contents of the URI after the full stop (.) of the include rule, or it adds foreign text to the generated parser.
include-property ::= include = 'uri'. | include = [foreign-code].
The lookahead property specifies the maximum lookahead used to create the LR(k) parser. The lookahead is ignored for type 3 languages.
lookahead-property ::= k = digit....
Grammars can be named; the name appears in reports and generated parsers.
name-property ::= name = protonotion.
It is convenient to be able to define symbols that look like other symbols, but with specific spellings, the reserved symbols of the language. For example, the word 'int' in C is reserved, while 'integer' is not. While it is conceivable to define reserved symbols directly in the grammar, that adds a number of problems (ambiguity, knowing where the symbol ends without explicit %end, etc), so that it is easier to define the reserved symbol as a special spelling of another symbol.
Reserved symbols are defined with the spelling and the signature of the similar symbol they look like. The spelling refers to the concatenation of accepted characters of the similar symbol. The start set is modified to remove the reserved symbol and add the similar symbol. Then when the similar symbol is accepted, its spelling is compared to the set of reserved symbols for that state.
By default reserved symbols are searched every time the similar symbol is accepted, whether the grammar expects the reserved symbol or not; this matches the usual expectation in most languages that a reserved symbol is reserved everywhere, even where not syntactically valid. The reserved symbol can also be defined to be recognised only if the parser needs it; this allows languages, such as PL/I, to use the same spelling as a reserved and a normal symbol as syntactically permitted. This is indicated by the 'as needed' suffix.
Reserved symbols cannot have an attribute. Because the symbol is already matched to a specific spelling, this is usually superfluous. It is possible to reserve multiple spellings as one symbol; it is not possible to preserve which spelling it was. If this information is required, multiple symbols must be used instead.
reserved-symbol ::= protonotion symbol = 'characters' hypernotion symbol [ everywhere ] . | protonotion symbol = 'characters' hypernotion symbol as needed.
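The lookup described above can be sketched in Python. This is only an illustration under assumptions: the table layout, the function name classify, and the symbol names are hypothetical and are not part of cow.

```python
# Hypothetical sketch of reserved-symbol lookup; none of these names are cow's API.

# spelling -> (reserved symbol, recognised only as needed?)
RESERVED = {
    "int": ("int symbol", False),   # reserved everywhere (the default)
    "if": ("if symbol", True),      # declared with the 'as needed' suffix
}

def classify(spelling, similar_symbol, expected):
    """Return the symbol to report once the similar symbol has been accepted.

    expected is the set of symbols the parser can use in the current state.
    """
    entry = RESERVED.get(spelling)
    if entry is None:
        return similar_symbol       # ordinary spelling: keep the similar symbol
    reserved_symbol, as_needed = entry
    if as_needed and reserved_symbol not in expected:
        return similar_symbol       # 'as needed': ignored where the parser cannot use it
    return reserved_symbol          # reserved everywhere, or needed in this state
```

With this table, 'integer' stays an identifier, 'int' is always the reserved symbol, and 'if' is the reserved symbol only in states where the parser expects it.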
Declare a hypernotion signature.
signature-property ::= signature = hypernotion [ ,hypernotion ] ...
The start property identifies the start rule in the grammar. If no start rule is specified, the first hyperrule is the start rule.
start-property ::= start = hypernotion.
The stylesheet property initialises the style translations for the input queue. This can be further modified while scanners are running. The style sheet indicates how XML tags, HTML tags and classes, or RTF contexts correspond to styles of characters in the *character and except*character symbols. If an input character context does not correspond to any style, the character is silently discarded.
For example, an HTML source file may have source code in <CODE> elements and comments in <I>.
stylesheet-property ::= stylesheet = style-definition....
style-definition ::= style-name: style-match [ ,style-match... ]
style-match ::= xml-match | html-match | rtf-match | escape-match
xml-match ::= xml = xml-tag [ |xml-tag... ]
xml-tag ::= hypernotion
html-match ::= html = html-tag [ |html-tag... ]
html-tag ::= protonotion | protonotion/protonotion | /protonotion
rtf-match ::= rtf = rtf-attribute [ ,rtf-attribute... ]
rtf-attribute ::= protonotion font | plain | bold | italic | underline | outline | subscript | superscript | baseline | left | right | center | justified | digits red digits green digits blue foreground | digits red digits green digits blue background | digits size | protonotion style
escape-match ::= plain = [ 'string' [ ,'string' ] ]
XML style matches are based solely on the innermost tag. An xml-match matches if any of the listed tags is the innermost tag. HTML style matches are based on either the innermost tag (protonotion), the innermost tag and its class attribute (protonotion/protonotion), or just its class attribute (/protonotion). An html-match matches if any of the listed tags and classes is the innermost tag. RTF style matches are based on the typographical and font features. An rtf-match matches if all the attributes match simultaneously.
All characters of a plain file are assigned some style and accepted. The file starts in a plain style; transition to another style can be made with a start escape sequence of characters, and back to plain with an optional stop sequence. A newline character cannot be part of a start or stop sequence.
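The HTML and RTF matching rules can be sketched as follows. This is a minimal Python illustration, not cow's implementation: the function names are hypothetical, tags and classes are assumed to be plain strings already extracted from the innermost element, and comparison is assumed to be exact.

```python
# Hypothetical sketch of style matching; function names are illustrative only.

def html_tag_matches(pattern, tag, css_class=None):
    """Match one html-tag pattern against the innermost element."""
    want_tag, slash, want_class = pattern.partition("/")
    if not slash:                      # protonotion: tag only
        return want_tag == tag
    if want_tag == "":                 # /protonotion: class attribute only
        return want_class == css_class
    return want_tag == tag and want_class == css_class   # protonotion/protonotion

def html_match(patterns, tag, css_class=None):
    """An html-match succeeds if any listed alternative matches."""
    return any(html_tag_matches(p, tag, css_class) for p in patterns)

def rtf_match(required, actual):
    """An rtf-match succeeds only if all listed attributes are present at once."""
    return set(required) <= set(actual)
```

For example, html_match(["code", "/listing"], "pre", "listing") succeeds through the class-only alternative, while rtf_match(["bold", "italic"], ["bold"]) fails because one attribute is missing.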
metarule ::= metanotion :: hypernotion [ |hypernotion ] ....
hyperrule ::= hypernotion : [ hyperalternatives ] .
hyperalternatives ::= hyperalternative | hyperalternatives; hyperalternative
hyperalternative ::= member | hyperalternative, member
member ::= hypernotion | hypernotion==hypernotion | hypernotion/=hypernotion | hypernotion=hypernotion | hypernotionhypernotion
rewrite-rule ::= operator input-pattern: output-pattern | rewrite-rule; output-pattern
operator ::= [ protonotion :: ]
input-pattern ::= hypernotion input-subtree-patterns
input-subtree-patterns ::= <> | <input-node-patterns> | <tail-pattern> | <input-node-patterns,tail-pattern>
input-node-patterns ::= input-node-pattern | input-node-patterns,input-node-pattern
input-node-pattern ::= hypernotion input-subtree-patterns | <metanotion>
tail-pattern ::= metanotion
output-pattern ::= [ predicate, ] ... hypernotion output-subtree-patterns [ ,predicate ] ...
output-subtree-patterns ::= <> | <output-node-patterns> | <tail-pattern> | <output-node-patterns,tail-pattern>
output-node-patterns ::= output-node-pattern | output-node-patterns,output-node-pattern
output-node-pattern ::= operator hypernotion output-subtree-patterns | <operator metanotion>
predicate ::= hypernotion
If different rewrite rules have the same operator and input-pattern, they can be combined as alternatives in one rule. One rule or many have the same meaning. The input and output pattern hypernotions are reduced to signatures and can include metanotion constructors the same way as productions; the hypernotion signatures can be declared with a signature property rule. The input pattern matches a tree node if the signatures are the same and the variables unify. Predicates, if any, are evaluated and can unify more variables. If the input pattern matches and all the predicates succeed, the output pattern is evaluated. The resulting output tree replaces the input to the caller.
The empty operator rewrites are only called when a node is created in a production rule, or when a node is created in an output-pattern without an operator. Non-empty operators are only called on subtrees from an output-pattern. If the operator is nonempty, it only matches if called with the same operator. If a tree does not match any rule, with an empty or nonempty operator, it is returned unchanged.
The input-pattern matches if the signatures are the same, metanotion variables unify, and the subtree pattern matches. The node must have at least as many children as node-patterns; it can have more only if there is a tail-pattern. A node-pattern which is a hypernotion with a subtree pattern recursively matches the child node. If the node-pattern is <metanotion>, it matches any single node. A tail-pattern matches zero or more child nodes after the enumerated nodes. All metanotions with the same spelling in the input-pattern hypernotions, output-pattern hypernotions, and predicates are unified to the same value in accordance with the uniform replacement rule (URR). Metanotions used as node-patterns are a separate namespace, and metanotions used as tail-patterns are a third separate namespace. Node-patterns and tail-patterns are subject to URR each within their own namespace; a metanotion with the same spelling in a different namespace is unrelated and does not have to be (and in fact will not be) unified to the same value.
All predicates have to succeed. In the process they can define additional metanotion variables.
If the input-pattern matches and all predicates succeed, the output-pattern constructs a replacement tree; the output-pattern always succeeds. A new tree is constructed for each hypernotion output-node-pattern, and these with the unanalyzed <metanotion> node-pattern and tail-patterns are combined as children for the next level of constructed trees. When a node-pattern hypernotion is given with an empty operator, that node as created can be rewritten. If it is given a nonempty operator, that node can be rewritten by a matching operator. That can recursively trigger more rewrite rules and alter any or all of the parse tree.
The input-pattern can match one node, the node and its children, its children and some or all grandchildren, as deeply as necessary. The output tree replaces the entire matched tree; it can flatten or deepen the top of the tree; it can move child nodes around as desired and even discard them.
The node-pattern and tail-pattern metanotions are not defined by metarules, even if there is a metarule for that metanotion. They are defined by their position in the patterns.
Rewrite rules are powerful enough to implement translation and compilation, and even interpretation and evaluation.
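The child-count and namespace rules for input-patterns can be illustrated with a small Python sketch. It is a simplification under stated assumptions: hypernotion signatures are reduced to plain label strings (no metanotion unification inside labels), predicates and operators are omitted, and the tuple layout and the name match are hypothetical. Treating a repeated <metanotion> as requiring equal subtrees is one reading of URR, not something the text above states outright.

```python
# Hypothetical sketch of input-pattern matching; the data layout is an assumption.
# A tree node is (label, [children]).
# A pattern is (label, [node-patterns], tail-name-or-None), where a node-pattern is
# either a nested pattern or a string standing for <metanotion>.

def match(pattern, node, node_vars, tail_vars):
    """Return True if pattern matches node, recording bindings in the two namespaces."""
    label, child_patterns, tail = pattern
    node_label, children = node
    if label != node_label:
        return False
    if len(children) < len(child_patterns):
        return False                   # needs at least as many children as node-patterns
    if tail is None and len(children) > len(child_patterns):
        return False                   # extra children are allowed only with a tail-pattern
    for child_pattern, child in zip(child_patterns, children):
        if isinstance(child_pattern, str):
            # <metanotion> matches any single node; a repeated name must agree
            if node_vars.setdefault(child_pattern, child) != child:
                return False
        elif not match(child_pattern, child, node_vars, tail_vars):
            return False
    if tail is not None:
        rest = children[len(child_patterns):]   # zero or more remaining children
        if tail_vars.setdefault(tail, rest) != rest:
            return False
    return True
```

Because node_vars and tail_vars are separate dictionaries, the same spelling can be used once as a node-pattern and once as a tail-pattern without the two being related. On failure the dictionaries may hold partial bindings, so a caller would start each attempt with fresh ones.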
Foreign text is a way to enter programming code written in another language into the text of a grammar. Attributes can be made available for input and output to the text. As far as the context free grammar is concerned, each foreign rule is interpreted as an empty production; the foreign text is then evaluated during a reduction.
The foreign text must contain a balanced number of brackets; no escapes are available, nor are any kinds of strings or comments interpreted to hide unbalanced brackets. If the foreign code must use unbalanced brackets, it must do so outside the text of the grammar.
foreign-rule ::= hypernotion : foreign-texts.
foreign-texts ::= foreign-text | foreign-texts; foreign-text
foreign-text ::= foreign-output foreign-input language [foreign-code] | language immediate [foreign-code]
foreign-output ::= [ (metanotion...) ]
foreign-input ::= [ metanotion... ]
language ::= [ protonotion ]
foreign-code ::= [ foreign-chunk... ]
foreign-chunk ::= any-characters-except-[-and-] | [foreign-code]
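The bracket rule for foreign code amounts to simple depth counting. The following Python sketch shows one way a reader of the grammar text could find the end of a foreign-code block; the function name split_foreign_code is hypothetical and this is not taken from cow's source.

```python
# Hypothetical sketch: locate the ']' that closes a foreign-code block.

def split_foreign_code(text, start):
    """text[start] must be '['; return (code, index just after the matching ']')."""
    if text[start] != "[":
        raise ValueError("foreign code must start with '['")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced brackets in foreign code")
```

Since strings and comments are not recognised, a bracket inside a C string literal is counted like any other: split_foreign_code('[ s = "]"; ]', 0) stops at the bracket inside the quotes. That is why unbalanced brackets have to be kept outside the grammar text.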
Normally in the cf parser, foreign texts are evaluated with the attributes after parsing. If the small marks 'immediate' are included after the language, the foreign text is evaluated immediately on reduction within the parser itself. In a scanner, all foreign texts are immediate, whether marked so or not. Because immediate texts are evaluated during the parse, they can alter the lexical interpretation and other aspects of the parser so that it can accept languages that it could not otherwise.
For example, C has a well-known ambiguity with typedef names. A C grammar can use immediate foreign texts in the parser and scanners with a rudimentary symbol table to remove this ambiguity.
NEST block:
    left brace symbol, push new typedef level,
    NEST declarations into NEST1, NEST1 statements,
    right brace symbol, pop off typedef level.
NEST typedef: typedef symbol, TAG symbol, make typedef.
IDENTIFIER symbol: TAG symbol, relabel if typedef.
TAG symbol: letter, letter or digit sequence option.
include = [
    /* stack of typedef names, innermost declaration first */
    typedef struct TopLevel TopLevel;
    struct TopLevel {int depth; char *tag; TopLevel *under;};
    TopLevel *topLevel = 0; int topLevelDepth = 0;
].
push new typedef level: immediate [
    topLevelDepth++;
].
pop off typedef level: immediate [
    topLevelDepth--;
    /* discard typedefs declared in the block just closed */
    while (topLevel && topLevel->depth>topLevelDepth) {
        TopLevel *u = topLevel->under; free(topLevel); topLevel = u;
    }
].
relabel if typedef: immediate [
    /* if the lexeme spells a known typedef, report it as a type name */
    TopLevel *t; for (t=topLevel; t; t=t->under) {
        if (strcmp(t->tag,Tcl_GetString(bufferContents()))==0) {
            PS->override = true;
            PS->reserved = 0;
            PS->symbol = TYPENAMEsymbol;
            PS->nameclass = 0;
            break;
        }
    }
].
make typedef: immediate [
    /* record the new typedef name at the current nesting depth */
    TopLevel *t; t = malloc(sizeof(TopLevel));
    t->depth = topLevelDepth; t->tag = bufferString(lastlexeme.name,0);
    t->under = topLevel; topLevel = t;
].
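To make the intent of the foreign texts above easier to follow, here is the same scoped typedef table as a small stand-alone Python sketch. The class and method names are hypothetical; they mirror the push, pop, make and relabel steps of the example and nothing else.

```python
# Hypothetical Python mirror of the typedef table kept by the C foreign texts.

class TypedefTable:
    def __init__(self):
        self.depth = 0
        self.entries = []              # stack of (depth, tag), innermost last

    def push_level(self):
        """Entering a block: one level deeper."""
        self.depth += 1

    def pop_level(self):
        """Leaving a block: forget typedefs declared inside it."""
        self.depth -= 1
        while self.entries and self.entries[-1][0] > self.depth:
            self.entries.pop()

    def make_typedef(self, tag):
        """Record a typedef name at the current depth."""
        self.entries.append((self.depth, tag))

    def is_typedef(self, spelling):
        """True if the scanner should relabel this identifier as a type name."""
        return any(tag == spelling for _, tag in self.entries)
```

A scanner would consult is_typedef each time it accepts an identifier, which is the role the 'relabel if typedef' text plays in the grammar.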
An immediate foreign text cannot have explicit output or input variables. This is because variables are propagated by the attribute evaluator after the parse is completed, but immediate texts are evaluated before the parse is completed. Immediate texts which need to communicate need to establish some protocol with global variables.
(The language string cannot end in 'immediate' unless there is another 'immediate' after it. ximmediate[...] is an immediate text in language x; ximmediate immediate[...] is an immediate text in language ximmediate.)
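The rule in the parenthetical can be sketched as a suffix test. This Python fragment is an illustration only: the function name split_language is hypothetical, and it assumes that spaces inside the small marks are not significant, which is what the example spellings suggest.

```python
# Hypothetical sketch: separate the language name from a trailing 'immediate'.

def split_language(marks):
    """Split the small marks written before '[' into (language, is_immediate)."""
    compact = "".join(marks.split())   # assumption: spaces are not significant
    suffix = "immediate"
    if compact.endswith(suffix):
        # only the final 'immediate' is the marker; anything before it is the language
        return compact[:-len(suffix)], True
    return compact, False
```

Under these assumptions split_language("ximmediate") gives language x, and split_language("ximmediate immediate") gives language ximmediate, both immediate.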
hypernotion ::= small-marks | large-marks | small-marks hypernotion | large-marks hypernotion
symbol ::= letter s symbol letter y symbol letter m symbol letter b symbol letter p symbol letter l symbol | small-marks hypernotion | large-marks hypernotion
protonotion ::= small-marks
metanotion ::= large-marks
vW2 hypernotions can contain literal glyphs written in single quotes. The glyphs are translated into small marks that are the glyph's name. From cow's point of view, there is no distinction between the literal glyphs and the small marks composing their name.
small-marks ::= ' [ glyphs ] ...'
glyph ::= any-single-character-except-' | ''
Default names are provided for the printable ASCII glyphs (codes 32 through 126). These names can be overridden with the glyph property.
\n newline   \r return   \t tab   " " space
! exclaim   "\"" quote   # hash   $ dollar   % percent   & ampersand   '' apostrophe
( leftparen   ) rightparen   * asterisk   + plus   , comma   - dash   . fullstop   / slash
0 zero   1 one   2 two   3 three   4 four   5 five   6 six   7 seven   8 eight   9 nine
: colon   ; semicolon   < lessthan   = equals   > greaterthan   ? query   @ at
A largea   B largeb   C largec   D larged   E largee   F largef   G largeg   H largeh   I largei
J largej   K largek   L largel   M largem   N largen   O largeo   P largep   Q largeq   R larger
S larges   T larget   U largeu   V largev   W largew   X largex   Y largey   Z largez
[ leftbracket   \\ backslash   ] rightbracket   ^ circumflex   _ underline   @ at
a lettera   b letterb   c letterc   d letterd   e lettere   f letterf   g letterg   h letterh   i letteri
j letterj   k letterk   l letterl   m letterm   n lettern   o lettero   p letterp   q letterq   r letterr
s letters   t lettert   u letteru   v letterv   w letterw   x letterx   y lettery   z letterz
\{ leftbrace   | verticalbar   \} rightbrace   ~ tilde
vW2 hypernotions can contain decimal numbers which are translated to small marks. From cow's point of view, there is no distinction between the decimal number and the small marks it is translated into.
small-marks ::= #digit-glyph...
digit-glyph ::= 0|1|2|3|4|5|6|7|8|9
The string #digit-glyph... is translated into the small marks number(digit-name...)
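The two translations, quoted glyphs and #numbers, can be sketched in Python. This is a hedged illustration: the function names are hypothetical, only a few punctuation glyphs from the default table are included, and the result is returned as a list of names because the exact way cow joins the names into small marks is not specified here.

```python
# Hypothetical sketch of glyph and number translation using the default names.
import string

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# A small subset of the default punctuation names listed above.
PUNCTUATION_NAMES = {" ": "space", "+": "plus", "-": "dash", "<": "lessthan",
                     "=": "equals", ">": "greaterthan", "'": "apostrophe"}

def glyph_name(ch):
    """Default name of a single glyph."""
    if ch in string.ascii_lowercase:
        return "letter" + ch
    if ch in string.ascii_uppercase:
        return "large" + ch.lower()
    if ch in string.digits:
        return DIGIT_NAMES[int(ch)]
    return PUNCTUATION_NAMES[ch]       # KeyError for glyphs not in this subset

def glyph_names(quoted_body):
    """Names for the text between the quotes; '' stands for one apostrophe."""
    return [glyph_name(ch) for ch in quoted_body.replace("''", "'")]

def number_marks(token):
    """Names for a #digits token: 'number' followed by one name per digit."""
    if not token.startswith("#") or not token[1:].isdigit():
        raise ValueError("expected # followed by digits")
    return ["number"] + [DIGIT_NAMES[int(d)] for d in token[1:]]
```

For instance, glyph_names("<=") yields lessthan and equals, and number_marks("#42") yields number, four, two.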
The output from the parser generator is syntactically a Tcl script, but none of the commands in the script are defined. The caller evaluates this script in a context where these commands are defined. The intention is that the script might be examined for its simple textual elegance, a veritable work of art, or more practically the context can define Tcl commands so that the generated parser can execute from the Tcl script, or the context can define commands that in turn generate C or some other code which can then be compiled into automata code.
The generated commands are quite specific, and should be easily translatable into most imperative languages. Because the script is in Tcl, the translating commands can also be used as a macro language.
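The idea of one script with interchangeable command definitions can be illustrated outside Tcl. The Python sketch below is only an analogy under heavy assumptions: the script format is reduced to one command per line with space-separated words (real output uses Tcl syntax with braces), the helper names run, interpreter and c_translator are hypothetical, and the meanings given to define_integer, increment_integer and decrement_integer are guessed from their names.

```python
# Hypothetical sketch: the same command script evaluated in two different contexts.

def run(script, commands):
    """Evaluate a simplified script: one command per line, words split on spaces."""
    for line in script.splitlines():
        words = line.split()
        if words:
            commands[words[0]](*words[1:])

def interpreter(integers):
    """Context that executes the commands directly on a dict of integers."""
    def define_integer(name, value):
        integers[name] = int(value)
    def increment_integer(name):
        integers[name] += 1
    def decrement_integer(name):
        integers[name] -= 1
    return {"define_integer": define_integer,
            "increment_integer": increment_integer,
            "decrement_integer": decrement_integer}

def c_translator(lines):
    """Context that emits C-like text instead of executing anything."""
    return {"define_integer": lambda name, value: lines.append(f"int {name} = {value};"),
            "increment_integer": lambda name: lines.append(f"{name}++;"),
            "decrement_integer": lambda name: lines.append(f"{name}--;")}
```

Running the same three-command script through interpreter updates a dictionary, while running it through c_translator collects lines of C-like text; the script itself does not change.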
accept_character
accept_production production-index
allocate num-perm
anti_get_value X.i X.j
assign_integer variable-name value
attribute_wam name {wam-machine}
call L.o.proc-name/n num-args R.r
call_domain state en
deallocate
decrement_integer variable-name
deferred_action nested-action {semantics}
define_glyph name characters
define_integer variable-name initial-value
define_terminal_class symbol|character classes class-symbols
discard_character
eval_loop {instructions}
execute L.o.proc-name/n num-args
explicit_end
fail
foreign_immediate language foreign-text
foreign_include foreign-code
foreign_text language num-outputs num-inputs variable-names foreign-text
Symbol table predicates fall into two categories: those created by symbol table mechanisms for use in the grammar, and those partially written by the grammar writer for use in the symbol table mechanism. Those created by symbol table are added to the grammar as foreign text rules with the language '%symbol'. This code intercepts these foreign text definitions from the foreign_text and splits them into individual instructions.
get_constant C.o.c A.i
get_constant X.i
get_constant C.o.f/n X.i
get_value X.n A.i
get_value Y.n A.i
get_variable X.n A.i
get_variable Y.n A.i
goto_state state sequential
ibbuffer
ibclear
ibdiscard
if_bound A.i
implicit_end {semantics}
increment_integer variable-name
initialise_constant C.o string arity
initialise_memory zone.o tag.value
initialise_register register tag.value
input_styles style [ |style ] ... style-sheet
iqreadahead automata n
iq_maximum maximum-lookahead
name_class symbol nameclass
nop
on_failure X.i X.j
on_success X.i
parser class name {parser}
parse_domain domain if-start {parser-state...}
parse_error productions expected-symbols
parse_start state
parse_state domain state {transition...}
partial_shift
pass
perfect_hash_entry offset spelling symbol
perfect_hash_modulus table-size
perfect_hash_multiplier offset factor...
proceed
psbegin
psclear
psend
pspush nested-action n
ps_maximum maximum-depth
put_constant C.o.c A.i
put_list X.i
put_unsafe_value C.o.f/n X.i
put_unsafe_value Y.n A.i
put_value X.n A.i
put_value Y.n A.i
put_variable X.n A.i
put_variable Y.n A.i
put_void A.i
query_start L.query-address {variable X.i...}
recognise_reservable symbol reserved-table-index name-class filters
recognise_symbol symbol name-class filters
reduce_parse C.o.P/1 num-elems
report severity message
requires iq_multiple
requires iq_scanner
requires iq_single
requires ib
requires ps_fixed
requires ps_partial
requires ps_total
reserved_word_table table-index {symbol-definitions}
retry L.o R.r
retry_me_else R.o
section L.o {script}
set_constant C.o.c
set_local_value Y.n
set_value X.n
set_value Y.n
set_variable X.n
set_variable Y.n
set_void num-cells
shift_lexeme C.o.label/1 attributed
state_semantics {semantics...}
subquery_start queryname L.query-address {variable X.i...}
switch_on_constant {C.o.c L.code...}
switch_on_structure {C.o.f/n L.code...}
switch_on_term L.var L.const L.list L.structure
symbol_enum symbol-kind symbol
symbol_reserved spelling symbol
symbol_table_bottom_scope SYMTAB A.1 A.2 A.3
symbol_table_class immediate-class {} immediate-class immediate-class class-number { class-property... }
symbol_table_class grammar-class {variable...} <TERM,signature,<VAR,variable>...> C.offset.signature/n class-number { class-property... }
symbol_table_constant SYMTAB A.1 A.2
symbol_table_definition immediate|grammar symbol-table-name { object-class-definition... }
symbol_table_empty_defs SYMTAB A.1
symbol_table_identify SYMTAB class A.1 A.2 A.3
symbol_table_method method call {variable...} {variable...}
method ::= initialiser | replacessiblingOBJECT | joinssiblingOBJECT | replacesancestorOBJECT | joinsancestorOBJECT | matchesOBJECT | constantNAME
immediate-call ::= hypernotion
grammar-call ::= {L.query-offset {variable Xi...}}
symbol_table_interference distinct|conflicting {class-number...}
symbol_table_new_definition SYMTAB A.1 A.2 A.3 A.4
symbol_table_new_scope SYMTAB A.1 A.2 A.3
symbol_table_new_table SYMTAB A.1
symbol_table_not_constant SYMTAB A.1
symbol_table_split_defs SYMTAB A.1 A.2 A.3
term_production functor arity+1
term_vars functor/arity {var...}
transition_character symbol match style {{character...}...}
transition_integer integer-test variable-name eq-values {semantics...}
transition_integer_switch variable-name {semantics...}
transition_iqis offset iq-test classes class-symbols {semantics}
transition_jump {semantics}
translator_prelude
trim_environment num-perm
retry L.o
trust_me
try L.o num-args R.r
try_me_else R.o num-args
unify_constant C.o.c X.i
unify_value X.i
unify_value X.i
unify_value Y.i
unify_variable X.i
unify_variable Y.i
unify_void num-variables
wam_initialisation {initialisations}