The parser data generator uses the grammar described in a text file to generate the parse data used by the parser.
It is possible to define an ambiguous grammar.
The input grammar text file must be UTF-8 encoded.
The grammar file contains the definitions for lexical analysis and syntax analysis, including whitespace and token management.
The grammar used to generate parser data has its own syntax, described here. It is a kind of BNF with the following add-ons:
"Start" non terminal:
- must be defined; matching Start means matching the whole grammar.
- must not be a token.
- must not be referenced by any other rule.
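The three constraints on Start can be checked mechanically. The following is a minimal Python sketch, not the generator's actual code. It assumes a hypothetical in-memory representation in which `rules` maps each non terminal name to the set of non terminal names its definition references, and `tokens` is the set of names declared as tokens. The function name `check_start` is illustrative as well.

```python
def check_start(rules, tokens, start="Start"):
    """Return a list of messages for violated Start constraints.

    rules:  dict mapping a non terminal name to the set of non terminal
            names referenced by its definition (hypothetical representation).
    tokens: set of non terminal names declared as tokens.
    """
    errors = []
    # Constraint 1: Start must be defined.
    if start not in rules:
        errors.append(f"{start} is not defined")
    # Constraint 2: Start must not be a token.
    if start in tokens:
        errors.append(f"{start} must not be a token")
    # Constraint 3: no other rule may reference Start.
    for name in sorted(rules):
        if name != start and start in rules[name]:
            errors.append(f"{start} is referenced by rule {name}")
    return errors
```

An empty list means that all three constraints hold for the given rule set.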
A non terminal belongs to lexical analysis if it is declared as a token or if it is referenced by a lexical rule. A lexical rule is the definition of a token non terminal, or the definition of a non terminal that is referenced by another lexical rule.
A non terminal cannot be shared between lexical and syntactic rules.
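The definition of a lexical non terminal is recursive, so it amounts to a reachability computation starting from the declared tokens. The sketch below illustrates this in Python under stated assumptions: `rules` is a hypothetical dictionary mapping each non terminal name to the set of non terminal names its definition references, and `tokens` is the set of names declared as tokens. The sharing check also assumes that a syntactic rule may reference a token itself, and that only the non-token helpers of lexical rules are off limits. The text above does not spell this out.

```python
from collections import deque


def lexical_nonterminals(rules, tokens):
    """Return every non terminal that belongs to lexical analysis."""
    lexical = set(tokens)
    queue = deque(tokens)
    while queue:
        name = queue.popleft()
        # Anything referenced by a lexical rule is itself lexical.
        for ref in rules.get(name, ()):
            if ref not in lexical:
                lexical.add(ref)
                queue.append(ref)
    return lexical


def shared_nonterminals(rules, tokens):
    """Return lexical helpers that a syntactic rule also references."""
    lexical = lexical_nonterminals(rules, tokens)
    shared = set()
    for name, refs in rules.items():
        if name in lexical:
            continue  # only syntactic rules are inspected here
        # Assumption: referencing a declared token is allowed.
        shared |= (set(refs) & lexical) - set(tokens)
    return shared
```

With `Number` declared as a token and defined in terms of `Digit`, both names are reported as lexical, and a syntactic rule that references `Digit` directly is flagged as sharing it.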
A : B C ;
    A is the concatenation of B and C
A : B C D ;
    A is the concatenation of B, C, and D
A : B 'a' ;
    A is the concatenation of B and the terminal character 'a'
A : B | C ;
    A is B or C
A : B | C | D ;
    A is B, C, or D
A : B | C D ;
    A is B or the concatenation of C and D
A : B ; { MatchX }
    A is equivalent to B, and match management is done by class MatchX
A : ; { MatchX }
    A is empty, and match management is done by class MatchX
A : B { M1 } | C D { M2 } ; { M3 }
A : { M1 } | B C { M2 } ; { M3 }
A : B ( C | D ; )
    A is the concatenation of B and of C or D
A : B ( C | D ; { MatchXX } ) { MatchYY }
    the class MatchXX manages the match of C or D
    the class MatchYY manages the match of the concatenation of B and of C or D
it is not possible to write
A : B { MatchXX } ;
character
A : 'a'
    non terminal A defined as terminal character 'a'
A : '\''
    non terminal A defined as terminal apostrophe character
A : '\\'
    non terminal A defined as terminal backslash character
A : '\x41'
    non terminal A defined as terminal character of code 41 in hexadecimal notation (ASCII range)
A : '\u2145'
    non terminal A defined as terminal character of code 2145 in hexadecimal notation (UTF-16 range)
A : '\n'
    new line character (0x0A)
A : '\r'
    carriage return character (0x0D)
A : '\t'
    horizontal tabulation character (0x09)
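The escape forms listed for character terminals can be decoded with a small lookup plus a hexadecimal conversion. The following Python sketch only illustrates the notation and is not the generator's implementation. The function name `decode_char_terminal` and the table `SIMPLE_ESCAPES` are hypothetical, and the input is assumed to be the text between the two quotes.

```python
import string

# Single-character escapes named in the list above.
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "'": "'", "\\": "\\"}


def decode_char_terminal(body):
    r"""Decode the text found between the quotes of a character terminal.

    Accepts a plain character, \', \\, \n, \r, \t, \xHH and \uHHHH.
    """
    if not body.startswith("\\"):
        if len(body) != 1:
            raise ValueError(f"expected a single character: {body!r}")
        return body
    kind, rest = body[1:2], body[2:]
    if kind in SIMPLE_ESCAPES and rest == "":
        return SIMPLE_ESCAPES[kind]
    # \x takes exactly two hexadecimal digits, \u exactly four.
    width = {"x": 2, "u": 4}.get(kind)
    if width == len(rest) and all(d in string.hexdigits for d in rest):
        return chr(int(rest, 16))
    raise ValueError(f"unsupported escape: {body!r}")
```

For example, `decode_char_terminal(r"\x41")` returns the character `A`, and `decode_char_terminal(r"\u2145")` returns the character with code point 0x2145.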
string
A : "hello"
    non terminal A defined as terminal character string "hello"
    a character value can be in the form \n, \r, \t, \x<hex digit><hex digit>, or \u<hex digit><hex digit><hex digit><hex digit>, as for a character terminal
character class
A : [a-z]
    non terminal A defined as one character in set 'a' through 'z'
A : [abcd]
    non terminal A defined as one character in set 'a', 'b', 'c', and 'd'
A : [*\-+]
    non terminal A defined as one character in set '*', '-', and '+'
    the '\' before the '-' means that '-' is a character of the set (not a range separator)
A : [()\]{}]
    non terminal A defined as one character in set '(', ')', ']', '{' and '}'
    the '\' before the ']' means that ']' is a character of the set (not the character class ending bracket)
A : [_a-z]
    non terminal A defined as one character in set '_' and 'a' through 'z'
    a character value can be in the form \n, \r, \t, \x<hex digit><hex digit>, or \u<hex digit><hex digit><hex digit><hex digit>, as for a character terminal
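The character class rules (ranges, escaped '-' and ']', and the escape forms shared with character terminals) can be illustrated by expanding a class body into an explicit set of characters. This Python sketch is an approximation under stated assumptions and not the generator's code. The names `expand_char_class` and `_read_item` are hypothetical, the input is the text between '[' and ']', an unescaped '-' with no character on both sides is taken literally, and any escaped character other than n, r, t, x, or u is assumed to stand for itself.

```python
import string

_SIMPLE = {"n": "\n", "r": "\r", "t": "\t"}


def _read_item(body, i):
    """Return (character, was_escaped, next_index) for the item at index i."""
    if body[i] != "\\":
        return body[i], False, i + 1
    if i + 1 >= len(body):
        raise ValueError("dangling backslash")
    kind = body[i + 1]
    if kind in ("x", "u"):
        width = 2 if kind == "x" else 4
        digits = body[i + 2:i + 2 + width]
        if len(digits) != width or any(d not in string.hexdigits for d in digits):
            raise ValueError("bad hexadecimal escape")
        return chr(int(digits, 16)), True, i + 2 + width
    # \n, \r, \t map to control characters; other escaped characters
    # (such as \- or \]) are assumed to stand for themselves.
    return _SIMPLE.get(kind, kind), True, i + 2


def expand_char_class(body):
    """Expand the body of a character class into a set of characters."""
    items = []
    i = 0
    while i < len(body):
        ch, escaped, i = _read_item(body, i)
        items.append((ch, escaped))

    chars = set()
    j = 0
    while j < len(items):
        low = items[j][0]
        # An unescaped '-' between two items denotes a range.
        if j + 2 < len(items) and items[j + 1] == ("-", False):
            high = items[j + 2][0]
            if ord(high) < ord(low):
                raise ValueError("range end precedes range start")
            chars.update(chr(c) for c in range(ord(low), ord(high) + 1))
            j += 3
        else:
            chars.add(low)
            j += 1
    return chars
```

Applied to the class bodies above, `expand_char_class(r"*\-+")` yields the three characters '*', '-', and '+', while `expand_char_class("_a-z")` yields '_' plus the 26 lowercase letters.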