The parser data generator

From a grammar file (the parser data generator input), the generator produces the parser data used by the parser.

Generator invocation

ParserDataGenerator parserDataGenerator = ParserDataGenerator.getInstance();
final ParseResult parseResult = parserDataGenerator.generate(
    grammarInputFile,
    htmlParserDataOutputFile,
    txtLexGenLogOutputFile,
    parserDataOutputFile,
    matchMgrPackageName,
    grammarDefTreeTextFile,
    grammarTreeTextFile,
    listener);

grammarInputFile
the grammar text file, for example the definitive input grammar for the generator
htmlParserDataOutputFile
the file in which the parser data is written in HTML form, for example the result for the definitive input grammar for the generator
txtLexGenLogOutputFile
the text file that receives the log of lex data generation, or null for no output (content example)
parserDataOutputFile
the file in which the parser data that will be used by the parser is written
matchMgrPackageName
the name of the package containing the match manager classes; for example, the parser for the generator input grammar has its match manager classes in "net.sf.parser4j.generator.service.match"

If null, the match management mechanism is disabled in the parser data: a parser using this parser data will not invoke any match manager class.
grammarDefTreeTextFile
the file in which the AST resulting from parsing the grammar is stored (content example)
grammarTreeTextFile
the file in which the basic rules created internally by the generator are stored (content example)
It is the final grammar obtained from the input grammar text file. This final grammar is the result of match transformation and white space insertion.
listener
parser event listener

Internally, the parser data generator breaks rules into basic rules, for example:

the rule
A : B | C D
is broken into:
A : B | A_1
A_1 : C D

Basic rules are concatenation, alternative, and empty.
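
The breaking of a rule into basic rules can be sketched as follows. This is only an illustration of the example above, not the generator's code: the class BasicRuleSketch and its method breakRule are hypothetical names, and the sketch assumes that an alternative made of a single symbol stays in place while a longer alternative is moved into a rule named after its zero-based position (A_1 for the second alternative).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: BasicRuleSketch is a hypothetical name, not a parser4j class.
final class BasicRuleSketch {

    // Splits "name : alt0 | alt1 | ..." into basic rules.
    // Each alternative is a non-empty list of symbols. An alternative with more
    // than one symbol is moved into its own concatenation rule named name_<index>.
    static List<String> breakRule(String name, List<List<String>> alternatives) {
        List<String> heads = new ArrayList<>();
        List<String> extracted = new ArrayList<>();
        for (int i = 0; i < alternatives.size(); i++) {
            List<String> symbols = alternatives.get(i);
            if (symbols.size() == 1) {
                heads.add(symbols.get(0));        // a single symbol stays in place
            } else {
                String subRule = name + "_" + i;  // e.g. A_1 for the second alternative
                heads.add(subRule);
                extracted.add(subRule + " : " + String.join(" ", symbols));
            }
        }
        List<String> result = new ArrayList<>();
        result.add(name + " : " + String.join(" | ", heads));
        result.addAll(extracted);
        return result;
    }

    public static void main(String[] args) {
        // The rule "A : B | C D" from the example above.
        List<List<String>> alternatives = Arrays.asList(
                Arrays.asList("B"),
                Arrays.asList("C", "D"));
        for (String rule : breakRule("A", alternatives)) {
            System.out.println(rule);
        }
    }
}
```

Run on the rule A : B | C D, the sketch prints the two basic rules shown in the example.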


match transformation

example #1: (g1.txt)
for grammar
%;
Start : A ;
A : B {MatchB} | C ;
B : 'b' ;
C : 'c' ;
gives
Start :(concat) A
A :(alternative) A_0 | C
A_0 :(concat) B { MatchB }
B :(terminal char) 'b'
C :(terminal char) 'c'

example #2: (g2.txt)
for grammar
%;
Start : A ; {MatchA}
A : B {MatchB} | C ;
B : 'b' ;
C : 'c' ;
gives
Start :(concat) A { MatchA }
A :(alternative) A_0 | C
A_0 :(concat) B { MatchB }
B :(terminal char) 'b'
C :(terminal char) 'c'

example #3: (g3.txt)
for grammar
%;
Start : A ;
A : B | C ;
B : 'b' ;
C : 'c' ;
gives
Start :(concat) A
A :(alternative) B | C
B :(terminal char) 'b'
C :(terminal char) 'c'

example #4: (g4.txt)
for grammar
%;
Start : A ;
A : | B ;
B : 'b' ;
gives
Start :(concat) A
A :(alternative) A_0 | B
A_0 :(empty)
B :(terminal char) 'b'

example #5: (g5.txt)
for grammar
%;
Start : A ;
A : {M1} | B ;
B : 'b' ;
gives
Start :(concat) A
A :(alternative) A_0 | B
A_0 :(empty) { M1 }
B :(terminal char) 'b'
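
The pattern visible in examples #1 to #5 can be sketched as below. The behaviour is inferred from the listed outputs only, so it is an assumption rather than the generator's actual algorithm: an alternative that carries a match or that is empty gets its own rule named after its zero-based position, and a single symbol without match stays in place. MatchTransformSketch, Alt and transform are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these names are hypothetical, not parser4j classes.
final class MatchTransformSketch {

    // One alternative of a rule: its symbols and an optional match name.
    static final class Alt {
        final String match;          // null when the alternative has no match
        final List<String> symbols;  // empty for an empty alternative

        Alt(String match, String... symbols) {
            this.match = match;
            this.symbols = Arrays.asList(symbols);
        }
    }

    // Rewrites "name : alt0 | alt1 | ..." in the notation of the listings above.
    static List<String> transform(String name, List<Alt> alternatives) {
        List<String> heads = new ArrayList<>();
        List<String> extracted = new ArrayList<>();
        for (int i = 0; i < alternatives.size(); i++) {
            Alt alt = alternatives.get(i);
            if (alt.symbols.size() == 1 && alt.match == null) {
                heads.add(alt.symbols.get(0));   // plain single symbol stays in place
                continue;
            }
            String subRule = name + "_" + i;     // e.g. A_0 for the first alternative
            heads.add(subRule);
            String body = alt.symbols.isEmpty()
                    ? "(empty)"
                    : "(concat) " + String.join(" & ", alt.symbols);
            String match = alt.match == null ? "" : " { " + alt.match + " }";
            extracted.add(subRule + " :" + body + match);
        }
        List<String> result = new ArrayList<>();
        result.add(name + " :(alternative) " + String.join(" | ", heads));
        result.addAll(extracted);
        return result;
    }

    public static void main(String[] args) {
        // Rule A of example #1: A : B {MatchB} | C ;
        List<Alt> ruleA = Arrays.asList(new Alt("MatchB", "B"), new Alt(null, "C"));
        for (String rule : transform("A", ruleA)) {
            System.out.println(rule);
        }
    }
}
```

The sketch covers rules with alternatives only. Terminal rules and a match attached to the whole rule, as in example #2, are left out.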

white space insertion

White space is not inserted in token definition rules.

example #1: (g1w.txt)
for grammar
%;
ws : ' ' ;
%ws;
Start : A ;
A : B {MatchB} | C ;
B : 'b' ;
C : 'c' ;
gives
ws :(terminal char) ' '
Start :(concat) A & ws (as white space)
A :(alternative) A_0 | C
A_0 :(concat) B { MatchB }
B :(concat) ws (as white space) & _B
_B :(terminal char) 'b'
C :(concat) ws (as white space) & _C
_C :(terminal char) 'c'

example #2: (g2w.txt)
for grammar
%;
ws : ' ' ;
%ws;
Start : A ; {MatchA}
A : B {MatchB} | C ;
B : 'b' ;
C : 'c' ;
gives
ws :(terminal char) ' '
Start :(concat) A & ws (as white space) { MatchA }
A :(alternative) A_0 | C
A_0 :(concat) B { MatchB }
B :(concat) ws (as white space) & _B
_B :(terminal char) 'b'
C :(concat) ws (as white space) & _C
_C :(terminal char) 'c'

example #3: (g2w_1.txt)
for grammar
%;
ws : ' ' ;
%ws;
Start : A ; {MatchA}
A : 'b' {MatchB} | 'c' ;
gives
ws :(terminal char) ' '
Start :(concat) A & ws (as white space) { MatchA }
A :(alternative) A_0 | A_1
A_0 :(concat) ws (as white space) & _A_0 { MatchB }
A_1 :(concat) ws (as white space) & _A_1
_A_0 :(terminal char) 'b'
_A_1 :(terminal char) 'c'

example #4: (g3w.txt)
for grammar
%;
ws : ' ' ;
%ws;
Start : A ;
A : B | C ;
B : 'b' ;
C : 'c' ;
gives
ws :(terminal char) ' '
Start :(concat) A & ws (as white space)
A :(alternative) B | C
B :(concat) ws (as white space) & _B
C :(concat) ws (as white space) & _C
_B :(terminal char) 'b'
_C :(terminal char) 'c'

example #5: (g4w.txt)
for grammar
%;
ws : ' ' ;
%ws;
Start : A ;
A : | B ;
B : 'b' ;
gives
ws :(terminal char) ' '
Start :(concat) A & ws (as white space)
A :(alternative) A_0 | B
A_0 :(empty)
B :(concat) ws (as white space) & _B
_B :(terminal char) 'b'

example #6: (g5w.txt)
for grammar
%;
ws : ' ' ;
%ws;
Start : A ;
A : {M1} | B ;
B : 'b' ;
gives
ws :(terminal char) ' '
Start :(concat) A & ws (as white space)
A :(alternative) A_0 | B
A_0 :(empty) { M1 }
B :(concat) ws (as white space) & _B
_B :(terminal char) 'b'
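
Two patterns recur in the white space examples above and can be sketched as follows. Both are read off the listed outputs, so they are assumptions about the generator rather than a description of its code: a terminal rule declared after %ws; is split into a concatenation of ws and an underscore-prefixed terminal rule, and the start rule gets a trailing ws. WhiteSpaceSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: WhiteSpaceSketch is a hypothetical name, not a parser4j class.
final class WhiteSpaceSketch {

    // How a white space reference appears in the listings above.
    static final String WS = "ws (as white space)";

    // Splits a terminal rule such as B : 'b' into a concatenation that accepts
    // leading white space and a helper rule holding the terminal itself.
    static List<String> wrapTerminal(String name, String terminal) {
        String helper = "_" + name;
        return Arrays.asList(
                name + " :(concat) " + WS + " & " + helper,
                helper + " :(terminal char) " + terminal);
    }

    // Appends trailing white space to a start rule with a single symbol.
    static String closeStartRule(String name, String symbol) {
        return name + " :(concat) " + symbol + " & " + WS;
    }

    public static void main(String[] args) {
        System.out.println(closeStartRule("Start", "A"));
        for (String rule : wrapTerminal("B", "'b'")) {
            System.out.println(rule);
        }
    }
}
```

The sketch does not model match handling or the renaming of inline terminals shown in example #3 (g2w_1.txt).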


equivalence rule like "A : B ;"

example #1: (g9.txt)
for grammar
xxx;
%;
ws : ' ' ;
[barname] : name;
name : 'x';
%ws;
Start : barname ;
gives
ws :(terminal char) ' '
[barname] :(concat) name { xxx }
name :(terminal char) 'x'
Start :(concat) ws (as white space) & barname & ws (as white space) { xxx }

Since the input grammar for the generator is not ambiguous, HasAmbiguityParserException should never be thrown. If it is thrown, it is a bug; the ambiguous parse result is logged and can be analyzed.



© 2008-2009, parser4j