Absimpa: a Java class library for constructing recursive descent parsers.
The name originally stood for Abstract Simple Parsers. The code has since become much less abstract, but the name stuck.
An Example
A Parser obtains a sequence of token codes (int values) from a Lexer's
next() method and, if the sequence conforms to the structure defined by the
parser, transforms it into a value of a generic type N (think Node).
To create a parser, the following ingredients are needed.
Lexer
The lexer provides two things:
- token codes as int values and
- a value of a generic type N representing the current token code.
What N represents and how it is created are up to the lexer. When using the
SimpleLexer, a LeafFactory must be provided explicitly.
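The two responsibilities of a lexer described above can be illustrated with a minimal, self-contained sketch. The class LexerSketch, its methods current(), currentLeaf() and next(), and the use of a plain Function as leaf factory are all hypothetical names chosen for this illustration; they are not Absimpa's actual Lexer, SimpleLexer or LeafFactory types.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a lexer; NOT Absimpa's actual Lexer interface.
final class LexerSketch<N> {
    static final int NUMBER = 0, NAME = 1, EOF = 2;

    // Optional whitespace, then either a number (group 1) or a name (group 2).
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(?:([+-]?[0-9]+)|([a-zA-Z]+))");

    private final Matcher matcher;
    private final int length;
    private final Function<String, N> leafFactory; // assumed leaf factory shape
    private int pos = 0;
    private int code;
    private String text;

    LexerSketch(String input, Function<String, N> leafFactory) {
        this.matcher = TOKEN.matcher(input);
        this.length = input.length();
        this.leafFactory = leafFactory;
        advance();
    }

    // First responsibility: the current token code as an int.
    int current() { return code; }

    // Second responsibility: a value of type N for the current token.
    N currentLeaf() { return leafFactory.apply(text); }

    // Moves to the next token and returns its code.
    int next() { advance(); return code; }

    private void advance() {
        matcher.region(pos, length);
        if (matcher.lookingAt()) {
            code = matcher.group(1) != null ? NUMBER : NAME;
            text = code == NUMBER ? matcher.group(1) : matcher.group(2);
            pos = matcher.end();
        } else {
            // Simplification: anything unrecognized is treated as end of input.
            code = EOF;
            text = "";
        }
    }
}
```

In this sketch the lexer decides how the N value is created by delegating to the function it was given, which mirrors the idea that a leaf factory has to be supplied explicitly.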
NodeFactories
As the nested parsers parse the input and obtain values of type N
from the lexer, they themselves combine a list of N values into a single
N value by using a NodeFactory provided in their
constructor. A parser which naturally obtains only a single N from
its sub-parsers, like ChoiceParser, may not need an explicit node
factory, in which case the identity transformation is the default. A node
factory may return null, as may a parser's parse method.
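To make the role of a node factory more concrete, here is a small sketch of the idea: a function from a list of N values to a single N, with an identity-like default for parsers that only ever collect one child. The interface Combiner and the helpers identity() and join() are hypothetical names for illustration, not Absimpa's NodeFactory API; in particular, returning null for an empty list is an assumption of this sketch.

```java
import java.util.List;

final class NodeFactorySketch {
    // Hypothetical stand-in; NOT Absimpa's actual NodeFactory interface.
    interface Combiner<N> {
        N create(List<N> children);
    }

    // Identity-like default for a parser that obtains a single child:
    // the child is passed through unchanged. Returning null when there is
    // no child is an assumption made for this sketch.
    static <N> Combiner<N> identity() {
        return children -> children.isEmpty() ? null : children.get(0);
    }

    // A combiner that concatenates the string values of all children.
    static Combiner<String> join() {
        return children -> String.join("", children);
    }
}
```

A sequence or repetition would typically need something like join(), since it collects several children, while a choice between alternatives can get by with the identity default.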
Parsers
A full parser is built up from several types of parsers:
- TokenParser: parses a single token only.
- RepeatParser: parses the same sub-parser between n and m times where n≤m and both are non-negative integers.
- SeqParser: parses a given sequence of sub-parsers.
- ChoiceParser: parses exactly one of a given list of sub-parsers.
- RecurseParser: is used to build recursive structures, as it is the only parser which does not require its sub-parser(s) to be specified already in the constructor.
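Why a parser with a late-bound sub-parser is needed for recursion can be seen in the following self-contained sketch. It does not use Absimpa: the types P, Cursor and Recurse and the helpers token(), seq() and choice() are hypothetical, the input is a plain array of token strings, and failure is handled by simply resetting the position, which is an assumption of this sketch and not a statement about how Absimpa works internally.

```java
final class RecurseSketch {
    // Hypothetical minimal parser type; returns null when nothing matches.
    interface P {
        String parse(Cursor c);
    }

    // A position within a fixed array of token strings.
    static final class Cursor {
        final String[] tokens;
        int pos = 0;
        Cursor(String... tokens) { this.tokens = tokens; }
        String peek() { return pos < tokens.length ? tokens[pos] : null; }
    }

    // Matches exactly one given token.
    static P token(String t) {
        return c -> {
            if (t.equals(c.peek())) { c.pos++; return t; }
            return null;
        };
    }

    // Matches all sub-parsers in order and concatenates their results.
    static P seq(P... ps) {
        return c -> {
            int start = c.pos;
            StringBuilder sb = new StringBuilder();
            for (P p : ps) {
                String r = p.parse(c);
                if (r == null) { c.pos = start; return null; }
                sb.append(r);
            }
            return sb.toString();
        };
    }

    // Returns the result of the first sub-parser that matches.
    static P choice(P... ps) {
        return c -> {
            for (P p : ps) {
                String r = p.parse(c);
                if (r != null) return r;
            }
            return null;
        };
    }

    // The only parser here whose child is set after construction.
    static final class Recurse implements P {
        private P child;
        void setChild(P child) { this.child = child; }
        public String parse(Cursor c) { return child.parse(c); }
    }

    // Grammar: expr := "x" | "(" expr ")"
    static P nested() {
        Recurse expr = new Recurse();
        // expr must already exist so that it can appear inside its own definition.
        expr.setChild(choice(token("x"), seq(token("("), expr, token(")"))));
        return expr;
    }
}
```

Because every other combinator needs its children at construction time, the grammar could not refer to itself without one node whose child is filled in afterwards.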
A minimal parser example:
// We want to parse a sequence of NUMBER and NAME tokens.
private enum Tokens {
    NUMBER, NAME, EOF;
}

public static void main(String[] args) throws Exception {
    // Our result shall be a string with the token texts in brackets.
    LeafFactory<String, Tokens> tokenToNumber = (lex) -> "[" + lex.currentToken().getText() + "]";
    NodeFactory<String> joinTokens = (l) -> String.join("", l);

    // The lexer understands two tokens and returns EOF on end of input
    SimpleLexer<String, Tokens> lexer = new SimpleLexer<>(tokenToNumber, Tokens.EOF)
        .addToken(Tokens.NUMBER, "[+-]?[0-9]+")
        .addToken(Tokens.NAME, "[a-zA-Z]+");

    // We need a TokenParser for each token
    Parser<String> pNumber = new TokenParser<>(Tokens.NUMBER);
    Parser<String> pName = new TokenParser<>(Tokens.NAME);

    // At each position we allow either one of them, number or name
    Parser<String> numberOrName = new ChoiceParser<>(pNumber, pName);

    // Our top parser just parses a repetition of the above
    Parser<String> parser = new RepeatParser<>(joinTokens, numberOrName, 1, Integer.MAX_VALUE);

    lexer.initAnalysis(String.join(" ", args));
    String r = parser.parse(lexer);
    System.err.println(r);
}