Absimpa: a Java class library for constructing recursive descent parsers.

The name originally stood for Abstract Simple Parsers. The code has since become much less abstract, but the name stuck.

An Example

A Parser obtains a sequence of token codes (int values) from a Lexer's Lexer.next() method and, if the sequence conforms to the structure defined by the parser, transforms it into a value of a generic type N (think Node).

To create a parser, the following ingredients are needed.

Lexer

The lexer provides two things:

  1. token codes as int values and
  2. a value of a generic type N representing the current token code.

What the N represents and how it is created is up to the lexer. When using the SimpleLexer, a LeafFactory must be explicitly provided.
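
As a rough illustration of this two-part contract, the following sketch shows a toy lexer that exposes a token code and, separately, a leaf value built from the token text. It is not the Absimpa Lexer or SimpleLexer API: the class ToyLexer, its methods currentCode() and takeLeaf(), and the use of a plain Function as leaf factory are all assumptions made for this example.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; NOT the Absimpa Lexer or SimpleLexer API.
final class ToyLexer<N> {
  static final int NUMBER = 0, NAME = 1, EOF = 2;

  // Optional whitespace, then either a number (group 1) or a name (group 2).
  private static final Pattern TOKEN =
      Pattern.compile("\\s*(?:([+-]?[0-9]+)|([a-zA-Z]+))");

  private final Function<String, N> leafFactory; // assumed: builds an N from token text
  private final Matcher matcher;
  private final int end;
  private int pos = 0;
  private int code;
  private String text;

  ToyLexer(String input, Function<String, N> leafFactory) {
    this.leafFactory = leafFactory;
    this.matcher = TOKEN.matcher(input);
    this.end = input.length();
    advance();
  }

  /** Token code of the current token. */
  int currentCode() { return code; }

  /** Builds the leaf value for the current token, then moves to the next token. */
  N takeLeaf() {
    N leaf = leafFactory.apply(text);
    advance();
    return leaf;
  }

  private void advance() {
    matcher.region(pos, end);
    if (matcher.lookingAt()) {
      boolean isNumber = matcher.group(1) != null;
      code = isNumber ? NUMBER : NAME;
      text = isNumber ? matcher.group(1) : matcher.group(2);
      pos = matcher.end();
    } else {
      // Simplification: anything unrecognized is treated as end of input.
      code = EOF;
      text = "";
    }
  }
}
```

A parser built on such a lexer would look only at currentCode() to decide what to do and would call takeLeaf() once it has decided to consume the token, so the lexer alone determines what an N is.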

NodeFactories

As the nested parsers parse the input and obtain values of type N from the lexer, they themselves combine a list of N values into a single N value by using a NodeFactory provided in their constructor. A parser which naturally obtains only a single N from its sub-parsers, like ChoiceParser, may not need an explicit node factory, in which case the identity transformation is the default. A node factory may return null, as may a parser's parse method.
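
The idea of a node factory can be sketched without the library. In the example below, the interface ToyNodeFactory and the class NodeFactoryDemo are hypothetical names, and the choice to return null for an empty child list is an assumption made for illustration; the actual NodeFactory interface may be shaped differently.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; NOT the Absimpa NodeFactory interface.
final class NodeFactoryDemo {
  // Assumed shape of a node factory: a list of child values becomes one value.
  interface ToyNodeFactory<N> extends Function<List<N>, N> {}

  // Combines all children into one value, as a repeating parser might need.
  static final ToyNodeFactory<String> JOIN = children -> String.join("", children);

  // Identity-like default for a parser with a single child, such as a choice.
  // Returns null when there is no child, showing that null is a legal result.
  static <N> ToyNodeFactory<N> firstOrNull() {
    return children -> children.isEmpty() ? null : children.get(0);
  }
}
```

Since both factories and parsers may return null, code that consumes a parse result should be prepared to handle null.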

Parsers

A full parser is built up from several types of parsers:

  • TokenParser: parses a single token only.
  • RepeatParser: applies the same sub-parser between n and m times, where n ≤ m and both are non-negative integers.
  • SeqParser: applies a given sequence of sub-parsers in order.
  • ChoiceParser: applies exactly one of a given list of sub-parsers.
  • RecurseParser: builds recursive structures; it is the only parser that does not require its sub-parser(s) to be specified in the constructor.
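
To see why a recursive grammar needs a parser whose child is set after construction, consider the grammar expr := "x" | "(" expr ")". The following sketch is independent of the library: the class ToyParsers, the interface P, the Cursor type and the setChild method are hypothetical, and the sketch uses naive backtracking over a token list, which may differ from how Absimpa selects alternatives.

```java
import java.util.List;

// Hypothetical combinators for illustration; NOT the Absimpa parser classes.
final class ToyParsers {
  // Minimal token cursor; a real lexer would produce the tokens from text.
  static final class Cursor {
    final List<String> tokens;
    int pos = 0;
    Cursor(List<String> tokens) { this.tokens = tokens; }
    String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }
  }

  // A parser returns the parsed text, or null if it does not match.
  interface P { String parse(Cursor c); }

  static P token(String t) {
    return c -> {
      if (t.equals(c.peek())) { c.pos++; return t; }
      return null;
    };
  }

  static P choice(P... alternatives) {
    return c -> {
      for (P p : alternatives) {
        String r = p.parse(c);
        if (r != null) return r;
      }
      return null;
    };
  }

  static P seq(P... parts) {
    return c -> {
      int start = c.pos;
      StringBuilder sb = new StringBuilder();
      for (P p : parts) {
        String r = p.parse(c);
        if (r == null) { c.pos = start; return null; } // undo partial match
        sb.append(r);
      }
      return sb.toString();
    };
  }

  static P repeat(P p, int min, int max) {
    return c -> {
      int start = c.pos;
      int n = 0;
      StringBuilder sb = new StringBuilder();
      while (n < max) {
        int before = c.pos;
        String r = p.parse(c);
        if (r == null) break;
        sb.append(r);
        n++;
        if (c.pos == before) break; // no progress: avoid looping forever
      }
      if (n < min) { c.pos = start; return null; }
      return sb.toString();
    };
  }

  // The child is set after construction, so a grammar can refer to itself.
  static final class Recurse implements P {
    private P child;
    void setChild(P child) { this.child = child; }
    public String parse(Cursor c) { return child.parse(c); }
  }

  // expr := "x" | "(" expr ")"
  static P nested() {
    Recurse expr = new Recurse();
    expr.setChild(choice(token("x"), seq(token("("), expr, token(")"))));
    return expr;
  }
}
```

In nested(), the sequence parser needs expr as one of its parts before expr itself is fully defined. Creating the Recurse object first and setting its child afterwards breaks this cycle, which is the role the list above assigns to RecurseParser.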

A minimal parser example:

// We want to parse a sequence of NUMBER and NAME tokens.
private enum Tokens {
  NUMBER, NAME, EOF;
}

public static void main(String[] args) throws Exception {
  // Our result shall be a string with the token texts in brackets.
  LeafFactory<String, Tokens> bracketToken = (lex) -> "[" + lex.currentToken().getText() + "]";
  NodeFactory<String> joinTokens = (l) -> String.join("", l);

  // The lexer understands two tokens and returns EOF on end of input
  SimpleLexer<String, Tokens> lexer = new SimpleLexer<>(bracketToken, Tokens.EOF)
    .addToken(Tokens.NUMBER, "[+-]?[0-9]+")
    .addToken(Tokens.NAME, "[a-zA-Z]+");

  // We need a TokenParser for each token
  Parser<String> pNumber = new TokenParser<>(Tokens.NUMBER);
  Parser<String> pName = new TokenParser<>(Tokens.NAME);

  // At each position we allow either one of them, number or name
  Parser<String> numberOrName = new ChoiceParser<>(pNumber, pName);

  // Our top parser just parses a repetition of the above
  Parser<String> parser = new RepeatParser<>(joinTokens, numberOrName, 1, Integer.MAX_VALUE);

  lexer.initAnalysis(String.join(" ", args));
  String r = parser.parse(lexer);
  System.err.println(r);
}

Packages

  • Abstract Simple Parser.
  • An example implementation of a lexer.