parser_builder #

Lightweight template-based parser build system. Simple prototyping. Comfortable debugging. Effective developing.

Version: 1.0.0

Early release version (not all built-in common builders are implemented, but it can be used without them).
It is under development, but you can already play around with it. An example of a working JSON parser is included.

At the moment, the implementation of useful built-in common parser builders is underway.

The build system is already implemented and ready to use, so you can try it out in action. Rewrite your existing parser and feel the difference!
Note that the generated combinations of parsers are very small and very efficient.
Ask questions if something is not clear.

First advantage #

Very simple and clear build and code generation system
Templates-based (visually intuitive) definition of parser builders
Parser declaration based on combinations of parsers
Allows building parsers for any kind of data (binary data, tokens, etc.)
Support for generating parsers into functions or into statements (inlined parser code)
The performance of the generated parsers is quite high
The generated parsers do not wrap the values of the parse results
The parser builder generates fully strongly typed parsers
Small size of the embedded runtime code (about 11 Kb) to support the work of the parser
The size of generated parser rules starts from 350 bytes (without runtime code)
The generated source code of the parsers is human-friendly as if you wrote it by hand
Very handy debugging and setting breakpoints to any place of parsing
An elegant way to implement your own tracing which can easily be turned off
Very flexible error handling system (with support for nested errors)
Built-in error preprocessing procedures (grouping and flattening errors)
Fully customizable (according to your needs) error reporting procedures
Error messages can be easily localized (translated into another language) before being output
Includes high-performance, most common built-in parser builders
Support for 32-bit Unicode characters out of the box (no need to worry about that)
Includes a simple built-in script for fast building of parsers

Second advantage #

Another advantage is that the build system is very simple and straightforward. Any programmer will understand it without much difficulty.
This means that an ordinary programmer can keep this software up to date.
You do not need to be a genius for this.
The author of this software is not one either, just an ordinary person.
It's just a simple but handy thing.
Use it and don't worry about it stopping working.
This software is already so simple that it could hardly be simpler.

Included parser builders #

built-in:

Expected (not tested yet)
Nested (not fully tested)

multi:

sequence:

string:

Built-in semantic actions #

Character class (additional semantic actions for character ranges) #

Projects using parser builder #

fast_csv

How to start writing your parser? #

Take a look at this very simple example of a hex color parser builder:

import 'package:parser_builder/bytes.dart';
import 'package:parser_builder/combinator.dart';
import 'package:parser_builder/fast_build.dart';
import 'package:parser_builder/parser_builder.dart';
import 'package:parser_builder/sequence.dart';
import 'package:parser_builder/char_class.dart';

import 'hex_color_parser_helper.dart';

Future<void> main(List<String> args) async {
  final context = Context();
  final filename = 'example/hex_color_parser.g.dart';
  await fastBuild(context, [_parse], filename,
      partOf: 'hex_color_parser.dart', publish: {'parseString': _parse});
}

const _hexColor = Named(
    '_hexColor',
    Preceded(
        Tag('#'),
        Map3(
            _hexPrimary,
            _hexPrimary,
            _hexPrimary,
            ExpressionAction<Color>(
                ['r', 'g', 'b'], 'Color({{r}}, {{g}}, {{b}})'))));

const _hexPrimary = Named(
    '_hexPrimary',
    Map1(TakeWhileMN(2, 2, CharClass('[0-9A-Fa-f]')),
        ExpressionAction<int>(['x'], 'int.parse({{x}}, radix: 16)')));

const _parse = Named('_parse', _hexColor);

The fastBuild function performs the following operations:

Builds the specified parsers
Combines the parser code from different parts (header + code + footer)
Writes the code to a file
Formats the code
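
The steps listed above can be pictured with a minimal sketch. It is not the package's implementation: assembleSource and writeGenerated are hypothetical names, the parser bodies are placeholders, and the formatting step is left out because it depends on an external formatter.

```dart
import 'dart:io';

/// Joins the non-empty parts of a generated file (header + code + footer).
String assembleSource(String header, List<String> parsers, String footer) {
  final parts = [header, ...parsers, footer].where((e) => e.trim().isNotEmpty);
  return '${parts.join('\n\n')}\n';
}

/// Writes the assembled source to [filename], creating parent directories.
void writeGenerated(String filename, String source) {
  File(filename)
    ..createSync(recursive: true)
    ..writeAsStringSync(source);
}

void main() {
  final source = assembleSource(
    "part of 'hex_color_parser.dart';",
    [
      'int? _a(State<String> state) => null;', // placeholder parser code
      'int? _b(State<String> state) => null;', // placeholder parser code
    ],
    '', // empty footer is skipped
  );
  // Write into a temporary directory so the sketch has no side effects.
  final dir = Directory.systemTemp.createTempSync('parser_builder_sketch');
  final filename = '${dir.path}/hex_color_parser.g.dart';
  writeGenerated(filename, source);
  print(File(filename).readAsStringSync());
  dir.deleteSync(recursive: true);
}
```

Keeping assembly and writing separate makes the text assembly easy to check without touching the file system.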

The rest of the code includes the following elements:

Parser combinator declarations

The generated source code can be found here:
https://github.com/mezoni/parser_builder/blob/master/example/hex_color_parser.g.dart

To get started, you can copy these 3 files:
https://github.com/mezoni/parser_builder/blob/master/example/hex_color_parser_builder.dart
https://github.com/mezoni/parser_builder/blob/master/example/hex_color_parser_helper.dart
https://github.com/mezoni/parser_builder/blob/master/example/hex_color_parser.dart

Rename them as you need.
The file hex_color_parser_builder.dart is the parser builder script. Place it in the "tool" directory.
This is where you will declare and build your parser.

The file hex_color_parser.dart is the parser script. This will be the public part of your parser.
You can modify it as you wish.

The file hex_color_parser_helper.dart is a parser dependency script (as an example).
This file is not strictly required, but for convenience you can place in it various functions and data structures that the parser needs and that are referenced from the builder. This data does not have to live in this file.

The file hex_color_parser.g.dart is generated by your parser builder.

When static metaprogramming arrives in Dart, some of these steps will no longer be necessary, and parsers will be generated on the fly through a single macro annotation.
Of course, you will still have to write a small macro for building (with code like that in the main function).

Example of parser declaration #

const _comma = Named('_comma', Terminated(Tag(','), _ws), [_inline]);

const _eof = Named('_eof', Eof<String>());

const _escaped = Named('_escaped', Alt([_escapeSeq, _escapeHex]));

const _escapeHex = Named(
    '_escapeHex',
    Map2(Tag('u'), TakeWhileMN(4, 4, CharClass('[0-9a-fA-F]')),
        ExprTransformer<int>(['_', 's'], '_toHexValue({{s}})')),
    [_inline]);

const _escapeSeq = EscapeSequence({
  0x22: 0x22,
  0x2f: 0x2f,
  0x5c: 0x5c,
  0x62: 0x08,
  0x66: 0x0c,
  0x6e: 0x0a,
  0x72: 0x0d,
  0x74: 0x09
});

const _false = Named('_false', Value(false, Tag('false')), [_inline]);

const _inline = '@pragma(\'vm:prefer-inline\')';
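
The declarations above refer to a helper (_toHexValue) whose body is not shown, and to an escape table keyed by character codes. The sketch below shows one plausible shape for both. The names toHexValue and escapes are chosen for illustration, and the real helper in the example project may differ.

```dart
/// A possible shape for the helper referenced by `_escapeHex` (assumption:
/// it converts the four hex digits after `\u` into a code unit value).
int toHexValue(String s) => int.parse(s, radix: 16);

/// The same table that is passed to `EscapeSequence` above: the character
/// that follows a backslash maps to the character it stands for.
const escapes = <int, int>{
  0x22: 0x22, // quote stays a quote
  0x2f: 0x2f, // slash stays a slash
  0x5c: 0x5c, // backslash stays a backslash
  0x62: 0x08, // b -> backspace
  0x66: 0x0c, // f -> form feed
  0x6e: 0x0a, // n -> line feed
  0x72: 0x0d, // r -> carriage return
  0x74: 0x09, // t -> tab
};

void main() {
  // "0041" is the code unit of the letter A.
  print(String.fromCharCode(toHexValue('0041')));
  // The character 'n' (0x6e) after a backslash stands for a line feed (0x0a).
  print(escapes[0x6e]);
}
```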

Example of generated code #

Color? _hexColor(State<String> state) {
  Color? $0;
  final source = state.source;
  final $pos = state.pos;
  state.ok = state.pos < source.length && source.codeUnitAt(state.pos) == 35;
  if (state.ok) {
    state.pos += 1;
  } else if (state.log) {
    state.error = ErrExpected.tag(state.pos, const Tag('#'));
  }
  if (state.ok) {
    final $pos1 = state.pos;
    int? $1;
    $1 = _hexPrimary(state);
    if (state.ok) {
      int? $2;
      $2 = _hexPrimary(state);
      if (state.ok) {
        int? $3;
        $3 = _hexPrimary(state);
        if (state.ok) {
          final v1 = $1!;
          final v2 = $2!;
          final v3 = $3!;
          $0 = Color(v1, v2, v3);
        }
      }
    }
    if (!state.ok) {
      state.pos = $pos1;
    }
  }
  if (!state.ok) {
    $0 = null;
    state.pos = $pos;
  }
  return $0;
}

This code was generated from this declaration:

const _hexColor = Named(
    '_hexColor',
    Preceded(
        Tag('#'),
        Map3(
            _hexPrimary,
            _hexPrimary,
            _hexPrimary,
            ExpressionAction<Color>(
                ['r', 'g', 'b'], 'Color({{r}}, {{g}}, {{b}})'))));
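
To make the control flow of the generated function easier to follow, here is a self-contained, hand-written sketch in the same style. The State class is a simplified stand-in (the real runtime also tracks errors and logging), Color is a hypothetical value class, and hexPrimary and hexColor are written by hand for illustration, not produced by the generator.

```dart
// Simplified stand-in for the runtime state (no error tracking, no logging).
class State<T> {
  final T source;
  int pos = 0;
  bool ok = false;
  State(this.source);
}

// Hypothetical value class constructed by the parser.
class Color {
  final int r, g, b;
  const Color(this.r, this.g, this.b);
  @override
  String toString() => 'Color($r, $g, $b)';
}

bool _isHexDigit(int c) =>
    (c >= 0x30 && c <= 0x39) ||
    (c >= 0x41 && c <= 0x46) ||
    (c >= 0x61 && c <= 0x66);

// Takes exactly two hex digits and converts them to an int.
int? hexPrimary(State<String> state) {
  final source = state.source;
  final start = state.pos;
  var end = start;
  while (end < source.length &&
      end - start < 2 &&
      _isHexDigit(source.codeUnitAt(end))) {
    end++;
  }
  state.ok = end - start == 2;
  if (!state.ok) {
    return null;
  }
  state.pos = end;
  return int.parse(source.substring(start, end), radix: 16);
}

// Matches '#' followed by three hex pairs; restores the position on failure.
Color? hexColor(State<String> state) {
  final source = state.source;
  final pos = state.pos;
  state.ok = pos < source.length && source.codeUnitAt(pos) == 0x23;
  if (!state.ok) {
    return null;
  }
  state.pos++;
  final r = hexPrimary(state);
  if (state.ok) {
    final g = hexPrimary(state);
    if (state.ok) {
      final b = hexPrimary(state);
      if (state.ok) {
        return Color(r!, g!, b!);
      }
    }
  }
  state.pos = pos;
  return null;
}

void main() {
  final state = State('#ff8000');
  final color = hexColor(state);
  print('${state.ok} $color'); // prints "true Color(255, 128, 0)"
}
```

As in the generated code, success is signalled through state.ok rather than through a wrapper around the result value.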

Inlining code example #

Declaration of _json parser.

const _eof = Eof();

const _json = Named('_json', Delimited(_ws, _value, _eof));

The Eof parser was inlined because it was not named.
Inlined means that it was generated without a function declaration for it (only as statements).

dynamic _json(State<String> state) {
  dynamic $0;
  final source = state.source;
  final $pos = state.pos;
  _ws(state);
  if (state.ok) {
    dynamic $2;
    $2 = _value(state);
    if (state.ok) {
      state.ok = state.pos >= source.length;
      if (!state.ok) {
        state.error = ErrExpected.eof(state.pos);
      }
      if (state.ok) {
        $0 = $2;
      }
    }
  }
  if (!state.ok) {
    state.pos = $pos;
  }
  return $0;
}

This code was generated from this declaration:

const _json = Named('_json', Delimited(_ws, _value, _eof));

Example of error reporting #

Unhandled exception:
FormatException:
line 1, column 1: Expected: [, {, ", string, number, false, null, true
  ╷
1 │ 123.`
  │ ^
  ╵
line 1, column 1: Malformed number
  ╷
1 │ 123.`
  │ ^^^^^
  ╵
line 1, column 5: Unexpected: '`'
  ╷
1 │ 123.`
  │     ^
  ╵

An error like this is produced, for example, by the following parser:

const _number =
    Named('_number', Malformed('number', Terminated(Number(), _ws)));

This parser (Malformed) does not need to know anything about the data format; its only task is to handle parsing errors correctly.
If parsing eventually fails, the error post-processing procedure translates all nested errors into other errors, and the result is an error message like the one above.
During parsing, errors are generated on the fly, but they are not translated on the fly. They are stored in a form convenient for the parser (in fact, as they are).
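
The idea of storing nested errors as they are and post-processing them only on failure can be sketched as follows. ParseError, flatten and groupByOffset are hypothetical names used for illustration. They are not the package's error classes or its actual preprocessing procedures.

```dart
/// Hypothetical error node: a message at an offset, optionally wrapping
/// the errors that caused it.
class ParseError {
  final int offset;
  final String message;
  final List<ParseError> nested;
  const ParseError(this.offset, this.message, [this.nested = const []]);
}

/// Flattens a tree of errors into a list (each parent before its children).
List<ParseError> flatten(Iterable<ParseError> errors) {
  final result = <ParseError>[];
  for (final error in errors) {
    result.add(error);
    result.addAll(flatten(error.nested));
  }
  return result;
}

/// Groups messages by offset, dropping duplicates, in first-seen order.
Map<int, List<String>> groupByOffset(Iterable<ParseError> errors) {
  final result = <int, List<String>>{};
  for (final error in errors) {
    final messages = result.putIfAbsent(error.offset, () => []);
    if (!messages.contains(error.message)) {
      messages.add(error.message);
    }
  }
  return result;
}

void main() {
  const errors = [
    ParseError(0, 'Malformed number', [
      ParseError(4, "Unexpected: '`'"),
    ]),
    ParseError(0, 'Expected: number'),
    ParseError(0, 'Expected: number'),
  ];
  final grouped = groupByOffset(flatten(errors));
  grouped.forEach(
      (offset, messages) => print('$offset: ${messages.join('; ')}'));
}
```

Because the work happens only after a failed parse, the successful path pays nothing for message formatting.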

How to declare a parser builder #

Declaring your own parser builder (if required) is very simple.
Let's take a look at an existing parser builder and assume it doesn't exist and you need to create one just like it.
This is an implementation of the well-known parsing expression called zero-or-more (aka *).

part of '../../multi.dart';

class Many0<I, O> extends ParserBuilder<I, List<O>> {
  static const _template = '''
final {{list}} = <{{O}}>[];
final {{log}} = state.log;
state.log = false;
while (true) {
  {{var1}}
  {{p1}}
  if (!state.ok) {
    break;
  }
  {{list}}.add({{val1}});
}
state.log = {{log}};
state.ok = true;
if (state.ok) {
  {{res0}} = {{list}};
}''';

  static const _templateFast = '''
final {{log}} = state.log;
state.log = false;
while (true) {
  {{p1}}
  if (!state.ok) {
    break;
  }
}
state.log = {{log}};
state.ok = true;''';

  final ParserBuilder<I, O> parser;

  const Many0(this.parser);

  @override
  String build(Context context, ParserResult? result) {
    final fast = result == null;
    final values = context.allocateLocals(['list', 'log']);
    final r1 = context.getResult(parser, !fast);
    values.addAll({
      'O': '$O',
      'p1': parser.build(context, r1),
    });
    return render2(fast, _templateFast, _template, values, [result, r1]);
  }
}
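
The templates above rely on {{name}} placeholders being replaced with generated names and code. A minimal sketch of such a substitution is shown below. The render function here is a hypothetical stand-in and is not the package's render2, which also handles result variables.

```dart
/// Replaces every `{{name}}` in [template] with `values[name]`.
/// Throws if a placeholder has no value, so typos surface early.
String render(String template, Map<String, String> values) {
  return template.replaceAllMapped(RegExp(r'\{\{(\w+)\}\}'), (match) {
    final name = match.group(1)!;
    final value = values[name];
    if (value == null) {
      throw ArgumentError('No value for placeholder: $name');
    }
    return value;
  });
}

void main() {
  const template = '''
final {{list}} = <{{O}}>[];
while (true) {
  {{p1}}
  if (!state.ok) {
    break;
  }
  {{list}}.add({{val1}});
}''';
  // The values below are made up for the demonstration.
  print(render(template, {
    'list': r'$list',
    'O': 'int',
    'p1': r'final $1 = _digit(state);',
    'val1': r'$1!',
  }));
}
```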

An updated version of this section will be added later...

Performance #

Current performance of the generated JSON parser.

On citm_catalog.json and twitter.json, the generated parser is about 1.17-1.33 times slower than the hand-written, high-quality, specialized state-machine-based JSON parser from the Dart SDK. On canada.json, it is about 2.06-2.38 times faster.

Better results in many cases are obtained in AOT mode. If the Dart SDK compiler placed short-lived local variables in registers more efficiently, the results could be slightly better. At the moment, the generated parser code is not optimized for register usage because performance tests, unfortunately, do not show a gain from this kind of optimization.

AOT mode:

Parse 10 times: E:\prj\test_json\bin\data\canada.json
Dart SDK JSON    : k: 2.06, 41.70 MB/s, 514.80 ms (100.00%),
Simple JSON NEW 2: k: 1.00, 86.01 MB/s, 249.60 ms (48.48%),

Parse 10 times: E:\prj\test_json\bin\data\citm_catalog.json
Dart SDK JSON    : k: 1.00, 87.98 MB/s, 187.20 ms (85.71%),
Simple JSON NEW 2: k: 1.17, 75.41 MB/s, 218.40 ms (100.00%),

Parse 10 times: E:\prj\test_json\bin\data\twitter.json
Dart SDK JSON    : k: 1.00, 57.86 MB/s, 93.60 ms (85.72%),
Simple JSON NEW 2: k: 1.17, 49.60 MB/s, 109.20 ms (100.00%),

OS: Microsoft Windows 7 Ultimate 6.1.7601
Kernel: Windows_NT 6.1.7601
Processor (4 core) Intel(R) Core(TM) i5-3450 CPU @ 3.10GHz

JIT mode:

Parse 10 times: E:\prj\test_json\bin\data\canada.json
Dart SDK JSON    : k: 2.38, 44.39 MB/s, 483.60 ms (100.00%),
Simple JSON NEW 2: k: 1.00, 105.86 MB/s, 202.80 ms (41.94%),

Parse 10 times: E:\prj\test_json\bin\data\citm_catalog.json
Dart SDK JSON    : k: 1.00, 87.98 MB/s, 187.20 ms (75.00%),
Simple JSON NEW 2: k: 1.33, 65.99 MB/s, 249.60 ms (100.00%),

Parse 10 times: E:\prj\test_json\bin\data\twitter.json
Dart SDK JSON    : k: 1.00, 57.87 MB/s, 93.60 ms (75.00%),
Simple JSON NEW 2: k: 1.33, 43.40 MB/s, 124.80 ms (100.00%),

OS: Microsoft Windows 7 Ultimate 6.1.7601
Kernel: Windows_NT 6.1.7601
Processor (4 core) Intel(R) Core(TM) i5-3450 CPU @ 3.10GHz

The data for the test is taken from here: https://github.com/serde-rs/json-benchmark/tree/master/data
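
The k and percentage columns in the tables above are consistent with simple ratios of the timings: k is the time relative to the fastest parser, and the percentage is the time relative to the slowest one. This relationship is inferred from the numbers, not taken from the benchmark source, and the Row and report names below are hypothetical.

```dart
/// One row of a benchmark report, derived from raw timings.
class Row {
  final String name;
  final double ms;
  final double k; // time relative to the fastest parser
  final double percent; // time relative to the slowest parser
  Row(this.name, this.ms, this.k, this.percent);
}

/// Builds report rows from a map of parser name to elapsed milliseconds.
List<Row> report(Map<String, double> timings) {
  final fastest = timings.values.reduce((a, b) => a < b ? a : b);
  final slowest = timings.values.reduce((a, b) => a > b ? a : b);
  return [
    for (final e in timings.entries)
      Row(e.key, e.value, e.value / fastest, e.value / slowest * 100),
  ];
}

void main() {
  // Timings taken from the AOT run on canada.json.
  final rows = report({'Dart SDK JSON': 514.80, 'Simple JSON NEW 2': 249.60});
  for (final r in rows) {
    print('${r.name}: k: ${r.k.toStringAsFixed(2)}, '
        '${r.ms.toStringAsFixed(2)} ms (${r.percent.toStringAsFixed(2)}%)');
  }
}
```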

There is a reasonable explanation for the cases where it is slower: these are combinators of universal parsers. They will always be slower than a specialized parser written by hand, because the redundancy that exists in parser combinators cannot be eliminated when generating code.
The same redundancy allows you to use combinators to parse any type of data, not just text. For the same reason, they are slightly less efficient at parsing text.

But there are still advantages: high development speed and quite informative error messages.

The advantages over parsers that are limited to a fixed notation are obvious. You can implement everything that such parsers support, and anything else you need can simply be added.
On the other hand, notation-based parsers can have the advantage that they can heavily optimize the generated code.

At the same time, nothing prevents the programmer from writing sub parsers by hand for those places where performance is critical. This is again one of the advantages of combined parsers: add a little of your own code and your parser is already much faster.
This mainly concerns the parsing of complex structures with specific data formats (strings, numbers, and so on).
The fastest parser for them is a state machine. It can and should be created manually and/or using third-party tools, and it will be only one part of the whole parser.
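
As an illustration of such a hand-written sub parser, here is a minimal state machine that recognizes JSON-style numbers. It is a sketch under stated assumptions: the state names and the scanNumber function are hypothetical, the code only finds where the number ends (it does not compute its value), and it is not wired into the parser builder.

```dart
// States of a small hand-written recognizer for JSON-style numbers.
const sStart = 0, sMinus = 1, sZero = 2, sInt = 3, sDot = 4;
const sFrac = 5, sExpMark = 6, sExpSign = 7, sExp = 8, sFail = -1;

// Transition function: current state + code unit -> next state.
int _next(int state, int c) {
  final digit = c >= 0x30 && c <= 0x39;
  final expMark = c == 0x45 || c == 0x65; // 'E' or 'e'
  switch (state) {
    case sStart:
    case sMinus:
      if (c == 0x2d && state == sStart) return sMinus; // leading '-'
      if (c == 0x30) return sZero;
      return digit ? sInt : sFail;
    case sZero:
      return c == 0x2e ? sDot : expMark ? sExpMark : sFail;
    case sInt:
      return digit ? sInt : c == 0x2e ? sDot : expMark ? sExpMark : sFail;
    case sDot:
      return digit ? sFrac : sFail;
    case sFrac:
      return digit ? sFrac : expMark ? sExpMark : sFail;
    case sExpMark:
      if (c == 0x2b || c == 0x2d) return sExpSign; // '+' or '-'
      return digit ? sExp : sFail;
    case sExpSign:
    case sExp:
      return digit ? sExp : sFail;
    default:
      return sFail;
  }
}

/// Returns the offset just past the number that starts at [pos], or -1 if
/// the input stops in a non-accepting state (no fallback to a shorter match).
int scanNumber(String source, int pos) {
  var state = sStart;
  var i = pos;
  while (i < source.length) {
    final next = _next(state, source.codeUnitAt(i));
    if (next == sFail) break;
    state = next;
    i++;
  }
  const accepting = {sZero, sInt, sFrac, sExp};
  return accepting.contains(state) ? i : -1;
}

void main() {
  print(scanNumber('-12.5e+3,', 0)); // prints "8"
  print(scanNumber('123.`', 0)); // prints "-1"
}
```

A single pass with integer states avoids backtracking and allocation, which is where hand-written code tends to beat generic combinators.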

If I had a lot of free time, I could probably write a code-first generator of small state machines for quickly parsing data in various formats, for example, for writing lexers or sub parsers.

More info... #

To be continued...
