tokenizer_parser 0.1.3

A lightweight tokenizer and parser-composition toolkit for Dart, designed to build custom language grammars and structured token streams.

tokenizer_parser

tokenizer_parser is a lightweight tokenizer and parser-composition toolkit for Dart. It helps you define literal token patterns and compose them into higher-level grammar nodes (for example: fields, declarations, or AST-like structures) without code generation.

This package is useful when you need:

  • custom DSL parsing,
  • structured token streams from plain text,
  • deterministic composition of flat tokens into nested tokens.

Features

  • Regex-based literal token matching with line/column/index tracking.
  • Non-literal composition using sequence and alternatives.
  • Optional ignore list for tokens such as whitespace and comments.
  • File and string entry points:
    • Tokenizer.tokenize(...)
    • Tokenizer.tokenizeFile(...)
  • Public, composable model primitives (LiteralModel, NonLiteralModel, TokenSequence, TokenAlternatives).
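
To make the first and third features concrete, here is a minimal, self-contained sketch of regex-based scanning with index, line, and column tracking plus an ignore set. It does not use tokenizer_parser: `Lexeme`, `Rule`, and `scan` are hypothetical names chosen for illustration, and the package's internals may work differently.

```dart
// Illustration only: Lexeme, Rule, and scan are hypothetical names,
// not part of the tokenizer_parser API.
class Lexeme {
  final String name;
  final String text;
  final int index; // 0-based offset into the input
  final int line; // 1-based line number
  final int column; // 1-based column number
  const Lexeme(this.name, this.text, this.index, this.line, this.column);

  @override
  String toString() => '$name("$text")@$line:$column';
}

class Rule {
  final String name;
  final RegExp pattern;
  Rule(this.name, String source) : pattern = RegExp(source);
}

/// Scans [input] left to right, trying each rule at the current offset.
/// Rules named in [ignored] still consume text but emit no lexeme.
List<Lexeme> scan(String input, List<Rule> rules,
    {Set<String> ignored = const {}}) {
  final out = <Lexeme>[];
  var index = 0, line = 1, column = 1;
  while (index < input.length) {
    Match? match;
    Rule? matched;
    for (final rule in rules) {
      final m = rule.pattern.matchAsPrefix(input, index);
      if (m != null && m.end > index) {
        match = m;
        matched = rule;
        break;
      }
    }
    if (match == null || matched == null) {
      throw FormatException('No rule matches', input, index);
    }
    final text = match.group(0)!;
    if (!ignored.contains(matched.name)) {
      out.add(Lexeme(matched.name, text, index, line, column));
    }
    // Advance the position counters over the consumed text.
    for (final unit in text.codeUnits) {
      if (unit == 0x0A) {
        line++;
        column = 1;
      } else {
        column++;
      }
    }
    index = match.end;
  }
  return out;
}

void main() {
  final rules = [
    Rule('identifier', r'[A-Za-z_]+'),
    Rule('equals', '='),
    Rule('whitespace', r'\s+'),
  ];
  print(scan('name =\nvalue', rules, ignored: {'whitespace'}));
}
```

In this sketch, unmatched input raises a `FormatException`; the package instead reports unmatched segments in its return value, as described under Return Value.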

Installation

Add the package to your pubspec.yaml:

dependencies:
  tokenizer_parser: ^0.1.3

Then install dependencies:

dart pub get

Quick Start

import 'package:tokenizer_parser/tokenizer_parser.dart';

const identifier = LiteralModel(name: 'identifier', pattern: r'[A-Za-z_]+');
const whitespace = LiteralModel(name: 'whitespace', pattern: r'\s+');
const equals = LiteralModel(name: 'equals', pattern: r'=');

const assignment = NonLiteralModel(
  name: 'assignment',
  sequence: TokenSequence('identifier-equals-identifier', [
    identifier,
    equals,
    identifier,
  ]),
);

final language = <TokenModel>[identifier, whitespace, equals, assignment];

void main() {
  // The third argument is the ignore list (here, whitespace).
  final result = Tokenizer.tokenize('name = value', language, [whitespace]);
  final tokens = result.$1; // Tokens that were created.
  final remaining = result.$2;

  print(tokens);
  print(remaining); // Unmatched input segments, if any.
}

Core Concepts

1) LiteralModel

Matches direct text with a regex pattern.

const number = LiteralModel(name: 'number', pattern: r'\d+');

2) NonLiteralModel

Builds higher-level tokens from existing tokens.

// key, colon, and value are token models assumed to be defined elsewhere.
const pair = NonLiteralModel(
  name: 'pair',
  sequence: TokenSequence('key-colon-value', [key, colon, value]),
);

3) TokenSequence

Requires all elements to match in order.

4) TokenAlternatives

Matches the first successful alternative.
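
The difference between the two composition rules can be sketched over a plain list of token names. This is a self-contained illustration of "all in order" versus "first successful alternative"; `matchSequence` and `matchAlternatives` are hypothetical helpers, not the TokenSequence or TokenAlternatives classes themselves.

```dart
// Illustration only: these helpers are hypothetical and do not use
// the tokenizer_parser API.

/// Returns the number of tokens consumed when every name in [expected]
/// matches [tokens] in order starting at [start], or null on failure.
int? matchSequence(List<String> tokens, int start, List<String> expected) {
  if (start + expected.length > tokens.length) return null;
  for (var i = 0; i < expected.length; i++) {
    if (tokens[start + i] != expected[i]) return null;
  }
  return expected.length;
}

/// Tries each candidate sequence in declaration order and returns the
/// length of the first one that matches, or null if none does.
int? matchAlternatives(
    List<String> tokens, int start, List<List<String>> candidates) {
  for (final candidate in candidates) {
    final consumed = matchSequence(tokens, start, candidate);
    if (consumed != null) return consumed;
  }
  return null;
}

void main() {
  final tokens = ['identifier', 'equals', 'number'];

  // The third element differs, so the whole sequence fails.
  print(matchSequence(tokens, 0, ['identifier', 'equals', 'identifier'])); // null

  // The first candidate fails, the second succeeds and consumes 3 tokens.
  print(matchAlternatives(tokens, 0, [
    ['identifier', 'equals', 'identifier'],
    ['identifier', 'equals', 'number'],
  ])); // 3
}
```

Because the first success wins under this rule, the order in which alternatives are declared can change the result when more than one would match.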

Tokenizing Files

final result = Tokenizer.tokenizeFile('example/input.gql', language, [whitespace]);

Return Value

Both Tokenizer.tokenize and Tokenizer.tokenizeFile return:

  • $1: List<Token>, the tokens that were created.
  • $2: List<Input>, the input segments that no model matched.

This makes it easy to detect parse gaps or unsupported syntax.
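
One way to act on the remainder is to destructure the record and treat a non-empty second field as an incomplete parse. The sketch below relies only on the two-field record shape described above; `summarize` is a hypothetical helper, and the stand-in values in `main` replace a real call to Tokenizer.tokenize so the example runs without the package.

```dart
// Illustration only: summarize is a hypothetical helper that depends on
// the (tokens, remainder) record shape, not on any tokenizer_parser type.
String summarize<T, R>((List<T>, List<R>) result) {
  final (tokens, remaining) = result; // Record destructuring (Dart 3+).
  if (remaining.isEmpty) return 'ok: ${tokens.length} tokens';
  return 'incomplete: ${tokens.length} tokens, '
      '${remaining.length} unmatched segment(s)';
}

void main() {
  // Stand-in values; with the package, pass the tokenizer's result instead.
  print(summarize((['name', '=', 'value'], <String>[])));
  print(summarize((['name'], ['@@'])));
}
```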

Example Project

See the full GraphQL-like grammar example in:

  • example/ql_lang.dart
  • example/tokenizer_example.dart

Contributing

Issues and pull requests are welcome. If you add grammar features, include tests that validate both matched tokens and unmatched remainder behavior.

License

MIT