tokenizer_parser
tokenizer_parser is a lightweight tokenizer and parser-composition toolkit
for Dart. It helps you define literal token patterns and compose them into
higher-level grammar nodes (for example: fields, declarations, or AST-like
structures) without code generation.
This package is useful when you need:
- custom DSL parsing,
- structured token streams from plain text,
- deterministic composition of flat tokens into nested tokens.
Features
- Regex-based literal token matching with line/column/index tracking.
- Non-literal composition using sequence and alternatives.
- Optional ignore list for tokens such as whitespace and comments.
- File and string entry points: Tokenizer.tokenize(...) and Tokenizer.tokenizeFile(...).
- Public, composable model primitives (LiteralModel, NonLiteralModel, TokenSequence, TokenAlternatives).
Installation
Add the package to your pubspec.yaml:
dependencies:
  tokenizer_parser: ^0.1.1
Then install dependencies:
dart pub get
Quick Start
import 'package:tokenizer_parser/tokenizer_parser.dart';
const identifier = LiteralModel(name: 'identifier', pattern: r'[A-Za-z_]+');
const whitespace = LiteralModel(name: 'whitespace', pattern: r'\s+');
const equals = LiteralModel(name: 'equals', pattern: r'=');
const assignment = NonLiteralModel(
  name: 'assignment',
  sequence: TokenSequence('identifier-equals-identifier', [
    identifier,
    equals,
    identifier,
  ]),
);
final language = <TokenModel>[identifier, whitespace, equals, assignment];
void main() {
  final result = Tokenizer.tokenize('name = value', language, [whitespace]);
  final tokens = result.$1;
  final remaining = result.$2;
  print(tokens);
  print(remaining); // Unmatched input segments, if any.
}
Core Concepts
1) LiteralModel
Matches direct text with a regex pattern.
const number = LiteralModel(name: 'number', pattern: r'\d+');
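To make the idea of regex matching with position tracking concrete, here is a minimal, self-contained sketch using only dart:core. It is not the package's implementation: Position, positionAt, and matchLiteral are hypothetical names introduced for illustration, and the package may compute positions differently.

```dart
// Illustrative sketch only: not the tokenizer_parser implementation.
// Position, positionAt, and matchLiteral are hypothetical names.

/// Zero-based index plus one-based line and column of a match start.
class Position {
  final int index, line, column;
  const Position(this.index, this.line, this.column);
}

/// Computes the line and column for [index] by counting newlines before it.
Position positionAt(String input, int index) {
  var line = 1;
  var lineStart = 0;
  for (var i = 0; i < index; i++) {
    if (input.codeUnitAt(i) == 0x0A) {
      line++;
      lineStart = i + 1;
    }
  }
  return Position(index, line, index - lineStart + 1);
}

/// Tries [pattern] at exactly [start]; returns the matched text or null.
String? matchLiteral(String pattern, String input, int start) {
  final match = RegExp(pattern).matchAsPrefix(input, start);
  return match?.group(0);
}

void main() {
  const input = 'a = 1\nbb = 22';
  final text = matchLiteral(r'\d+', input, 11);
  final pos = positionAt(input, 11);
  print('$text at line ${pos.line}, column ${pos.column}');
  // prints "22 at line 2, column 6"
}
```

Anchoring the match at the current index with matchAsPrefix, rather than searching ahead, is what keeps any skipped text visible as unmatched input in a design like this.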
2) NonLiteralModel
Builds higher-level tokens from existing tokens.
// Assumes key, colon, and value are token models defined elsewhere.
const pair = NonLiteralModel(
  name: 'pair',
  sequence: TokenSequence('key-colon-value', [key, colon, value]),
);
3) TokenSequence
Requires all elements to match in order.
4) TokenAlternatives
Matches the first successful alternative.
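The difference between the two composition rules can be sketched without the package. The example below works on a flat list of token names and uses two hypothetical helpers, matchSequence and matchAlternatives. It only illustrates the semantics described above ("all elements in order" versus "first successful alternative") and makes no claim about how tokenizer_parser implements them.

```dart
// Illustrative sketch only: not the tokenizer_parser implementation.
// matchSequence and matchAlternatives are hypothetical helpers that operate
// on a flat list of token names.

/// Returns the number of tokens consumed if every name in [expected] matches
/// [tokens] in order starting at [start], otherwise null.
int? matchSequence(List<String> tokens, int start, List<String> expected) {
  if (start + expected.length > tokens.length) return null;
  for (var i = 0; i < expected.length; i++) {
    if (tokens[start + i] != expected[i]) return null;
  }
  return expected.length;
}

/// Tries each sequence in [options] in order and returns the first success.
int? matchAlternatives(
    List<String> tokens, int start, List<List<String>> options) {
  for (final option in options) {
    final consumed = matchSequence(tokens, start, option);
    if (consumed != null) return consumed;
  }
  return null;
}

void main() {
  const tokens = ['identifier', 'equals', 'identifier'];
  print(matchSequence(tokens, 0, ['identifier', 'equals', 'identifier'])); // 3
  print(matchAlternatives(tokens, 0, [
    ['number'],
    ['identifier', 'equals'],
    ['identifier'],
  ])); // 2
}
```

Because the first successful alternative wins in this sketch, the order of the options matters: listing ['identifier'] before ['identifier', 'equals'] would consume one token instead of two.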
Tokenizing Files
final result = Tokenizer.tokenizeFile('example/input.gql', language, [whitespace]);
Return Value
Both Tokenizer.tokenize and Tokenizer.tokenizeFile return a record with two positional fields:
- $1: List<Token>, the created tokens.
- $2: List<Input>, the unmatched input segments.
This makes it easy to detect parse gaps or unsupported syntax.
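A small helper can turn the returned pair into a pass/fail summary. The sketch below assumes only that the result is a Dart record of two lists, as described above. describeResult is a hypothetical name, and the stand-in data in main replaces a real Tokenizer.tokenize(...) call so the example runs without the package. Record patterns require Dart 3.

```dart
// Hypothetical helper: assumes only that the result is a record of two lists.
String describeResult<T, U>((List<T>, List<U>) result) {
  // Destructure the record instead of reading $1 and $2 separately.
  final (tokens, remaining) = result;
  if (remaining.isEmpty) {
    return 'ok: ${tokens.length} tokens';
  }
  return 'gaps: ${remaining.length} unmatched segment(s) '
      'after ${tokens.length} tokens';
}

void main() {
  // Stand-in data; with the package you would pass the value returned by
  // Tokenizer.tokenize(...) or Tokenizer.tokenizeFile(...) instead.
  print(describeResult((['identifier', 'equals'], <String>['@'])));
  // prints "gaps: 1 unmatched segment(s) after 2 tokens"
}
```

Treating a non-empty remainder as a failure is a design choice for this sketch; a lenient caller might log the segments and continue.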
Example Project
See the full GraphQL-like grammar example in:
- example/ql_lang.dart
- example/tokenizer_example.dart
Contributing
Issues and pull requests are welcome. If you add grammar features, include tests that validate both matched tokens and unmatched remainder behavior.
Libraries
- tokenizer_parser: Tokenizer Parser public API.