string/tokenizer_pipeline_utils library

Customizable tokenizer pipeline: ordered regex rules, each marked keep or skip (roadmap #434).

A reusable lexer core. Given an ordered list of TokenRules, it walks the input and, at each position, takes the FIRST rule whose pattern matches as a prefix. Rules marked skip (whitespace, comments) advance the cursor without emitting a token. Unlike a one-off hand-rolled split or RegExp.allMatches pass, rule order resolves ambiguity deterministically, and a position that no rule matches is a hard error rather than silently dropped text.
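The effect of rule order can be illustrated with a small, self-contained sketch. It does not use the library's API: firstMatchingLabel is a hypothetical helper written only for this example, and the rule labels and patterns are assumptions. It relies on Dart's Pattern.matchAsPrefix to test a pattern at a given offset.

```dart
// Hypothetical helper, not part of the library: returns the label of the
// first pattern that matches a non-empty prefix of `input` at `offset`,
// or null when no pattern matches there.
String? firstMatchingLabel(
  String input,
  int offset,
  List<MapEntry<String, RegExp>> rules,
) {
  for (final rule in rules) {
    final match = rule.value.matchAsPrefix(input, offset);
    if (match != null && match.end > offset) return rule.key;
  }
  return null;
}

void main() {
  final keywordFirst = [
    MapEntry('keyword', RegExp(r'if\b')),
    MapEntry('identifier', RegExp(r'[A-Za-z_]\w*')),
  ];
  final identifierFirst = keywordFirst.reversed.toList();

  // Same input, same patterns; only the order differs.
  print(firstMatchingLabel('if x', 0, keywordFirst)); // keyword
  print(firstMatchingLabel('if x', 0, identifierFirst)); // identifier

  // The keyword pattern needs a word boundary, so "iffy" falls through.
  print(firstMatchingLabel('iffy', 0, keywordFirst)); // identifier
}
```

Placing the more specific pattern first is what makes the outcome predictable: the broader identifier pattern would otherwise claim every keyword.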

Classes

Token: A produced token: its rule type, the matched value, and the start offset into the original input.
TokenRule: One tokenizer rule: a type label, the pattern to match at the cursor, and whether matches are dropped (skip) instead of emitted.

Functions

tokenize(String input, List<TokenRule> rules) → List<Token>: Tokenizes input by trying rules in order at each cursor position; the first rule whose pattern matches as a prefix wins. Rules marked skip advance the cursor without emitting a token. Throws FormatException (with the offset) at any position where no rule matches. A zero-length match is treated as a non-match, so a rule like \d* can never spin the cursor in place.
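A minimal sketch of the loop described above may make the behavior concrete. This is not the library's source: the Token and TokenRule classes here are stand-ins whose constructor shapes and field names (type, value, start, pattern, skip) are assumptions, and tokenizeSketch is a hypothetical name chosen to avoid implying it is the real tokenize.

```dart
// Stand-in types for illustration only; the library's real constructors
// and field names may differ.
class Token {
  const Token(this.type, this.value, this.start);
  final String type;
  final String value;
  final int start;
}

class TokenRule {
  const TokenRule(this.type, this.pattern, {this.skip = false});
  final String type;
  final Pattern pattern;
  final bool skip;
}

List<Token> tokenizeSketch(String input, List<TokenRule> rules) {
  final tokens = <Token>[];
  var cursor = 0;
  while (cursor < input.length) {
    Match? winner;
    TokenRule? winningRule;
    for (final rule in rules) {
      final match = rule.pattern.matchAsPrefix(input, cursor);
      // A zero-length match counts as no match, so the cursor always moves.
      if (match != null && match.end > cursor) {
        winner = match;
        winningRule = rule;
        break; // first rule in list order wins
      }
    }
    if (winner == null || winningRule == null) {
      // Unmatched input is a hard error that reports the offending offset.
      throw FormatException('No rule matches', input, cursor);
    }
    if (!winningRule.skip) {
      tokens.add(Token(winningRule.type, winner.group(0)!, cursor));
    }
    cursor = winner.end;
  }
  return tokens;
}

void main() {
  final rules = [
    TokenRule('ws', RegExp(r'\s+'), skip: true),
    TokenRule('number', RegExp(r'\d*')), // can match empty; guarded above
    TokenRule('ident', RegExp(r'[A-Za-z_]\w*')),
    TokenRule('op', RegExp(r'[+\-*/=]')),
  ];

  for (final t in tokenizeSketch('x = 42 + y', rules)) {
    print('${t.type} "${t.value}" @${t.start}');
  }

  try {
    tokenizeSketch('x ? y', rules);
  } on FormatException catch (e) {
    print('failed at offset ${e.offset}'); // failed at offset 2
  }
}
```

In this sketch the \d* rule sits ahead of the identifier rule, yet it never blocks progress: at a non-digit position its empty match is rejected and the next rule is tried.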

saropa_dart_utils package