string/tokenizer_pipeline_utils library
Customizable tokenizer pipeline: ordered regex rules with keep/skip — roadmap #434.
A reusable lexer core: give it an ordered list of TokenRules and it walks
the input, at each position taking the FIRST rule that matches as a prefix.
Rules marked skip (whitespace, comments) advance the cursor without
emitting a token. Unlike a one-off hand-rolled split/RegExp.allMatches,
rule order resolves ambiguity deterministically and an unmatched position is
a hard error rather than silently dropped text.
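To make the contrast with RegExp.allMatches concrete, here is a minimal sketch using only dart:core. The helpers scanAll and firstUnmatchedOffset are hypothetical names made up for this illustration and are not part of the library. The first shows how a global scan steps over text it does not recognize. The second shows how prefix matching at a moving cursor exposes the offset where nothing matches.

```dart
/// Hypothetical helper: a one-off scan that collects every non-blank match.
/// Anything the pattern does not recognize is skipped without any report.
List<String> scanAll(String input, RegExp pattern) => pattern
    .allMatches(input)
    .map((m) => m.group(0)!)
    .where((s) => s.trim().isNotEmpty)
    .toList();

/// Hypothetical helper: walks the input by prefix-matching at the cursor.
/// Returns the first offset where nothing matches, or -1 if all input is consumed.
int firstUnmatchedOffset(String input, RegExp pattern) {
  var pos = 0;
  while (pos < input.length) {
    final m = pattern.matchAsPrefix(input, pos);
    // No match, or an empty match that would not move the cursor.
    if (m == null || m.end == pos) return pos;
    pos = m.end;
  }
  return -1;
}

void main() {
  const input = r'a = 1 $ b';
  final pattern = RegExp(r'[a-z]+|\d+|=|\s+');

  // The '$' is absent from the result and nothing flagged it.
  print(scanAll(input, pattern)); // [a, =, 1, b]

  // Prefix matching stops at the '$', which sits at offset 6.
  print(firstUnmatchedOffset(input, pattern)); // 6
}
```

A lexer built on the second approach can turn that offset into an error instead of returning it, which is the behavior described above.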
Classes
Functions
tokenize(String input, List<TokenRule> rules) → List<Token>
Tokenizes input by trying rules in order at each cursor position; the first rule whose pattern matches as a prefix wins. Skipped rules advance without emitting. Throws FormatException (with the offset) at any position no rule matches, and treats a zero-length match as a non-match so a rule like \d* can never spin the cursor in place.
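The behavior described for tokenize can be sketched as follows. This is an illustration under stated assumptions, not the library's source: the Token and TokenRule shapes here (a type name, a RegExp pattern, a skip flag, and a start offset) are hypothetical stand-ins, since this page does not show their real fields or constructors.

```dart
/// Hypothetical token shape; the library's Token may carry different fields.
class Token {
  final String type;
  final String text;
  final int offset;
  Token(this.type, this.text, this.offset);
}

/// Hypothetical rule shape; the library's TokenRule may differ.
class TokenRule {
  final String type;
  final RegExp pattern;
  final bool skip;
  TokenRule(this.type, this.pattern, {this.skip = false});
}

List<Token> tokenize(String input, List<TokenRule> rules) {
  final tokens = <Token>[];
  var pos = 0;
  while (pos < input.length) {
    Match? hit;
    TokenRule? winner;
    for (final rule in rules) {
      final m = rule.pattern.matchAsPrefix(input, pos);
      // A zero-length match counts as no match, so the cursor always advances.
      if (m != null && m.end > pos) {
        hit = m;
        winner = rule;
        break; // the first rule that matches wins
      }
    }
    if (hit == null || winner == null) {
      // FormatException carries the source and the failing offset.
      throw FormatException('No rule matches', input, pos);
    }
    if (!winner.skip) {
      tokens.add(Token(winner.type, hit.group(0)!, pos));
    }
    pos = hit.end;
  }
  return tokens;
}

void main() {
  final rules = [
    TokenRule('ws', RegExp(r'\s+'), skip: true),
    TokenRule('keyword', RegExp(r'let\b')),
    TokenRule('ident', RegExp(r'[A-Za-z_]\w*')),
    TokenRule('number', RegExp(r'\d*')), // may match empty; rejected when it does
    TokenRule('op', RegExp(r'=')),
  ];

  for (final t in tokenize('let x = 42', rules)) {
    print('${t.type} "${t.text}" @${t.offset}');
  }

  try {
    tokenize('x = ?', rules);
  } on FormatException catch (e) {
    print('unmatched at offset ${e.offset}'); // the '?' is at offset 4
  }
}
```

Rule order matters in this sketch: keyword is listed before ident so that let is not lexed as an identifier, and the \d* rule is passed over at the = sign because its empty match is rejected, which lets the op rule take that position.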