tokenize function - tokenizer_pipeline_utils library

Tokenizes input by trying rules in list order at each cursor position; the first rule whose pattern matches as a prefix wins, even if a later rule would match a longer prefix. A rule marked shouldSkip advances the cursor without emitting a token. Throws FormatException (carrying the input and the offset) at any position where no rule matches. A zero-length match is treated as a non-match, so a rule like \d* can never leave the cursor spinning in place.

Example:

tokenize('ab = 12', [
  TokenRule('ws', RegExp(r'\s+'), shouldSkip: true),
  TokenRule('id', RegExp(r'[a-z]+')),
  TokenRule('op', RegExp(r'=')),
  TokenRule('num', RegExp(r'\d+')),
]); // id "ab", op "=", num "12"

Audited: 2026-06-12 11:26 EDT

Implementation

List<Token> tokenize(String input, List<TokenRule> rules) {
  final List<Token> tokens = <Token>[];
  int pos = 0;
  while (pos < input.length) {
    final _Hit? hit = _firstMatch(input, pos, rules);
    if (hit == null) {
      throw FormatException('No token rule matched', input, pos);
    }
    if (!hit.rule.shouldSkip) tokens.add(Token(hit.rule.type, hit.text, pos));
    pos += hit.text.length;
  }
  return tokens;
}
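
The zero-length guard described above is not visible in tokenize itself, so it presumably lives in the private _firstMatch helper, which this page does not show. The following is a minimal sketch of how such a helper could work, not the library's actual code. The TokenRule and _Hit declarations are assumptions inferred from how tokenize uses them (the field name pattern in particular is hypothetical); RegExp.matchAsPrefix is the standard dart:core call.

```dart
// Hypothetical supporting types, inferred from how tokenize uses them.
class TokenRule {
  TokenRule(this.type, this.pattern, {this.shouldSkip = false});
  final String type;
  final RegExp pattern; // field name is an assumption
  final bool shouldSkip;
}

class _Hit {
  _Hit(this.rule, this.text);
  final TokenRule rule;
  final String text;
}

/// Returns the first rule that matches a non-empty prefix of [input]
/// starting at [pos], or null if no rule does.
_Hit? _firstMatch(String input, int pos, List<TokenRule> rules) {
  for (final TokenRule rule in rules) {
    // matchAsPrefix only succeeds if the match begins exactly at pos.
    final Match? m = rule.pattern.matchAsPrefix(input, pos);
    // An empty match would not move the cursor, so treat it as a miss
    // and fall through to the next rule.
    if (m != null && m.end > pos) {
      return _Hit(rule, m.group(0)!);
    }
  }
  return null;
}
```

With this shape, a rule such as \d* still matches runs of digits, but at a position holding a non-digit it is skipped rather than returning an empty hit, so later rules get their turn and tokenize throws only if none of them match.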