tokenize function
Tokenizes input by trying rules in order at each cursor position; the
first rule whose pattern matches as a prefix at that position wins. A rule
marked shouldSkip advances the cursor without emitting a token. Throws a
FormatException carrying the offset at any position where no rule matches.
A zero-length match counts as a non-match, so a rule like \d* can never
leave the cursor spinning in place.
Example:
tokenize('ab = 12', [
  TokenRule('ws', RegExp(r'\s+'), shouldSkip: true),
  TokenRule('id', RegExp(r'[a-z]+')),
  TokenRule('op', RegExp(r'=')),
  TokenRule('num', RegExp(r'\d+')),
]); // id "ab", op "=", num "12"
Audited: 2026-06-12 11:26 EDT
Implementation
List<Token> tokenize(String input, List<TokenRule> rules) {
  final List<Token> tokens = <Token>[];
  int pos = 0;
  while (pos < input.length) {
    final _Hit? hit = _firstMatch(input, pos, rules);
    if (hit == null) {
      throw FormatException('No token rule matched', input, pos);
    }
    if (!hit.rule.shouldSkip) tokens.add(Token(hit.rule.type, hit.text, pos));
    pos += hit.text.length;
  }
  return tokens;
}
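The private _firstMatch helper and the TokenRule, Token, and _Hit types are not shown on this page. The following is a minimal sketch of how they might look, not the library's actual code. It assumes that TokenRule stores its regular expression in a field called pattern and that Token names its third field offset; both names are hypothetical. Only the constructor shapes and the type, shouldSkip, rule, and text members appear in the source. The sketch uses RegExp.matchAsPrefix to anchor each attempt at the cursor and rejects empty matches, which is one way to get the zero-length guarantee described above.

```dart
// Hypothetical reconstruction of the supporting types. Field names
// `pattern` and `offset` are assumptions made for this sketch.
class TokenRule {
  TokenRule(this.type, this.pattern, {this.shouldSkip = false});
  final String type;
  final RegExp pattern;
  final bool shouldSkip;
}

class Token {
  Token(this.type, this.text, this.offset);
  final String type;
  final String text;
  final int offset;
}

class _Hit {
  _Hit(this.rule, this.text);
  final TokenRule rule;
  final String text;
}

// Returns the first rule (in list order) that matches a non-empty prefix
// of `input` starting at `pos`, or null when no rule does.
_Hit? _firstMatch(String input, int pos, List<TokenRule> rules) {
  for (final TokenRule rule in rules) {
    // matchAsPrefix only succeeds if the match begins exactly at `pos`.
    final Match? m = rule.pattern.matchAsPrefix(input, pos);
    // m.end > pos rejects zero-length matches, so a pattern such as \d*
    // falls through to the next rule instead of stalling the cursor.
    if (m != null && m.end > pos) {
      return _Hit(rule, input.substring(pos, m.end));
    }
  }
  return null;
}
```

With this sketch, a rule list beginning with TokenRule('num', RegExp(r'\d*')) applied to 'ab1' at offset 0 skips the empty num match and falls through to a later id rule, and an input on which every rule matches nothing yields null, which tokenize turns into a FormatException.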