tokenize function

List<Token> tokenize(
  String input,
  List<TokenRule> rules
)

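Tokenizes input by trying rules in list order at each cursor position. The first rule whose pattern matches at the cursor (an anchored prefix match, not a search further ahead) wins, even if a later rule would match more text. Matches from rules with shouldSkip set advance the cursor without emitting a token. If no rule matches at some position, tokenize throws a FormatException carrying the input and that offset. A zero-length match counts as no match, so a rule like \d* can never leave the cursor stuck in place and loop forever.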

Example:

tokenize('ab = 12', [
  TokenRule('ws', RegExp(r'\s+'), shouldSkip: true),
  TokenRule('id', RegExp(r'[a-z]+')),
  TokenRule('op', RegExp(r'=')),
  TokenRule('num', RegExp(r'\d+')),
]); // id "ab", op "=", num "12"
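Stepping through the example shows where the cursor goes and how skipped whitespace is consumed. The offsets are the pos values that the implementation passes to Token; input.length is 7.

```text
pos 0: ws fails, id matches "ab"          -> emit id "ab" at 0, pos = 2
pos 2: ws matches " "                     -> skip, pos = 3
pos 3: ws, id fail; op matches "="        -> emit op "=" at 3, pos = 4
pos 4: ws matches " "                     -> skip, pos = 5
pos 5: ws, id, op fail; num matches "12"  -> emit num "12" at 5, pos = 7
pos 7 == input.length: loop ends
```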

Audited: 2026-06-12 11:26 EDT

Implementation

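List<Token> tokenize(String input, List<TokenRule> rules) {
  final List<Token> tokens = <Token>[];
  int pos = 0; // cursor into input
  while (pos < input.length) {
    // First rule, in list order, with a non-empty match anchored at pos.
    final _Hit? hit = _firstMatch(input, pos, rules);
    if (hit == null) {
      // Nothing matches here: report the input and the offending offset.
      throw FormatException('No token rule matched', input, pos);
    }
    // Skip rules (e.g. whitespace) consume input but emit nothing.
    if (!hit.rule.shouldSkip) tokens.add(Token(hit.rule.type, hit.text, pos));
    // Always advances, because _firstMatch rejects zero-length matches.
    pos += hit.text.length;
  }
  return tokens;
}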
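The implementation relies on Token, TokenRule, _Hit, and _firstMatch, none of which appear in this section. The sketch below is one way to fill them in so the code runs, assuming rule patterns are Dart RegExp objects matched with RegExp.matchAsPrefix. Only type, text, rule, and shouldSkip are confirmed by the call sites above; the field names pattern and offset are guesses.

```dart
/// Token emitted by tokenize: the rule's type, the matched text, and the
/// input offset where the match started. (Field name `offset` is assumed.)
class Token {
  final String type;
  final String text;
  final int offset;
  const Token(this.type, this.text, this.offset);

  @override
  String toString() => '$type "$text" @$offset';
}

/// A named pattern; matches of rules with shouldSkip: true are consumed
/// but not emitted. (Field name `pattern` is assumed.)
class TokenRule {
  final String type;
  final RegExp pattern;
  final bool shouldSkip;
  const TokenRule(this.type, this.pattern, {this.shouldSkip = false});
}

/// The winning rule at a cursor position and the text it matched.
class _Hit {
  final TokenRule rule;
  final String text;
  const _Hit(this.rule, this.text);
}

/// Returns the first rule, in list order, whose pattern matches a
/// non-empty prefix of input starting at pos, or null if none does.
_Hit? _firstMatch(String input, int pos, List<TokenRule> rules) {
  for (final TokenRule rule in rules) {
    // matchAsPrefix anchors the match at pos instead of searching forward.
    final Match? m = rule.pattern.matchAsPrefix(input, pos);
    // A zero-length match (e.g. \d* before a letter) counts as no match,
    // so the caller's cursor always advances.
    if (m != null && m.end > pos) return _Hit(rule, m[0]!);
  }
  return null;
}
```

The m.end > pos check is what turns a zero-length match into a non-match, as the description requires; without it, tokenize would add 0 to pos and never terminate on a rule like \d*. Because _firstMatch returns on the first hit, rules [RegExp('='), RegExp('==')] tokenize '==' as two '=' tokens, so longer alternatives should be listed first.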