tokenizeKeyphrases function - keyphrase_utils library

tokenizeKeyphrases function

List<String> tokenizeKeyphrases(String text)

Splits text into lowercase ASCII alphanumeric tokens (a-z, 0-9), dropping stopwords.

Punctuation and casing are normalized away, so 'Cats, cats!' yields the same term twice: ['cats', 'cats']. Any run of other characters acts as a separator. One-character tokens are dropped as noise.

Example:

tokenizeKeyphrases('The quick Brown fox'); // ['quick', 'brown', 'fox']
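
The behavior described above can be tried in isolation with the sketch below. It is not the package's code: `tokenizeDemo` and `demoStopwords` are hypothetical names, and the stopword set is a small stand-in, since the contents of the package's private `_kStopwords` are not shown on this page.

```dart
// Hypothetical stand-in for the package's private _kStopwords set;
// the real contents are not documented here.
const Set<String> demoStopwords = <String>{'the', 'a', 'an', 'of', 'and'};

// Same pipeline as the documented function, but using the stand-in set.
List<String> tokenizeDemo(String text) => text
    .toLowerCase()
    .split(RegExp(r'[^a-z0-9]+'))
    .where((String t) => t.length > 1 && !demoStopwords.contains(t))
    .toList();

void main() {
  // Punctuation and casing are normalized; duplicates are kept.
  print(tokenizeDemo('Cats, cats!')); // [cats, cats]

  // One-character tokens are dropped; 'c3' survives because it has two.
  print(tokenizeDemo('A B c3 x')); // [c3]

  // Empty input produces an empty list.
  print(tokenizeDemo('')); // []
}
```

Note that repeated terms are not deduplicated, so callers that need unique terms or counts would have to derive them from the returned list.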

Audited: 2026-06-12 11:26 EDT

Implementation

List<String> tokenizeKeyphrases(String text) => text
    .toLowerCase()
    .split(RegExp(r'[^a-z0-9]+'))
    .where((String t) => t.length > 1 && !_kStopwords.contains(t))
    .toList();
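
One consequence of the pattern `[^a-z0-9]+` is that letters outside ASCII, such as accented characters, are treated as separators rather than as part of a token. The sketch below illustrates this with a hypothetical helper, `splitAsciiTokens`, which applies the same split and length filter but leaves out the stopword check because `_kStopwords` is not visible here. Whether this behavior is intended is not stated in the source.

```dart
// Hypothetical helper: the documented split and length filter,
// without the stopword check.
List<String> splitAsciiTokens(String text) => text
    .toLowerCase()
    .split(RegExp(r'[^a-z0-9]+'))
    .where((String t) => t.length > 1)
    .toList();

void main() {
  // 'ï' and 'é' are outside a-z, so they break the words apart.
  print(splitAsciiTokens('naïve café')); // [na, ve, caf]

  // Plain ASCII input is unaffected.
  print(splitAsciiTokens('naive cafe')); // [naive, cafe]
}
```

Callers working with non-English text may want to transliterate or strip diacritics before tokenizing.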