SimpleWordTokenizer class

Tokenizes text using rules inspired by the NLTK word tokenizer.

Applies the same normalization steps as the .NET implementation: stripping skip markers, replacing HTML entities, inserting spaces around punctuation, and splitting on whitespace.
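The sketch below illustrates the four normalization steps listed above. It also lower-cases each token, as the `wordTokenize` description says. It is not the class's actual source. The skip-marker text (`[[skip]]`), the small entity table, and the punctuation set are hypothetical stand-ins chosen for illustration, and `sketchWordTokenize` is a hypothetical name.

```dart
// Sketch of the four normalization steps plus lower-casing.
// ASSUMPTIONS (hypothetical, not taken from SimpleWordTokenizer's source):
//   * a "skip marker" is the literal string [[skip]];
//   * only the six entities in htmlEntities are decoded;
//   * "punctuation" means the characters in punctuationPattern.

const skipMarker = '[[skip]]'; // hypothetical marker text

const htmlEntities = <String, String>{
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&nbsp;': ' ',
};

// Matches any one of the entities above in a single pass, so a sequence
// such as "&amp;lt;" is decoded once (to "&lt;") rather than twice.
final entityPattern = RegExp(r'&(?:amp|lt|gt|quot|#39|nbsp);');

// Punctuation characters that become standalone tokens.
final punctuationPattern = RegExp(r'([.,!?;:()\[\]{}"])');

final whitespace = RegExp(r'\s+');

List<String> sketchWordTokenize(String text) {
  // 1. Strip skip markers (replaced by a space so adjacent words stay apart).
  var s = text.replaceAll(skipMarker, ' ');

  // 2. Replace HTML entities with the characters they encode.
  s = s.replaceAllMapped(entityPattern, (m) => htmlEntities[m[0]]!);

  // 3. Insert spaces around punctuation so each mark is split off.
  s = s.replaceAllMapped(punctuationPattern, (m) => ' ${m[1]} ');

  // 4. Split on runs of whitespace, drop empty strings, and lower-case.
  return s
      .split(whitespace)
      .where((t) => t.isNotEmpty)
      .map((t) => t.toLowerCase())
      .toList();
}

void main() {
  print(sketchWordTokenize('Tom &amp; Jerry, [[skip]]again!'));
}
```

Running `main` prints `[tom, &, jerry, ,, again, !]`. The order of the steps matters. Entities are decoded before punctuation is padded, so a decoded `&quot;` still becomes its own `"` token. The NLTK tokenizer also applies rules that this sketch omits, such as splitting contractions ("don't" into "do" and "n't"). The real class may therefore produce different tokens for some inputs.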

Constructors

SimpleWordTokenizer()

Properties

hashCode → int
The hash code for this object.
no setter, inherited
runtimeType → Type
A representation of the runtime type of the object.
no setter, inherited

Methods

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
toString() → String
A string representation of this object.
inherited

Operators

operator ==(Object other) → bool
The equality operator.
inherited

Static Methods

wordTokenize(String text) → List<String>
Tokenizes text into a list of lower-cased word tokens.