sp_ai_simple_bpe_tokenizer library

Classes

BpeTokenizerCleared
The state of the BpeTokenizerCubit when it has successfully cleared.
BpeTokenizerClearing
The state of the BpeTokenizerCubit when it is clearing.
BpeTokenizerCubit
The Cubit that manages the tokenizer's state, emitting the states listed on this page (initial, loading, loaded, clearing, cleared, and error).
BpeTokenizerError
The state of the BpeTokenizerCubit when it has encountered an error. Prints the error to the console in debug mode.
BpeTokenizerInitial
The initial state of the BpeTokenizerCubit.
BpeTokenizerLoaded
The state of the BpeTokenizerCubit when it has successfully loaded.
BpeTokenizerLoading
The state of the BpeTokenizerCubit when it is loading.
BpeTokenizerState
This class is used to manage the state of the BpeTokenizerCubit.
MainBpeTokenizerState
This class is used to manage the state of the BpeTokenizerCubit.
SPAiSimpleBpeTokenizer
A class that implements a simple BPE (Byte Pair Encoding) tokenizer.
SPTokenContainer
A class that represents a container for the tokens, token count, and character count of a text.
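The page does not show how SPAiSimpleBpeTokenizer applies its merges. As a rough illustration of the technique, the sketch below implements the greedy merge loop popularized by GPT-2. It splits a pre-token into symbols, repeatedly merges the adjacent pair with the lowest merge rank, and stops when no ranked pair remains. The function name `bpeMerge` and the record-keyed `ranks` map are hypothetical and not part of this library's API. The code assumes Dart 3 (record types).

```dart
/// Hypothetical greedy BPE merge loop (GPT-2 style), not this library's API.
/// [ranks] maps an adjacent symbol pair to its merge priority; lower ranks
/// are merged first. Requires Dart 3 for record types.
List<String> bpeMerge(String token, Map<(String, String), int> ranks) {
  // Start from single characters (UTF-16 code units; byte-encoded BPE
  // alphabets stay within the Basic Multilingual Plane).
  var symbols = token.split('');
  while (symbols.length > 1) {
    // Find the adjacent pair with the lowest merge rank.
    (String, String)? best;
    int? bestRank;
    for (var i = 0; i < symbols.length - 1; i++) {
      final pair = (symbols[i], symbols[i + 1]);
      final rank = ranks[pair];
      if (rank != null && (bestRank == null || rank < bestRank)) {
        best = pair;
        bestRank = rank;
      }
    }
    if (best == null) break; // No ranked pair left to merge.

    // Merge every non-overlapping occurrence of the best pair, left to right.
    final merged = <String>[];
    var i = 0;
    while (i < symbols.length) {
      if (i < symbols.length - 1 &&
          symbols[i] == best.$1 &&
          symbols[i + 1] == best.$2) {
        merged.add(best.$1 + best.$2);
        i += 2;
      } else {
        merged.add(symbols[i]);
        i += 1;
      }
    }
    symbols = merged;
  }
  return symbols;
}
```

For example, with the ranks {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}, bpeMerge("lower", ranks) returns ["low", "er"]. Merging by lowest rank first means the order in which pairs were learned during training decides how a word is segmented.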

Properties

byteEncoder Map<int, String>
A final map from byte values (integers) to Unicode strings. Its values include printable ASCII characters, Spanish characters, and some special characters.
final
pat RegExp
A regular expression that splits natural-language text into tokens. It matches words with common English contractions (e.g. "it's", "they're"), runs of letters, runs of digits, runs of other non-whitespace characters such as punctuation and symbols, and whitespace.
final
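For illustration, here is a pattern in the style of GPT-2's pre-tokenizer that covers the categories described for pat. It is an assumption, not the library's actual expression; the names `gpt2StylePattern` and `preTokenize` are hypothetical.

```dart
/// Hypothetical GPT-2-style pre-tokenization pattern; the library's own
/// `pat` may differ. `unicode: true` enables the \p{L} (letter) and
/// \p{N} (number) property escapes.
final RegExp gpt2StylePattern = RegExp(
  r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+",
  unicode: true,
);

/// Splits [text] into pre-tokens by collecting every match in order.
List<String> preTokenize(String text) =>
    gpt2StylePattern.allMatches(text).map((m) => m.group(0)!).toList();
```

Applied to "they're here", this yields "they", "'re", and " here". Each word keeps its leading space, so word-initial pieces can be told apart from word-internal ones after BPE.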

Functions

bytesToUnicode() Map<int, String>
A function that returns a map of integers to Unicode strings.
dictZip(List x, List y) Map
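A function that builds a map by pairing each element of x with the element of y at the same index; in this library it produces a map of Unicode strings to integers.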
range(int x, int y) List<int>
A function that returns the integers from x (inclusive) to y (exclusive) as a list, built with List.generate().
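A minimal sketch of how these three helpers typically fit together, modeled on GPT-2's bytes_to_unicode. Printable bytes map to themselves and the remaining bytes are shifted to code points from 256 upward, so every byte gets a visible, unique character. This follows the GPT-2 table, whose self-mapped Latin-1 range already includes Spanish characters such as "ñ", "á", and "¿"; the library's actual table may still differ in detail. The generic signature of dictZip here is an assumption (Python dict(zip(x, y)) semantics).

```dart
/// Sketch of `range`: integers from [x] (inclusive) to [y] (exclusive).
List<int> range(int x, int y) =>
    List<int>.generate(y > x ? y - x : 0, (i) => x + i);

/// Sketch of `dictZip`, assuming Python `dict(zip(x, y))` semantics:
/// pairs elements by index and stops at the shorter list.
Map<K, V> dictZip<K, V>(List<K> x, List<V> y) {
  final length = x.length < y.length ? x.length : y.length;
  return {for (var i = 0; i < length; i++) x[i]: y[i]};
}

/// GPT-2-style byte-to-Unicode table (an assumption; the library's may differ).
Map<int, String> bytesToUnicode() {
  // Bytes that are already printable map to themselves:
  // '!'..'~' (33-126), '¡'..'¬' (161-172), '®'..'ÿ' (174-255).
  final bs = [...range(33, 127), ...range(161, 173), ...range(174, 256)];
  final cs = [...bs];
  // Every other byte gets the next unused code point from 256 upward.
  var n = 0;
  for (var b = 0; b < 256; b++) {
    if (!bs.contains(b)) {
      bs.add(b);
      cs.add(256 + n);
      n++;
    }
  }
  return dictZip(bs, cs.map((c) => String.fromCharCode(c)).toList());
}
```

With this table, the space byte (32) maps to "Ġ" (U+0120) and the newline byte (10) maps to "Ċ" (U+010A), which is why those characters show up in GPT-2-style vocabularies.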