sp_ai_simple_bpe_tokenizer library
Classes
- BpeTokenizerCleared
- The state of the BpeTokenizerCubit when it has successfully cleared.
- BpeTokenizerClearing
- The state of the BpeTokenizerCubit when it is clearing.
- BpeTokenizerCubit
- This class is used to manage the state of the BpeTokenizerCubit.
- BpeTokenizerError
- The state of the BpeTokenizerCubit when it has encountered an error. Prints the error to the console in debug mode.
- BpeTokenizerInitial
- The initial state of the BpeTokenizerCubit.
- BpeTokenizerLoaded
- The state of the BpeTokenizerCubit when it has successfully loaded.
- BpeTokenizerLoading
- The state of the BpeTokenizerCubit when it is loading.
- BpeTokenizerState
- This class is used to manage the state of the BpeTokenizerCubit.
- MainBpeTokenizerState
- This class is used to manage the state of the BpeTokenizerCubit.
- SPAiSimpleBpeTokenizer
- A class that implements a simple BPE (Byte Pair Encoding) tokenizer.
- SPTokenContainer
- A class that represents a container for the tokens, token count, and character count of a text.
Properties
- byteEncoder → Map<int, String>
- A map of integers to Unicode strings. It maps byte values to Unicode characters, including printable ASCII characters, Spanish characters, and some special characters.
final
- pat → RegExp
- A regular expression pattern that matches text tokens in natural language. The pattern matches words with common English contractions (e.g. "it's", "they're"), words composed of letters, words composed of numbers, words composed of special characters (excluding whitespace), and whitespace characters.
final
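The kind of pre-tokenization that pat describes can be sketched as follows. This is an illustration only: the pattern below is the widely used GPT-2 style splitting regex, assumed here because the description matches it, and it is not confirmed to be the library's actual value of pat. The names splitPattern and preTokenize are hypothetical.

```dart
// Hypothetical stand-in for `pat`: a GPT-2 style splitting pattern.
// The library's real pattern may differ.
final RegExp splitPattern = RegExp(
  r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+",
  unicode: true, // required for the \p{L} and \p{N} property escapes
);

/// Splits [text] into pre-tokens: contraction suffixes, letter runs,
/// digit runs, runs of other non-whitespace characters, and whitespace.
List<String> preTokenize(String text) =>
    splitPattern.allMatches(text).map((m) => m.group(0)!).toList();

void main() {
  final tokens = preTokenize("It's 42 cats!");
  print(tokens.length); // 5
  print(tokens.join('|')); // It|'s| 42| cats|!
}
```

Note that a leading space stays attached to the word or number that follows it, which is why " 42" and " cats" each come out as a single piece.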
Functions
- bytesToUnicode() → Map<int, String>
- A function that returns a map of integers to Unicode strings.
- dictZip(List x, List y) → Map
- A function that returns a map of Unicode strings to integers.
- range(int x, int y) → List<int>
- A function that generates a list of integers between x (inclusive) and y (exclusive) using the List.generate() method. It returns the generated list.
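As a rough illustration of how these three helpers could fit together, here is a minimal sketch. The names rangeSketch, dictZipSketch, and bytesToUnicodeSketch are hypothetical stand-ins, not the library's functions, and the byte table follows the GPT-2 byte-to-unicode scheme, which is an assumption rather than something this page confirms.

```dart
// Hypothetical stand-ins; the library's own implementations may differ.

/// Integers from [x] (inclusive) to [y] (exclusive), built with List.generate.
List<int> rangeSketch(int x, int y) =>
    List<int>.generate(y > x ? y - x : 0, (i) => x + i);

/// Pairs keys[i] with values[i], stopping at the shorter list.
Map<K, V> dictZipSketch<K, V>(List<K> keys, List<V> values) {
  final n = keys.length < values.length ? keys.length : values.length;
  return {for (var i = 0; i < n; i++) keys[i]: values[i]};
}

/// Byte-to-character table in the style of GPT-2 (assumed scheme).
Map<int, String> bytesToUnicodeSketch() {
  // Bytes that already have a visible, non-whitespace glyph map to themselves.
  final bytes = <int>[
    ...rangeSketch(0x21, 0x7F), // '!' .. '~'
    ...rangeSketch(0xA1, 0xAD), // '¡' .. '¬'
    ...rangeSketch(0xAE, 0x100), // '®' .. 'ÿ'
  ];
  final codePoints = List<int>.of(bytes);
  // Every remaining byte gets the next unused code point from 256 upward.
  var next = 0;
  for (var b = 0; b < 256; b++) {
    if (!bytes.contains(b)) {
      bytes.add(b);
      codePoints.add(256 + next);
      next++;
    }
  }
  return dictZipSketch(
    bytes,
    codePoints.map((c) => String.fromCharCode(c)).toList(),
  );
}

void main() {
  print(rangeSketch(2, 5)); // [2, 3, 4]
  print(dictZipSketch(['a', 'b'], [0, 1])); // {a: 0, b: 1}
  final table = bytesToUnicodeSketch();
  print(table.length); // 256
  print(table[0x41]); // A
  print(table[0x20]); // Ġ
}
```

In this scheme every byte value maps to a printable character, so arbitrary UTF-8 input can be represented as a string without whitespace or control characters before BPE merges are applied.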