utilities/sp_ai_simple_bpe_tokenizer library

Classes

SPAiSimpleBpeTokenizer
A class that implements a simple BPE (Byte Pair Encoding) tokenizer.

Properties

byteEncoder Map<int, String>
A map from byte values to Unicode strings. The mapping covers printable ASCII characters, Spanish characters, and some special characters.
final
pat RegExp
A regular expression pattern that matches text tokens in natural language. The pattern matches words with common English contractions (e.g. "it's", "they're"), words composed of letters, words composed of numbers, words composed of special characters (excluding whitespace), and whitespace characters.
final
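
The kind of splitting that pat describes can be sketched as follows. This page does not show the actual value of pat, so the pattern below is an assumption: it is the pre-tokenization pattern used by GPT-2's reference tokenizer, which matches the same categories listed above. The names examplePat and splitIntoPieces are hypothetical and not part of this library.

```dart
// Hypothetical stand-in for `pat`; the library's real pattern may differ.
// `unicode: true` is required for the \p{L} and \p{N} property escapes.
final RegExp examplePat = RegExp(
  r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+",
  unicode: true,
);

// Splits text into the pieces a BPE tokenizer would then merge separately.
List<String> splitIntoPieces(String text) =>
    examplePat.allMatches(text).map((m) => m.group(0)!).toList();

void main() {
  // Contraction suffix, number, and punctuation each become their own piece;
  // a single leading space stays attached to the piece that follows it.
  print(splitIntoPieces("it's 42!")); // [it, 's,  42, !]
}
```

Keeping the leading space attached to the following word is a design choice of that pattern: it lets the tokenizer distinguish a word at the start of the text from the same word in the middle of a sentence.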

Functions

bytesToUnicode() Map<int, String>
A function that returns a map of integers to Unicode strings.
dictZip(List x, List y) Map
A function that returns a map of Unicode strings to integers.
range(int x, int y) List<int>
A function that generates a list of integers from x (inclusive) to y (exclusive) using the List.generate() method.
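
A minimal sketch of how these three helpers might look, for orientation only. It rests on two assumptions that this page does not confirm: that dictZip pairs each element of x (as a key) with the element of y at the same index (as a value), and that bytesToUnicode follows the byte-to-Unicode table of GPT-2's reference tokenizer. The Sketch-suffixed names are hypothetical and are not the library's own functions.

```dart
// Integers from x (inclusive) to y (exclusive); assumes x <= y.
List<int> rangeSketch(int x, int y) =>
    List<int>.generate(y - x, (i) => x + i);

// Assumption: x supplies the keys and y the values, index by index.
// Map.fromIterables throws if the two lists differ in length.
Map dictZipSketch(List x, List y) => Map.fromIterables(x, y);

// Assumption: same table as GPT-2's bytes_to_unicode. Printable bytes map
// to themselves; every other byte maps to a code point from 256 upward.
Map<int, String> bytesToUnicodeSketch() {
  final bs = <int>[
    ...rangeSketch(0x21, 0x7F), // '!' through '~'
    ...rangeSketch(0xA1, 0xAD), // '¡' through '¬'
    ...rangeSketch(0xAE, 0x100), // '®' through 'ÿ'
  ];
  final cs = List<int>.from(bs);
  var n = 0;
  for (var b = 0; b < 256; b++) {
    if (!bs.contains(b)) {
      bs.add(b);
      cs.add(256 + n);
      n++;
    }
  }
  return Map<int, String>.fromIterables(
    bs,
    cs.map((c) => String.fromCharCode(c)),
  );
}

void main() {
  print(rangeSketch(3, 6)); // [3, 4, 5]
  print(dictZipSketch(['a', 'b'], [0, 1])); // {a: 0, b: 1}
  final enc = bytesToUnicodeSketch();
  print(enc.length); // 256
  print(enc[0x41]); // A
  print(enc[0x20]); // Ġ
}
```

Under these assumptions every one of the 256 byte values gets a visible, non-whitespace character, so encoded text can be split and merged without control characters or spaces getting in the way.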