utilities/sp_ai_simple_bpe_tokenizer library


Implements a simple BPE (Byte Pair Encoding) tokenizer.


byteEncoder Map<int, String>
A map from byte values (integers) to Unicode strings. The mapping covers printable ASCII characters, Spanish characters, and some special characters.
pat RegExp
A regular expression that matches text tokens in natural language: words with common English contractions (e.g. "it's", "they're"), runs of letters, runs of digits, runs of other non-whitespace characters, and runs of whitespace.
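
The sketch below illustrates how a pattern of this kind can split text into pre-tokens. It is not the library's actual `pat` value: `sketchPat` and `splitTokens` are hypothetical names, and the pattern is the widely used GPT-2 style expression, assumed here only because it matches the description above.

```dart
// Hypothetical stand-in for `pat`; the library's real pattern may differ.
final RegExp sketchPat = RegExp(
  r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+",
  unicode: true, // required for the \p{L} and \p{N} property escapes
);

// Splits text into pre-tokens by collecting every match in order.
List<String> splitTokens(String text) =>
    sketchPat.allMatches(text).map((m) => m.group(0)!).toList();

void main() {
  // Prints each pre-token on its own line, wrapped in brackets so that
  // leading spaces are visible.
  for (final token in splitTokens("it's 42 dogs!")) {
    print('[$token]');
  }
}
```

Under this assumed pattern, a contraction suffix such as `'s` becomes its own token, and a single leading space stays attached to the word or number that follows it.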


bytesToUnicode() Map<int, String>
Returns a map from byte values (integers) to Unicode strings.
dictZip(List x, List y) Map
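Builds a map from two lists, presumably pairing each element of x (as a key) with the element of y at the same index (as a value). As used by the tokenizer, it returns a map of Unicode strings to integers.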
range(int x, int y) List<int>
Generates and returns a list of the integers from x (inclusive) to y (exclusive), built with List.generate().
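
As a rough illustration of how these three helpers could fit together, here is a minimal sketch. These are hypothetical re-implementations, not the library's source: the generic signature of `dictZip` is an assumption (the documented one is untyped), and the byte ranges in `bytesToUnicode` follow the well-known GPT-2 table, which may not match this library exactly.

```dart
// Integers from x (inclusive) to y (exclusive); assumes y >= x.
List<int> range(int x, int y) => List<int>.generate(y - x, (i) => x + i);

// Pairs keys[i] with values[i]; assumes both lists have the same length
// (Map.fromIterables throws otherwise).
Map<K, V> dictZip<K, V>(List<K> keys, List<V> values) =>
    Map<K, V>.fromIterables(keys, values);

// GPT-2 style table (an assumption): printable bytes map to themselves,
// every other byte maps to an unused code point starting at 256.
Map<int, String> bytesToUnicode() {
  final bs = <int>[
    ...range(0x21, 0x7F), // '!' to '~'
    ...range(0xA1, 0xAD), // '¡' to '¬'
    ...range(0xAE, 0x100), // '®' to 'ÿ'
  ];
  final cs = List<int>.from(bs);
  var n = 0;
  for (var b = 0; b < 256; b++) {
    if (!bs.contains(b)) {
      bs.add(b);
      cs.add(256 + n);
      n++;
    }
  }
  return dictZip(bs, cs.map((c) => String.fromCharCode(c)).toList());
}

void main() {
  print(range(2, 5)); // [2, 3, 4]
  print(dictZip(['a', 'b'], [0, 1])); // {a: 0, b: 1}
  final table = bytesToUnicode();
  print(table.length); // 256
  print(table[0x41]); // A
}
```

The point of such a table is that every byte, including control bytes and the space character, gets a visible, non-whitespace stand-in, so byte sequences can be handled as ordinary strings during BPE merging.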