utilities/sp_ai_simple_bpe_tokenizer
library
Properties
-
byteEncoder
→ Map<int, String>
-
A map from integer byte values to Unicode strings.
Each byte value is mapped to a Unicode character; the mapping covers printable ASCII characters, Spanish characters, and some special characters.
final
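The following is a minimal sketch of how such a byte-to-Unicode table can be built. It assumes the scheme used by GPT-2 style BPE tokenizers, where bytes that already display as visible characters keep their own code point and every other byte is shifted to code point 256 and above. The name `bytesToUnicodeSketch` and the exact byte ranges are assumptions for illustration; the library's own implementation is not shown on this page and may differ.

```dart
// Hypothetical sketch, not the library's actual code.
Map<int, String> bytesToUnicodeSketch() {
  // Bytes that are already visible characters map to themselves.
  final bs = <int>[
    for (var b = 0x21; b <= 0x7E; b++) b, // '!' .. '~'
    for (var b = 0xA1; b <= 0xAC; b++) b, // '¡' .. '¬'
    for (var b = 0xAE; b <= 0xFF; b++) b, // '®' .. 'ÿ'
  ];
  final cs = List<int>.of(bs);

  // Every remaining byte (control characters, space, ...) is assigned
  // the next unused code point starting at 256.
  var n = 0;
  for (var b = 0; b < 256; b++) {
    if (!bs.contains(b)) {
      bs.add(b);
      cs.add(256 + n);
      n++;
    }
  }

  return {
    for (var i = 0; i < bs.length; i++) bs[i]: String.fromCharCode(cs[i]),
  };
}

void main() {
  final table = bytesToUnicodeSketch();
  print(table.length); // 256
  print(table[0x41]); // A
  print(table[0x20]); // Ġ
}
```

Under this assumed scheme every byte has a distinct, visible stand-in, so arbitrary UTF-8 input can be tokenized as ordinary text and mapped back without loss.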
-
pat
→ RegExp
-
A regular expression that splits natural-language text into tokens.
The pattern matches common English contractions (e.g. "it's", "they're"),
runs of letters, runs of digits, runs of special characters (excluding whitespace), and whitespace.
final
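As an illustration of a pattern with these properties, the sketch below uses the pre-tokenization expression popularized by GPT-2 style tokenizers. Whether `pat` uses exactly this expression is an assumption; the name `sketchPat` is hypothetical and chosen to avoid implying it is the library's own value.

```dart
// Hypothetical sketch, not the library's actual `pat`.
// `unicode: true` is required for the \p{L} (letter) and \p{N} (number) classes.
final sketchPat = RegExp(
  r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+",
  unicode: true,
);

void main() {
  final tokens = sketchPat
      .allMatches("It's 42 degrees!")
      .map((m) => m[0]!)
      .toList();
  print(tokens.join('|')); // It|'s| 42| degrees|!
}
```

In this sketch a leading space stays attached to the token that follows it, and a contraction suffix such as `'s` becomes its own token.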
Functions
-
bytesToUnicode()
→ Map<int, String>
-
Builds and returns a map from integer byte values to Unicode strings.
-
dictZip(List x, List y)
→ Map
-
Returns a map of Unicode strings to integers, built from the two lists x and y.
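A zip-style helper of this shape can be sketched as below. It assumes that x supplies the keys and y supplies the values at the same index, which the name suggests but this page does not confirm. `dictZipSketch` is a hypothetical name, and `Map.fromIterables` is one possible way to build the result, not necessarily what the library does.

```dart
// Hypothetical sketch: pairs x[i] (key) with y[i] (value).
// Map.fromIterables throws an ArgumentError if the lengths differ.
Map dictZipSketch(List x, List y) => Map.fromIterables(x, y);

void main() {
  final vocab = dictZipSketch(['a', 'b', 'c'], [0, 1, 2]);
  print(vocab); // {a: 0, b: 1, c: 2}
}
```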
-
range(int x, int y)
→ List<int>
-
Returns a list of the integers from x (inclusive) to y (exclusive), built with List.generate().
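A minimal sketch of such a helper follows. The name `rangeSketch` is hypothetical, and returning an empty list when y is not greater than x is an assumption; the library's behavior in that case is not documented here.

```dart
// Hypothetical sketch of a half-open integer range [x, y).
List<int> rangeSketch(int x, int y) {
  // List.generate rejects a negative length, so clamp it to zero.
  final length = y > x ? y - x : 0;
  return List<int>.generate(length, (i) => x + i);
}

void main() {
  print(rangeSketch(3, 7)); // [3, 4, 5, 6]
  print(rangeSketch(5, 5)); // []
}
```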