tokenizer library

Functions

getTokens(String text) → int
Returns an approximate token count for text. It uses a simplified tokenizer that is similar to tiktoken but takes UTF-8 bytes as its base. The result won't match tiktoken exactly, but it is a reasonable approximation. For production use, implement or adopt a proper tiktoken port.
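The Dart sketch below shows one way a UTF-8-byte-based approximation could work. It is hypothetical and is not this library's getTokens implementation. The function name approxTokenCount, the pre-tokenization regex (modeled loosely on the GPT-2 pattern that tiktoken uses), and the rule of charging one token per 4 UTF-8 bytes of each piece, rounded up, are all assumptions made for illustration.

```dart
import 'dart:convert';

// Hypothetical pre-tokenization pattern, loosely modeled on the GPT-2 regex
// used by tiktoken: an optional leading space followed by a run of letters,
// digits, or other symbols, or else a run of whitespace. Every character
// belongs to one of these classes, so no part of the input is skipped.
final RegExp _pieces = RegExp(
  r' ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+',
  unicode: true,
);

/// Hypothetical sketch, not this library's getTokens: approximates a token
/// count by charging each pre-tokenized piece one token per 4 UTF-8 bytes,
/// rounded up.
int approxTokenCount(String text) {
  var count = 0;
  for (final match in _pieces.allMatches(text)) {
    // Measure the piece in UTF-8 bytes, not UTF-16 code units.
    final byteLength = utf8.encode(match.group(0)!).length;
    // Integer ceil(byteLength / 4); pieces are never empty, so each adds >= 1.
    count += (byteLength + 3) ~/ 4;
  }
  return count;
}
```

For example, approxTokenCount('Hi!') splits the input into "Hi" and "!" and returns 2. Because the count is based on bytes rather than characters, non-ASCII text (2 to 4 bytes per character in UTF-8) is charged more tokens. This loosely mirrors how byte-level BPE handles less common scripts, but the numbers will still differ from tiktoken's.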