Tiktoken class

Tiktoken encoder/decoder.

It exposes APIs for encoding text into tokens and decoding tokens back into text, and it can be extended to support new encodings.

Example:

import 'package:tiktoken/tiktoken.dart';

// 1. Get the base encoder
final baseEnc = getEncoding("cl100k_base");

// 2. Create a custom encoder by extending baseEnc
final enc = Tiktoken(
  name: "cl100k_im",
  patStr: baseEnc.patStr,
  mergeableRanks: baseEnc.mergeableRanks,
  specialTokens: {
    ...baseEnc.specialTokens,
    // 3. Provide the additional special tokens
    "<|im_start|>": 100264,
    "<|im_end|>": 100265,
  },
);

Constructors

Tiktoken({required String name, required String patStr, required Map<ByteArray, int> mergeableRanks, required Map<String, int> specialTokens, int? explicitNVocab})
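The optional explicitNVocab argument (described under Properties) enables a vocabulary-size consistency check. The sketch below is a hypothetical stand-in for that check, not the package's code: the function name checkVocabSize is made up, String keys replace the ByteArray keys of mergeableRanks, and it implements only the rule stated in this documentation, namely that mergeable plus special tokens must add up to explicitNVocab.

```dart
// Hypothetical sketch of the explicitNVocab check; not the package's code.
// String keys stand in for the ByteArray keys of mergeableRanks.
void checkVocabSize(
  Map<String, int> mergeableRanks,
  Map<String, int> specialTokens,
  int? explicitNVocab,
) {
  // No explicit size given: nothing to validate.
  if (explicitNVocab == null) return;
  final total = mergeableRanks.length + specialTokens.length;
  if (total != explicitNVocab) {
    throw ArgumentError('expected $explicitNVocab tokens, found $total '
        '(${mergeableRanks.length} mergeable + ${specialTokens.length} special)');
  }
}

void main() {
  // Toy vocabulary: three mergeable entries and one special token.
  final ranks = {'a': 0, 'b': 1, 'ab': 2};
  final specials = {'<|endoftext|>': 3};

  checkVocabSize(ranks, specials, 4); // passes: 3 + 1 == 4
  checkVocabSize(ranks, specials, null); // skipped
  try {
    checkVocabSize(ranks, specials, 5);
  } on ArgumentError catch (e) {
    print(e.message); // expected 5 tokens, found 4 (3 mergeable + 1 special)
  }
}
```

Throwing early like this catches a mismatched vocabulary when the encoder is built rather than when an out-of-range token appears later.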

Properties

eotToken int?
no setter
explicitNVocab int?
The total number of tokens in the vocabulary. If provided, the number of mergeable tokens plus the number of special tokens must equal this value.
final
hashCode int
The hash code for this object.
no setter, inherited
maxTokenValue int
late, final
mergeableRanks Map<ByteArray, int>
A map from mergeable token bytes to their ranks. The ranks must correspond to merge priority: byte pairs whose merged bytes have a lower rank are merged first.
final
name String
The name of the encoding.
final
patStr String
A regular expression pattern used to split the input text into chunks before tokenization.
final
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
specialTokens Map<String, int>
A map from special token strings to their token values.
final
specialTokensSet Set<String>
The set of special token strings (the keys of specialTokens).
late, final
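specialTokensSet is the key set of specialTokens, and encode (listed under Methods) combines it with its allowedSpecial and disallowedSpecial parameters, whose defaults are SpecialTokensSet.empty() and SpecialTokensSet.all(). The sketch below is modelled on Python tiktoken, where the default "all" means every special token that was not explicitly allowed, and where encode raises an error if the text contains one of them; it is an assumption that the Dart port behaves the same way. The function findDisallowedSpecial is hypothetical and uses plain Set<String> values instead of SpecialTokensSet.

```dart
// Hypothetical sketch of how allowedSpecial / disallowedSpecial interact,
// modelled on Python tiktoken; names here are illustrative, not the Dart API.
Set<String> findDisallowedSpecial(
  String text,
  Map<String, int> specialTokens,
  Set<String> allowedSpecial,
) {
  // specialTokensSet is the key set of specialTokens.
  final specialTokensSet = specialTokens.keys.toSet();
  // Default "all" behaviour: every special token not explicitly allowed.
  final disallowed = specialTokensSet.difference(allowedSpecial);
  // Keep only the disallowed special tokens that actually occur in the text.
  return disallowed.where((token) => text.contains(token)).toSet();
}

void main() {
  // Token values are illustrative; the im_* values match the class example.
  final specialTokens = {
    '<|endoftext|>': 100257,
    '<|im_start|>': 100264,
    '<|im_end|>': 100265,
  };
  const text = '<|im_start|>user hi<|im_end|>';

  // Nothing allowed (like SpecialTokensSet.empty()): both markers are flagged.
  print(findDisallowedSpecial(text, specialTokens, <String>{}));
  // Both markers allowed: nothing is flagged.
  print(findDisallowedSpecial(text, specialTokens, {'<|im_start|>', '<|im_end|>'}));
}
```

A non-empty result is the case in which an encode call with these arguments would be expected to reject the input; allowing the markers, or using encodeOrdinary, avoids that.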

Methods

decode(List<int> tokens, {bool allowMalformed = true}) → String
Decodes a list of tokens into a string.
decodeBytes(List<int> tokens) → Uint8List
Decodes a list of tokens into bytes.
decodeSingleTokenBytes(int token) → Uint8List
Decodes a single token into its bytes.
decodeTokenBytes(List<int> tokens) → List<Uint8List>
Decodes a list of tokens into a list of byte arrays, one per token.
encode(String text, {SpecialTokensSet allowedSpecial = const SpecialTokensSet.empty(), SpecialTokensSet disallowedSpecial = const SpecialTokensSet.all()}) → Uint32List
Encodes a string into tokens.
encodeOrdinary(String text) → Uint32List
Encodes a string into tokens, ignoring special tokens (any special-token text is encoded as ordinary text).
encodeSingleToken(List<int> bytes) → int
Encodes the bytes of text corresponding to a single token into its token value.
encodeWithUnstable(String text, {SpecialTokensSet allowedSpecial = const SpecialTokensSet.empty(), SpecialTokensSet disallowedSpecial = const SpecialTokensSet.all()}) → Tuple2<List<int>, Set<List<int>>>
Encodes a string into stable tokens and possible completion sequences.
noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
tokenByteValues() → List<Uint8List>
Returns the sorted token byte values (sortedTokenBytes) from the underlying CoreBPE tokenizer.
toString() → String
A string representation of this object.
inherited
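The encoding and decoding methods above can be illustrated with a toy byte-level BPE. This is a minimal sketch under stated assumptions, not the package's CoreBPE: the vocabulary and merges are hypothetical, the regex is a simplified stand-in for a real patStr such as cl100k_base's, and byte sequences are keyed by comma-joined strings instead of ByteArray. It shows encodeOrdinary-style encoding (split the text with a pattern, then repeatedly merge the adjacent pair with the lowest rank) and decode-style decoding (concatenate each token's bytes and decode them as UTF-8 with allowMalformed: true, matching decode's default).

```dart
import 'dart:convert';

// Toy byte-level BPE; a sketch, not the package's CoreBPE implementation.
// Byte sequences are keyed by comma-joined strings (instead of ByteArray)
// because Dart lists compare by identity rather than by value.
String key(List<int> bytes) => bytes.join(',');

/// Hypothetical vocabulary: every single byte gets rank = byte value, then a
/// few made-up merges are appended in priority order (lower rank wins).
Map<String, int> toyRanks() {
  final ranks = {for (var b = 0; b < 256; b++) key([b]): b};
  for (final piece in ['lo', 'low', ' l', ' lo', ' low', 'er']) {
    ranks[key(utf8.encode(piece))] = ranks.length;
  }
  return ranks;
}

/// Repeatedly merges the adjacent pair whose concatenation has the lowest
/// rank until no adjacent pair is in [ranks]; returns the resulting tokens.
List<int> bpeEncodeChunk(List<int> bytes, Map<String, int> ranks) {
  var parts = [for (final b in bytes) [b]];
  while (parts.length > 1) {
    int? bestRank;
    var bestIndex = -1;
    for (var i = 0; i < parts.length - 1; i++) {
      final rank = ranks[key([...parts[i], ...parts[i + 1]])];
      if (rank != null && (bestRank == null || rank < bestRank)) {
        bestRank = rank;
        bestIndex = i;
      }
    }
    if (bestIndex < 0) break; // no mergeable pair left
    parts = [
      ...parts.sublist(0, bestIndex),
      [...parts[bestIndex], ...parts[bestIndex + 1]],
      ...parts.sublist(bestIndex + 2),
    ];
  }
  return [for (final p in parts) ranks[key(p)]!];
}

/// encodeOrdinary-style: split with the pattern, then BPE each chunk's bytes.
List<int> encodeOrdinary(String text, RegExp pattern, Map<String, int> ranks) => [
      for (final m in pattern.allMatches(text))
        ...bpeEncodeChunk(utf8.encode(m.group(0)!), ranks),
    ];

/// decode-style: look up each token's bytes, concatenate, decode as UTF-8.
String decode(List<int> tokens, Map<String, int> ranks) {
  final decoder = {
    for (final e in ranks.entries)
      e.value: e.key.split(',').map(int.parse).toList(),
  };
  final bytes = [for (final t in tokens) ...decoder[t]!];
  return utf8.decode(bytes, allowMalformed: true);
}

void main() {
  final ranks = toyRanks();
  // Simplified stand-in for patStr; NOT the real cl100k_base pattern.
  final pattern = RegExp(r"\s?\w+|\s?[^\s\w]+|\s+");

  final tokens = encodeOrdinary('lower low', pattern, ranks);
  print(tokens); // [257, 261, 260]
  print(decode(tokens, ranks)); // lower low

  // A lone lead byte of a 2-byte UTF-8 character decodes to U+FFFD.
  print(decode([0xC3], ranks) == '\uFFFD'); // true
}
```

The last line shows why the byte-level methods (decodeBytes, decodeTokenBytes, decodeSingleTokenBytes) are useful: a single token can hold only part of a multi-byte character, which string decoding with allowMalformed replaces with U+FFFD.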

Operators

operator ==(Object other) → bool
The equality operator.
inherited