Tiktoken class
Tiktoken encoder/decoder.
It exposes APIs for processing text using tokens. It can be extended to support new encodings.
Example:
  // 1. Get the base encoder
  final cl100kBase = getEncoding("cl100k_base");

  // 2. Create a custom encoder by extending cl100kBase
  final enc = Tiktoken(
    name: "cl100k_im",
    patStr: cl100kBase.patStr,
    mergeableRanks: cl100kBase.mergeableRanks,
    specialTokens: {
      ...cl100kBase.specialTokens,
      // 3. Provide the additional special tokens
      "<|im_start|>": 100264,
      "<|im_end|>": 100265,
    },
  );
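As a rough sketch of how the extended encoder might then be used, the snippet below rebuilds the encoder from the example and encodes a chat-style string with the new special tokens allowed. The import path `package:tiktoken/tiktoken.dart`, the encoding name `"cl100k_base"`, the helper name `buildImEncoder`, and the sample text are assumptions made for illustration; only members listed on this page are called.

```dart
import 'package:tiktoken/tiktoken.dart'; // assumed import path

/// Hypothetical helper: rebuilds the custom encoder from the example above.
Tiktoken buildImEncoder() {
  // Encoding name assumed to be "cl100k_base".
  final cl100kBase = getEncoding("cl100k_base");
  return Tiktoken(
    name: "cl100k_im",
    patStr: cl100kBase.patStr,
    mergeableRanks: cl100kBase.mergeableRanks,
    specialTokens: {
      ...cl100kBase.specialTokens,
      "<|im_start|>": 100264,
      "<|im_end|>": 100265,
    },
  );
}

void main() {
  final enc = buildImEncoder();

  // Allow every special token so that "<|im_start|>" and "<|im_end|>"
  // are treated as special tokens instead of being disallowed.
  final tokens = enc.encode(
    "<|im_start|>hello<|im_end|>",
    allowedSpecial: SpecialTokensSet.all(),
  );

  print(tokens);
  print(enc.decode(tokens));
}
```

With the default arguments, `disallowedSpecial` covers all special tokens, so the same input would not be accepted by `encode` unless `allowedSpecial` is widened as shown.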
Constructors
Properties
- eotToken → int?
  no setter
- explicitNVocab → int?
  The number of tokens in the vocabulary. If provided, the combined count of mergeable tokens and special tokens is checked against this number.
  final
- hashCode → int
  The hash code for this object.
  no setter, inherited
- maxTokenValue ↔ int
  late final
- mergeableRanks → Map<ByteArray, int>
  A dictionary mapping mergeable token bytes to their ranks. The ranks must correspond to merge priority.
  final
- name → String
  The name of the encoding.
  final
- patStr → String
  A regex pattern string that is used to split the input text.
  final
- runtimeType → Type
  A representation of the runtime type of the object.
  no setter, inherited
- specialTokens → Map<String, int>
  A dictionary mapping special token strings to their token values.
  final
- specialTokensSet ↔ Set<String>
  The set of special token keys.
  late final
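To make the relationship between these properties concrete, here is a minimal sketch that reads them from a stock encoder and computes the vocabulary size that `explicitNVocab` would be checked against, following the description above. The import path, the encoding name `"cl100k_base"`, and the helper `impliedVocabSize` are assumptions for illustration and are not part of the documented API.

```dart
import 'package:tiktoken/tiktoken.dart'; // assumed import path

/// Hypothetical helper: the vocabulary size implied by an encoder's tables,
/// i.e. mergeable tokens plus special tokens.
int impliedVocabSize(Tiktoken enc) =>
    enc.mergeableRanks.length + enc.specialTokens.length;

void main() {
  // Encoding name assumed to be "cl100k_base".
  final enc = getEncoding("cl100k_base");

  print("name: ${enc.name}");
  print("mergeable tokens: ${enc.mergeableRanks.length}");
  print("special tokens: ${enc.specialTokensSet}");
  print("implied vocabulary size: ${impliedVocabSize(enc)}");
  print("max token value: ${enc.maxTokenValue}");
}
```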
Methods
- decode(List<int> tokens, {bool allowMalformed = true}) → String
  Decodes a list of tokens into a string.
- decodeBytes(List<int> tokens) → Uint8List
  Decodes a list of tokens into bytes.
- decodeSingleTokenBytes(int token) → Uint8List
  Decodes a token into bytes.
- decodeTokenBytes(List<int> tokens) → List<Uint8List>
  Decodes a list of tokens into a list of bytes.
- encode(String text, {SpecialTokensSet allowedSpecial = const SpecialTokensSet.empty(), SpecialTokensSet disallowedSpecial = const SpecialTokensSet.all()}) → Uint32List
  Encodes a string into tokens.
- encodeOrdinary(String text) → Uint32List
  Encodes a string into tokens, ignoring special tokens.
- encodeSingleToken(List<int> bytes) → int
  Encodes text corresponding to a single token to its token value.
- encodeWithUnstable(String text, {SpecialTokensSet allowedSpecial = const SpecialTokensSet.empty(), SpecialTokensSet disallowedSpecial = const SpecialTokensSet.all()}) → Tuple2<List<int>, Set<List<int>>>
  Encodes a string into stable tokens and possible completion sequences.
- noSuchMethod(Invocation invocation) → dynamic
  Invoked when a nonexistent method or property is accessed.
  inherited
- tokenByteValues() → List<Uint8List>
  Returns sortedTokenBytes from the underlying CoreBPE tokenizer.
- toString() → String
  A string representation of this object.
  inherited
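The encoding and decoding methods above can be combined into a simple round trip. The following is a minimal sketch under a few assumptions: the import path `package:tiktoken/tiktoken.dart`, the encoding name `"cl100k_base"`, the helper name `roundTrip`, and the sample text are chosen for illustration. The method calls follow the signatures listed above.

```dart
import 'dart:convert' show utf8;

import 'package:tiktoken/tiktoken.dart'; // assumed import path

/// Hypothetical helper: encodes [text] without special-token handling
/// and decodes the resulting tokens back into a string.
String roundTrip(Tiktoken enc, String text) =>
    enc.decode(enc.encodeOrdinary(text));

void main() {
  // Encoding name assumed to be "cl100k_base".
  final enc = getEncoding("cl100k_base");
  const text = "tiktoken is great!";

  final tokens = enc.encodeOrdinary(text);
  print(tokens);
  print(roundTrip(enc, text) == text);

  // Show the bytes behind each token as text.
  for (final piece in enc.decodeTokenBytes(tokens)) {
    print(utf8.decode(piece, allowMalformed: true));
  }

  // A single token maps to its bytes and back to the same token value.
  final first = tokens.first;
  final bytes = enc.decodeSingleTokenBytes(first);
  print(enc.encodeSingleToken(bytes) == first);
}
```

`encodeOrdinary` is used here because the sample text contains no special tokens; text that may contain them would go through `encode` with an explicit `allowedSpecial` argument instead.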
Operators
- operator ==(Object other) → bool
  The equality operator.
  inherited