TiktokenEncoder class
Low-level Tiktoken encoder/decoder. It exposes more detailed APIs for processing text as tokens.
Constructors
- TiktokenEncoder({required String name, required String patternStr, required Map<ByteArray, int> mergeableRanks, required Map<String, int> specialTokens, int? explicitNVocab})
-
Instead of using this constructor, consider using the static helper functions Tiktoken.getEncoder and Tiktoken.getEncoderForModel.
Properties
- eotToken → int?
-
no setter
- explicitNVocab → int?
-
The number of tokens in the vocabulary.
If provided, the combined count of mergeable tokens and special tokens
is checked to be equal to this number.
final
- hashCode → int
-
The hash code for this object.
no setter, inherited
- maxTokenValue ↔ int
-
late final
- mergeableRanks → Map<ByteArray, int>
-
A dictionary mapping mergeable token bytes to their ranks.
The ranks must correspond to merge priority.
final
- name → String
-
The name of the encoding.
final
- patternStr → String
-
A regex pattern string that is used to split the input text.
final
- runtimeType → Type
-
A representation of the runtime type of the object.
no setter, inherited
- specialTokens → Map<String, int>
-
A dictionary mapping special token strings to their token values.
final
- specialTokensSet ↔ Set<String>
-
The set of special token keys.
late final
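The explicitNVocab description says that the number of mergeable tokens plus special tokens is checked against the explicit size. A minimal sketch of that rule follows. The helper checkVocabSize is hypothetical and not part of the package, plain String keys stand in for the package's ByteArray keys, and the choice of ArgumentError is an assumption; the source does not say how a mismatch is reported.

```dart
/// Hypothetical helper (not part of the package): applies the documented
/// rule that mergeable tokens plus special tokens must equal explicitNVocab.
void checkVocabSize({
  required Map<String, int> mergeableRanks,
  required Map<String, int> specialTokens,
  int? explicitNVocab,
}) {
  // Nothing to verify when no explicit size was given.
  if (explicitNVocab == null) return;

  final actual = mergeableRanks.length + specialTokens.length;
  if (actual != explicitNVocab) {
    // Assumption: the real encoder may signal the mismatch differently.
    throw ArgumentError(
      'Expected $explicitNVocab tokens, found $actual '
      '(${mergeableRanks.length} mergeable + ${specialTokens.length} special).',
    );
  }
}

void main() {
  // Toy vocabulary: String keys stand in for ByteArray keys.
  final ranks = {'a': 0, 'b': 1, 'ab': 2};
  final special = {'<|endoftext|>': 3};

  // 3 mergeable + 1 special == 4, so this returns without error.
  checkVocabSize(
    mergeableRanks: ranks,
    specialTokens: special,
    explicitNVocab: 4,
  );

  try {
    // 3 + 1 != 5, so the check fails.
    checkVocabSize(
      mergeableRanks: ranks,
      specialTokens: special,
      explicitNVocab: 5,
    );
  } on ArgumentError catch (e) {
    print('Rejected: ${e.message}');
  }
}
```

Leaving explicitNVocab as null skips the check entirely, which matches the "if provided" wording of the property description.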
Methods
- decode(List<int> tokens, {bool allowMalformed = true}) → String
-
Decodes a list of tokens into a string.
- decodeBytes(List<int> tokens) → Uint8List
-
Decodes a list of tokens into bytes.
- decodeSingleTokenBytes(int token) → Uint8List
-
Decodes a token into bytes.
- decodeTokenBytes(List<int> tokens) → List<Uint8List>
-
Decodes a list of tokens into a list of bytes.
- encode(String text, {SpecialTokensSet allowedSpecial = const SpecialTokensSet.empty(), SpecialTokensSet disallowedSpecial = const SpecialTokensSet.all()}) → Uint32List
-
Encodes a string into tokens.
- encodeOrdinary(String text) → Uint32List
-
Encodes a string into tokens, ignoring special tokens.
- encodeSingleToken(List<int> bytes) → int
-
Encodes text corresponding to a single token to its token value.
- encodeWithUnstable(String text, {SpecialTokensSet allowedSpecial = const SpecialTokensSet.empty(), SpecialTokensSet disallowedSpecial = const SpecialTokensSet.all()}) → (List<int>, Set<List<int>>)
-
Encodes a string into stable tokens and possible completion sequences.
- noSuchMethod(Invocation invocation) → dynamic
-
Invoked when a nonexistent method or property is accessed.
inherited
- tokenByteValues() → List<Uint8List>
-
Returns sortedTokenBytes from the underlying CoreBPE tokenizer.
- toString() → String
-
A string representation of this object.
inherited
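The allowMalformed flag on decode has the same name as the flag on utf8.decode in dart:convert. The sketch below shows what that flag does when a multi-byte character is cut at a token boundary. It assumes, without confirmation from the source, that decode joins the token bytes and decodes them as UTF-8; decodeUtf8 is a hypothetical stand-in and does not call the package.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical stand-in for the final step of a decode call:
/// concatenates per-token byte chunks and decodes them as UTF-8.
String decodeUtf8(List<Uint8List> tokenBytes, {bool allowMalformed = true}) {
  final bytes = <int>[for (final chunk in tokenBytes) ...chunk];
  return utf8.decode(bytes, allowMalformed: allowMalformed);
}

void main() {
  // "é" is the two bytes 0xC3 0xA9; here they land in different chunks.
  final first = Uint8List.fromList([0x63, 0x61, 0x66, 0xC3]); // "caf" + lead byte
  final second = Uint8List.fromList([0xA9]); // continuation byte

  // Both chunks together form valid UTF-8.
  print(decodeUtf8([first, second]));

  // The first chunk alone ends mid-character. Lenient mode substitutes
  // the replacement character U+FFFD instead of failing.
  print(decodeUtf8([first]));

  try {
    // Strict mode refuses the truncated sequence.
    decodeUtf8([first], allowMalformed: false);
  } on FormatException catch (e) {
    print('Strict mode rejected the bytes: ${e.message}');
  }
}
```

When a token list may end in the middle of a character, decodeBytes or decodeTokenBytes return raw bytes and so avoid the question of malformed text altogether.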
Operators
- operator ==(Object other) → bool
-
The equality operator.
inherited