Qwen2Tokenizer class
- Inheritance
-
- Object
- PreTrainedTokenizer
- Qwen2Tokenizer
Constructors
- Qwen2Tokenizer(Map<String, dynamic> tokenizerJSON, Map<String, dynamic> tokenizerConfig)
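Both constructor parameters are plain `Map<String, dynamic>` values, which is what `jsonDecode` from `dart:convert` yields for a JSON object. The following is a minimal sketch of preparing such maps from JSON text. The helper name `decodeJsonObject` and the variable names in the commented usage are hypothetical. The package import for `Qwen2Tokenizer` is not shown on this page, so the constructor call is left as a comment.

```dart
import 'dart:convert';

/// Decodes [source] and returns it only if the top-level JSON value is an
/// object, the shape both constructor parameters are declared with.
/// (Hypothetical helper, not part of the documented API.)
Map<String, dynamic> decodeJsonObject(String source) {
  final Object? decoded = jsonDecode(source);
  if (decoded is Map<String, dynamic>) {
    return decoded;
  }
  throw const FormatException('Expected a JSON object at the top level');
}

// Assumed usage once the package that exports Qwen2Tokenizer is imported:
//
//   final tokenizer = Qwen2Tokenizer(
//     decodeJsonObject(tokenizerJsonText),
//     decodeJsonObject(tokenizerConfigText),
//   );
```

Rejecting non-object input early keeps a malformed file from surfacing later as a cast error inside the tokenizer.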
Properties
- added_tokens ↔ List<AddedToken>
-
getter/setter pair, inherited
- added_tokens_map ↔ Map<String, AddedToken>
-
getter/setter pair, inherited
- added_tokens_splitter ↔ DictionarySplitter
-
getter/setter pair, inherited
- additional_special_tokens ↔ List<String>
-
getter/setter pair, inherited
- all_special_ids ↔ List<num>
-
getter/setter pair, inherited
- bos_token ↔ String?
-
getter/setter pair, inherited
- bos_token_id ↔ int?
-
getter/setter pair, inherited
- chat_template ↔ dynamic
-
getter/setter pair, inherited
- clean_up_tokenization_spaces ↔ bool
-
getter/setter pair, inherited
- decoder ↔ Decoder?
-
getter/setter pair, inherited
- do_lowercase_and_remove_accent ↔ bool
-
getter/setter pair, inherited
- eos_token ↔ String?
-
getter/setter pair, inherited
- eos_token_id ↔ int?
-
getter/setter pair, inherited
- hashCode → int
-
The hash code for this object.
no setter, inherited
- legacy ↔ bool
-
getter/setter pair, inherited
- mask_token ↔ String?
-
getter/setter pair, inherited
- mask_token_id ↔ int?
-
getter/setter pair, inherited
- model ↔ TokenizerModel
-
getter/setter pair, inherited
- model_max_length ↔ num
-
getter/setter pair, inherited
- normalizer ↔ Normalizer?
-
getter/setter pair, inherited
- pad_token ↔ String?
-
getter/setter pair, inherited
- pad_token_id ↔ int?
-
getter/setter pair, inherited
- padding_side ↔ String
-
getter/setter pair, inherited
- post_processor ↔ PostProcessor?
-
getter/setter pair, inherited
- pre_tokenizer ↔ PreTokenizer?
-
getter/setter pair, inherited
- remove_space ↔ bool?
-
Whether or not to strip the text when tokenizing (removing excess spaces before and after the string).
getter/setter pair, inherited
- return_token_type_ids ↔ bool
-
getter/setter pair, inherited
- runtimeType → Type
-
A representation of the runtime type of the object.
no setter, inherited
- sep_token ↔ String?
-
getter/setter pair, inherited
- sep_token_id ↔ int?
-
getter/setter pair, inherited
- special_tokens ↔ List<String>
-
getter/setter pair, inherited
- unk_token ↔ String?
-
getter/setter pair, inherited
- unk_token_id ↔ int?
-
getter/setter pair, inherited
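The listing does not describe how pad_token_id and padding_side interact, so the following is only an illustrative sketch of one plausible reading. It assumes that padding_side takes the values 'left' or 'right' and that padding fills a sequence up to a target length. The function `padTo` is hypothetical and is not part of the documented API.

```dart
/// Pads [ids] with [padTokenId] until it reaches [length].
/// Inputs that are already long enough are returned as an unchanged copy.
/// Assumption: [paddingSide] is either 'left' or 'right'; any value other
/// than 'left' is treated as 'right' in this sketch.
List<int> padTo(
  List<int> ids,
  int length,
  int padTokenId, {
  String paddingSide = 'right',
}) {
  if (ids.length >= length) {
    return List<int>.of(ids);
  }
  final padding = List<int>.filled(length - ids.length, padTokenId);
  return paddingSide == 'left' ? [...padding, ...ids] : [...ids, ...padding];
}
```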
Methods
- apply_chat_template(List<Message> conversation, [ApplyChatTemplateOptions? options]) → Future
-
Converts a list of message objects with "role" and "content" keys to a list of token ids. This method is intended for use with chat models, and will read the tokenizer's chat_template attribute to determine the format and control tokens to use when converting.
inherited
- batch_decode(dynamic batch, {bool skip_special_tokens = false, bool clean_up_tokenization_spaces = true}) → List<String>
-
Decode a batch of tokenized sequences.
@param {number[][]|Tensor} batch List/Tensor of tokenized input sequences.
@param {Object} decode_args (Optional) Object with decoding arguments.
@returns {string[]} List of decoded sequences.
inherited
- call(dynamic text, {dynamic text_pair, bool padding = false, bool add_special_tokens = true, bool? truncation, num? max_length, bool return_tensor = true, bool? return_token_type_ids}) → Future<BatchEncoding>
-
Encode/tokenize the given text(s).
@param {string|string[]} text The text to tokenize.
@param {Object} options An optional object containing the following properties:
@param {string|string[]} options.text_pair=null Optional second sequence to be encoded. If set, must be the same type as text.
@param {boolean|'max_length'} options.padding=false Whether to pad the input sequences.
@param {boolean} options.add_special_tokens=true Whether or not to add the special tokens associated with the corresponding model.
@param {boolean} options.truncation=null Whether to truncate the input sequences.
@param {number} options.max_length=null Maximum length of the returned list and optionally padding length.
@param {boolean} options.return_tensor=true Whether to return the results as Tensors or arrays.
@param {boolean} options.return_token_type_ids=null Whether to return the token type ids.
@returns {BatchEncoding} Object to be passed to the model.
inherited
- decode(dynamic token_ids, {bool? skip_special_tokens, bool? clean_up_tokenization_spaces}) → String
-
Decodes a sequence of token IDs back to a string.
inherited
- decode_single(List<int> token_ids, {bool skip_special_tokens = false, bool? clean_up_tokenization_spaces}) → String
-
Decode a single list of token ids to a string.
@param {number[]|bigint[]} token_ids List of token ids to decode.
@param {Object} decode_args Optional arguments for decoding.
@param {boolean} decode_args.skip_special_tokens=false Whether to skip special tokens during decoding.
@param {boolean} decode_args.clean_up_tokenization_spaces=null Whether to clean up tokenization spaces during decoding. If null, the value is set to this.decoder.cleanup if it exists, falling back to this.clean_up_tokenization_spaces if it exists, falling back to true.
@returns {string} The decoded string.
inherited
- encode(String text, {String? text_pair, bool add_special_tokens = true, bool? return_token_type_ids}) → List<num>
-
Encodes a single text or a pair of texts using the model's tokenizer.
inherited
- get_chat_template({String? chat_template, List<Map<String, dynamic>>? tools}) → String
-
Retrieve the chat template string used for tokenizing chat messages. This template is used internally by the apply_chat_template method and can also be used externally to retrieve the model's chat template for better generation tracking.
inherited
- getToken(List<String> keys) → String?
-
Returns the value of the first matching key in the tokenizer config object.
@param {...string} keys One or more keys to search for in the tokenizer config object.
@returns {string|null} The value associated with the first matching key, or null if no match is found.
@throws {Error} If an object is found for a matching key and its __type property is not "AddedToken".
@private
inherited
- noSuchMethod(Invocation invocation) → dynamic
-
Invoked when a nonexistent method or property is accessed.
inherited
- tokenize(String text, {String? pair, bool add_special_tokens = false}) → List<String>
-
Converts a string into a sequence of tokens.
@param {string} text The sequence to be encoded.
@param {Object} options An optional object containing the following properties:
@param {string} options.pair A second sequence to be encoded with the first.
@param {boolean} options.add_special_tokens=false Whether or not to add the special tokens associated with the corresponding model.
@returns {string[]} The list of tokens.
inherited
- toString() → String
-
A string representation of this object.
inherited
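The decode_single entry describes a three-step fallback for clean_up_tokenization_spaces: the explicit argument, then the decoder's cleanup value, then the tokenizer's own setting, then true. That order can be sketched with Dart's null-aware `??` operator. The function and parameter names below are hypothetical and only mirror the description. This is not the library's implementation.

```dart
/// Resolves the effective clean-up flag in the documented fallback order.
/// Each parameter stands in for one source named in the decode_single entry:
/// - [argument]: the clean_up_tokenization_spaces value passed by the caller
/// - [decoderCleanup]: the decoder's cleanup value, if a decoder exists
/// - [tokenizerDefault]: the tokenizer's clean_up_tokenization_spaces property
/// When all three are null the result is true.
bool resolveCleanUp({
  bool? argument,
  bool? decoderCleanup,
  bool? tokenizerDefault,
}) {
  return argument ?? decoderCleanup ?? tokenizerDefault ?? true;
}
```

An explicit `false` from the caller wins over every fallback, because `??` only moves on when the left-hand side is null.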
Operators
- operator ==(Object other) → bool
-
The equality operator.
inherited