Qwen2Tokenizer class

Inheritance

Constructors

Qwen2Tokenizer(Map<String, dynamic> tokenizerJSON, Map<String, dynamic> tokenizerConfig)

Properties

added_tokens List<AddedToken>
getter/setter pairinherited
added_tokens_map Map<String, AddedToken>
getter/setter pairinherited
added_tokens_splitter ↔ DictionarySplitter
getter/setter pairinherited
additional_special_tokens List<String>
getter/setter pairinherited
all_special_ids List<num>
getter/setter pairinherited
bos_token String?
getter/setter pairinherited
bos_token_id int?
getter/setter pairinherited
chat_template ↔ dynamic
getter/setter pairinherited
clean_up_tokenization_spaces bool
getter/setter pairinherited
decoder Decoder?
getter/setter pairinherited
do_lowercase_and_remove_accent bool
getter/setter pairinherited
eos_token String?
getter/setter pairinherited
eos_token_id int?
getter/setter pairinherited
hashCode int
The hash code for this object.
no setterinherited
legacy bool
getter/setter pairinherited
mask_token String?
getter/setter pairinherited
mask_token_id int?
getter/setter pairinherited
model TokenizerModel
getter/setter pairinherited
model_max_length num
getter/setter pairinherited
normalizer Normalizer?
getter/setter pairinherited
pad_token String?
getter/setter pairinherited
pad_token_id int?
getter/setter pairinherited
padding_side String
getter/setter pairinherited
post_processor PostProcessor?
getter/setter pairinherited
pre_tokenizer PreTokenizer?
getter/setter pairinherited
remove_space bool?
Whether or not to strip the text when tokenizing (removing excess spaces before and after the string).
getter/setter pairinherited
return_token_type_ids bool
getter/setter pairinherited
runtimeType Type
A representation of the runtime type of the object.
no setterinherited
sep_token String?
getter/setter pairinherited
sep_token_id int?
getter/setter pairinherited
special_tokens List<String>
getter/setter pairinherited
unk_token String?
getter/setter pairinherited
unk_token_id int?
getter/setter pairinherited

Methods

apply_chat_template(List<Message> conversation, [ApplyChatTemplateOptions? options]) Future
Converts a list of message objects with "role" and "content" keys to a list of token ids. This method is intended for use with chat models, and will read the tokenizer's chat_template attribute to determine the format and control tokens to use when converting.
inherited
batch_decode(dynamic batch, {bool skip_special_tokens = false, bool clean_up_tokenization_spaces = true}) List<String>
Decode a batch of tokenized sequences. @param {number[][]|Tensor} batch List/Tensor of tokenized input sequences. @param {Object} decode_args (Optional) Object with decoding arguments. @returns {string[]} List of decoded sequences.
inherited
call(dynamic text, {dynamic text_pair, bool padding = false, bool add_special_tokens = true, bool? truncation, num? max_length, bool return_tensor = true, bool? return_token_type_ids}) Future<BatchEncoding>
Encode/tokenize the given text(s). @param {string|string[]} text The text to tokenize. @param {Object} options An optional object containing the following properties: @param {string|string[]} options.text_pair=null Optional second sequence to be encoded. If set, must be the same type as text. @param {boolean|'max_length'} options.padding=false Whether to pad the input sequences. @param {boolean} options.add_special_tokens=true Whether or not to add the special tokens associated with the corresponding model. @param {boolean} options.truncation=null Whether to truncate the input sequences. @param {number} options.max_length=null Maximum length of the returned list and optionally padding length. @param {boolean} options.return_tensor=true Whether to return the results as Tensors or arrays. @param {boolean} options.return_token_type_ids=null Whether to return the token type ids. @returns {BatchEncoding} Object to be passed to the model.
inherited
decode(dynamic token_ids, {bool? skip_special_tokens, bool? clean_up_tokenization_spaces}) String
Decodes a sequence of token IDs back to a string.
inherited
decode_single(List<int> token_ids, {bool skip_special_tokens = false, bool? clean_up_tokenization_spaces}) String
Decode a single list of token ids to a string. @param {number[]|bigint[]} token_ids List of token ids to decode @param {Object} decode_args Optional arguments for decoding @param {boolean} decode_args.skip_special_tokens=false Whether to skip special tokens during decoding @param {boolean} decode_args.clean_up_tokenization_spaces=null Whether to clean up tokenization spaces during decoding. If null, the value is set to this.decoder.cleanup if it exists, falling back to this.clean_up_tokenization_spaces if it exists, falling back to true. @returns {string} The decoded string
inherited
encode(String text, {String? text_pair, bool add_special_tokens = true, bool? return_token_type_ids}) List<num>
Encodes a single text or a pair of texts using the model's tokenizer.
inherited
get_chat_template({String? chat_template, List<Map<String, dynamic>>? tools}) String
Retrieve the chat template string used for tokenizing chat messages. This template is used internally by the apply_chat_template method and can also be used externally to retrieve the model's chat template for better generation tracking.
inherited
getToken(List<String> keys) String?
Returns the value of the first matching key in the tokenizer config object. @param {...string} keys One or more keys to search for in the tokenizer config object. @returns {string|null} The value associated with the first matching key, or null if no match is found. @throws {Error} If an object is found for a matching key and its __type property is not "AddedToken". @private
inherited
noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
tokenize(String text, {String? pair, bool add_special_tokens = false}) List<String>
Converts a string into a sequence of tokens. @param {string} text The sequence to be encoded. @param {Object} options An optional object containing the following properties: @param {string} options.pair A second sequence to be encoded with the first. @param {boolean} options.add_special_tokens=false Whether or not to add the special tokens associated with the corresponding model. @returns {string[]} The list of tokens.
inherited
toString() String
A string representation of this object.
inherited

Operators

operator ==(Object other) bool
The equality operator.
inherited