token_counter 0.3.0
Token counter for popular LLMs (OpenAI GPT, Anthropic Claude, Google Gemini, Meta Llama) with multi-language support. Pure Dart, works on all Flutter platforms and Dart VM.
0.3.0
- Added SentencePiece unigram-LM tokenizer in pure Dart.
- New `SpProtoReader`: minimal protobuf decoder for `.model` files (no generated code, no native dependencies).
- New `SpUnigramEncoder`: Viterbi forward-pass segmentation.
- New `SpVocabLoader` abstract class + `BytesSpVocabLoader` implementation.
- New `TokenCounter.loadSpVocab(SpVocabLoader)`: switches the counter to exact SentencePiece mode for `gemini`, `llama`, and `claude` families.
- `TokenCounter.isExact` now returns `true` for both tiktoken BPE and SentencePiece modes.
- 8 new unit tests (proto parsing, Viterbi segmentation, `loadSpVocab` API).
0.2.0
- Added exact tiktoken BPE encoder for `cl100k_base` (GPT-4, GPT-3.5-turbo) and `o200k_base` (GPT-4o, GPT-4.1, o-series) vocabulary families.
- New `TokenCounter.loadVocab(TiktokenVocabLoader)`: supply the raw `.tiktoken` file bytes from any source (Flutter assets, local file, in-memory bytes) to switch the counter into exact mode.
- New `TiktokenVocabLoader` abstract class + `BytesVocabLoader` implementation.
- New `TiktokenVocabParser` for parsing the `.tiktoken` line format.
- New `TiktokenSpecialTokens` constants for `cl100k_base` and `o200k_base`.
- `TokenCounter.isExact` property to distinguish heuristic from BPE mode.
- 13 new unit tests for the BPE algorithm (merge correctness, special tokens, pre-tokenization, `loadVocab` API).
0.1.1
- Rewrote README in English with accurate, runnable code examples.
- Updated roadmap to mark v0.1 as complete.
0.1.0
Initial public release.
- Pure-Dart heuristic token estimator for OpenAI (GPT-4o / GPT-4 / o-series), Anthropic Claude 3–4, Google Gemini 1.5 / 2, and Meta Llama 3 / 3.1 / 3.3.
- Unicode script classifier covering Latin, CJK, Hiragana, Katakana, Hangul, Arabic, Cyrillic, Devanagari, Thai, emoji, and more.
- `TokenCounter.estimate`, `TokenCounter.forModel`, `countMessages`, and `estimateCost` public API.
- Bundled per-model pricing table for cost estimation.
- 22 unit tests covering multilingual inputs, per-provider chat overhead, and cost math.