token_counter 1.0.0
token_counter: ^1.0.0 copied to clipboard
Token counter for popular LLMs (OpenAI GPT, Anthropic Claude, Google Gemini, Meta Llama) with multi-language support. Pure Dart, works on all Flutter platforms and Dart VM.
1.0.0 #
First stable release.
The public API surface is now considered stable and follows semantic versioning — breaking changes will require a 2.0 release.
- Tightened the public API: internal encoder/parser types
(
HeuristicTokenizer,TiktokenBpeEncoder,SpProtoReader,SpUnigramEncoder, etc.) are no longer exported from the public barrel. Users interact withTokenCounterand its loaders. - Added
public_member_api_docslint and dartdoc comments on every public member. - Added accuracy benchmark (
benchmark/heuristic_accuracy.dart) with 12 representative inputs across 4 tokenizer families. Mean absolute error: GPT-4o ~15 %, GPT-4 ~28 %, Claude ~18 %, Gemini ~24 %. - README updated with full coverage of v0.2–v0.5 features and accuracy table.
- CI now uses
subosito/flutter-actionsoexample/resolves correctly.
0.5.0 #
LlmModelgainscontextWindowandmaxOutputTokensproperties for all 24 supported models.- New
TokenCounter.fitsInContext(text)— returnstruewhen the token count is within the model's context window. - New
TokenCounter.remainingContextTokens(text, {reserveForOutput})— how many input tokens remain after placingtextin the context. - New
TokenCounter.truncate(text, maxTokens)— binary-search truncation that keeps token count at or belowmaxTokens. - 11 new unit tests for context-window utilities and truncation.
0.4.0 #
- New
ImageTokenEstimatorwith provider-specific formulas:- OpenAI: tile-based (512 px tiles, 170 tokens/tile + 85 base);
ImageDetail.low/high/auto - Anthropic Claude:
width × height / 750 - Google Gemini: flat 258 tokens per image
- OpenAI: tile-based (512 px tiles, 170 tokens/tile + 85 base);
- New
ToolTokenEstimator.estimatefor function/tool definition token overhead. ChatMessagenow accepts animageslist ofImageAttachmentvalues.TokenCounter.countMessagesautomatically adds image token costs per provider.ImageAttachment.flatTokensoverride for when pixel dimensions are unknown.- 14 new unit tests for image formulas, tool estimation, and message counting.
0.3.0 #
- Added SentencePiece unigram-LM tokenizer in pure Dart.
- New
SpProtoReader— minimal protobuf decoder for.modelfiles (no generated code, no native dependencies). - New
SpUnigramEncoder— Viterbi forward-pass segmentation. - New
SpVocabLoaderabstract class +BytesSpVocabLoaderimplementation. - New
TokenCounter.loadSpVocab(SpVocabLoader)— switches the counter to exact SentencePiece mode forgemini,llama, andclaudefamilies. TokenCounter.isExactnow returnstruefor both tiktoken BPE and SentencePiece modes.- 8 new unit tests (proto parsing, Viterbi segmentation,
loadSpVocabAPI).
0.2.0 #
- Added exact tiktoken BPE encoder for
cl100k_base(GPT-4, GPT-3.5-turbo) ando200k_base(GPT-4o, GPT-4.1, o-series) vocabulary families. - New
TokenCounter.loadVocab(TiktokenVocabLoader)— supply the raw.tiktokenfile bytes from any source (Flutter assets, local file, in-memory bytes) to switch the counter into exact mode. - New
TiktokenVocabLoaderabstract class +BytesVocabLoaderimplementation. - New
TiktokenVocabParserfor parsing the.tiktokenline format. - New
TiktokenSpecialTokensconstants forcl100k_baseando200k_base. TokenCounter.isExactproperty to distinguish heuristic from BPE mode.- 13 new unit tests for the BPE algorithm (merge correctness, special tokens,
pre-tokenization,
loadVocabAPI).
0.1.1 #
- Rewrote README in English with accurate, runnable code examples.
- Updated roadmap to mark v0.1 as complete.
0.1.0 #
Initial public release.
- Pure-Dart heuristic token estimator for OpenAI (GPT-4o / GPT-4 / o-series), Anthropic Claude 3–4, Google Gemini 1.5 / 2, and Meta Llama 3 / 3.1 / 3.3.
- Unicode script classifier covering Latin, CJK, Hiragana, Katakana, Hangul, Arabic, Cyrillic, Devanagari, Thai, emoji, and more.
TokenCounter.estimate,TokenCounter.forModel,countMessages, andestimateCostpublic API.- Bundled per-model pricing table for cost estimation.
- 22 unit tests covering multilingual inputs, per-provider chat overhead, and cost math.