utils/tokens/token_counter library

Token counting and estimation utilities ported from Neomage TypeScript.

Provides a heuristic approximation of cl100k_base tokenization (Neomage/GPT-4), along with token budgets, cost estimation, and context window management.

Classes

Cl100kEncoder
Approximate cl100k_base tokenizer using heuristic BPE-like splitting.
ContextWindow
Represents the token capacity of a model's context window.
CostEstimate
Estimated cost for a single API call.
ModelPricing
Per-token pricing for a single model in USD.
ModelPricingTable
Known model pricing constants (as of early 2025).
TokenBudget
Tracks a token budget with reservation support.
TokenCounter
High-level token counting, truncation, splitting, and cost estimation.
TokenEncoder
Abstract interface for text tokenizers.
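
As a rough illustration of what "a token budget with reservation support" can mean, the sketch below tracks used and reserved tokens against a fixed total. BudgetSketch and its members (reserve, consume, available) are hypothetical names chosen for this example; they are not taken from TokenBudget, whose actual API may differ.

```dart
// Hypothetical sketch only: not the library's TokenBudget class.
class BudgetSketch {
  BudgetSketch(this.total);

  /// Total tokens the budget may ever hand out.
  final int total;
  int _used = 0;
  int _reserved = 0;

  /// Tokens that are neither consumed nor held in reserve.
  int get available => total - _used - _reserved;

  /// Sets tokens aside (for example, for the model's reply).
  /// Returns false and changes nothing if they do not fit.
  bool reserve(int tokens) {
    if (tokens < 0 || tokens > available) return false;
    _reserved += tokens;
    return true;
  }

  /// Spends tokens from the unreserved part of the budget.
  /// Returns false and changes nothing if they do not fit.
  bool consume(int tokens) {
    if (tokens < 0 || tokens > available) return false;
    _used += tokens;
    return true;
  }
}

void main() {
  final budget = BudgetSketch(8000);
  budget.reserve(1000); // hold back room for the response
  print(budget.consume(6500)); // true
  print(budget.consume(1000)); // false: only 500 tokens remain unreserved
  print(budget.available); // 500
}
```

Rejecting an oversized request outright, rather than clamping it, keeps the accounting simple: a failed call leaves the budget unchanged.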

Functions

estimateTokens(String text) → int
Quick heuristic token estimate: roughly 1 token per 4 characters for English text.
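
The heuristic described above can be sketched in a few lines. estimateTokensSketch is a hypothetical stand-in written for this example; the library's estimateTokens may round differently or treat whitespace and non-English text in its own way.

```dart
// Hypothetical sketch of a "1 token per 4 characters" estimate.
// Not the library's implementation.
int estimateTokensSketch(String text) {
  if (text.isEmpty) return 0;
  // Integer ceiling division, so any non-empty text counts as
  // at least one token. String.length is in UTF-16 code units.
  return (text.length + 3) ~/ 4;
}

void main() {
  print(estimateTokensSketch('Hello, world!')); // 13 characters, prints 4
  print(estimateTokensSketch('')); // prints 0
}
```

Because the estimate depends only on length, it runs in constant time and suits quick pre-checks; an encoder-based count is a better fit when a result closer to cl100k_base is needed.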