smollm2 1.0.7
Pure Dart inference engine for SmolLM2 language models, delivering surprisingly capable local LLM results without requiring CUDA.
1.0.7 #
- Added `LLMTokenGenerator` class implementing token generation backed by `LLMRuntime`.
- Refactored `SmolLM2` to extend `LLMTokenGenerator` and delegate runtime logic.
- `runtime.dart`:
  - Added `LLMRuntime` class handling model loading, inference, KV cache, and sampling.
  - Supports BF16, Q8, Q16 quantized models with optional jitter during dequantization.
  - Implements transformer forward pass, RMS normalization, RoPE, attention, MLP, and sampling.
- `token_generator.dart`:
  - Updated `TokenGenerationResult` to store prompt and generated tokens as lists of token IDs.
  - Added getters for token counts and updated `toString` formatting.
- `tokenizer.dart`:
  - Added support for special tokens with `AddedTokenInfo` including `special` flag.
  - Added `isSpecialTok` and `isEOSTok` methods.
  - Updated tokenizer loading and token matching to handle special tokens.
- `export_smollm2.dart`:
  - Exporter now writes added tokens with `special` flag.
- `bin/smollm2.dart`:
  - Added `SpinnerStyle` enum and `TokenSpinner` class for spinner UI during prompt loading.
  - Updated `_chatSession` to ingest system prompt with spinner and improved chat prompt building.
  - Improved assistant response handling based on generation stop reason.
- `example/smollm2_chat_example.dart`:
  - Added system prompt ingestion before chat loop.
  - Updated assistant response handling with stop reason logic.
- `example/smollm2_completion_example.dart` and `example/smollm2_rs_in_strawberry_example.dart`:
  - Updated to use new `TokenGenerationResult` output and ingest system prompt before generation.
- `lib/src/chat.dart`:
  - Improved `ChatSession.buildPrompt` to optionally append assistant start token.
  - Clarified chat prompt format with ChatML-style tokens.
- `lib/src/data.dart`:
  - Added `readU8` and `writeU8` methods for byte-level I/O.
- `test/smollm2_vocab.dart`:
  - Updated added tokens map to include `special` flag for each token.
- `test/tokenizer_test.dart`:
  - Added tests verifying special token recognition and EOS token detection in tokenized output.
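
The `runtime.dart` entry above lists RMS normalization as one step of the forward pass. As a rough illustration of what that step computes, here is a minimal standalone sketch. The function name `rmsNormSketch` and the default `eps` value are assumptions made for this example; this is not the package's `rmsNorm` implementation.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Hypothetical helper for illustration only (not the package API).
/// Computes y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
Float32List rmsNormSketch(
  Float32List x,
  Float32List weight, {
  double eps = 1e-5, // assumed default; the real model config may differ
}) {
  if (x.length != weight.length) {
    throw ArgumentError('x and weight must have the same length');
  }

  // Mean of squares over the hidden dimension.
  var sumSquares = 0.0;
  for (final v in x) {
    sumSquares += v * v;
  }
  final invRms = 1.0 / math.sqrt(sumSquares / x.length + eps);

  // Scale each element by the inverse RMS and the learned gain.
  final out = Float32List(x.length);
  for (var i = 0; i < x.length; i++) {
    out[i] = x[i] * invRms * weight[i];
  }
  return out;
}
```

Unlike layer normalization, this variant does not subtract the mean, so it needs only one pass over the input to gather statistics.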
1.0.6 #
- `bin/smollm2.dart`:
  - Added command line options `-j`, `-js`, `-jc` to enable jitter, set jitter seed, and jitter scale for model loading.
  - Updated model loading to pass jitter parameters to `SmolLM2.load`.
  - Added jitter-related parameters to startup logs.
- `bin/export_smollm2.dart`:
  - Added support for `-BF16` quantization flag.
- `lib/src/data.dart`:
  - Added `readU16` and `writeU16` methods to `DataReader` and `DataWriter`.
  - Added read/write byte count tracking fields.
  - Improved hashing extensions with two-hash variant and bit rotations.
  - Added `RandomExtension` with methods for generating jittered floats and noise.
  - Added `Hash64` class for incremental 64-bit hashing.
  - Added `DurationFormatting` extension for human-readable and seconds formatting.
- `lib/src/kv_cache.dart`:
  - Added `offset` method to `KVCache` for computing buffer offsets.
- `lib/src/quant_type.dart`:
  - Added new quant types: `q16PerBlock`, `bf16`.
  - Updated factory to support new quant types.
- `lib/src/smollm2.dart`:
  - Added detailed documentation for `SmolLM2` class and `load` method.
  - Added support for optional jitter during dequantization in model loading.
  - Added jitter parameters (`jitterSeed`, `jitterRandom`, `jitterScale`) to `load`.
  - Added timing and logging for loading phases (header, config, tokenizer, weights).
  - Updated `_loadWeights` to accept jitter parameters and pass them to weights loader.
  - Added `_loaded` flag and `isLoaded` getter.
  - Refactored model loading to support jitter injection during dequantization.
  - Improved `rmsNorm` and `applyRope` implementations.
  - Updated `forward` method to fix KV cache writes and optimize attention computation.
  - Updated `tokenize` and `decode` to use new `TokenizerEngine`.
  - Updated `sample` method to support eager greedy sampling and improved repeat penalty logic.
- `lib/src/tensor.dart`:
  - Introduced abstract `QTensor` base class for quantized tensors with multiple dequantization methods.
  - Added jittered and adaptive dequantization methods with optional stochastic jitter.
  - Updated `Q8Tensor` and `Q16Tensor` to support jittered dequantization via `toFP32Tensor` method.
  - Added default jitter scale constants for Q8 and Q16 tensors.
  - Added new `BF16Tensor` class with FP32 conversion and dot product support.
- `lib/src/token_generator.dart`:
  - Updated `TokenGenerationResult.statsSummary` to use new duration formatting extensions for human-readable timing.
- `lib/src/tokenizer.dart`:
  - Added support for added/special tokens in tokenizer.
  - Refactored tokenizer to use `TokenizerEngine` for tokenization and decoding.
  - `TokenizerEngine`:
    - Supports matching added tokens first (longest match).
    - Handles space and newline tokens explicitly.
    - Applies BPE merges using merge rank map.
    - Decodes tokens with whitespace and newline replacements.
- `lib/src/weights.dart`:
  - Updated `ModelWeights.load` and internal loading methods to accept jitter parameters.
  - Passed jitter parameters through to tensor reading methods.
  - Updated `_readQ8` and `_readQ16` to support jittered FP32 tensor conversion.
  - Added `_readBF16`.
  - Added support for BF16 quantization type.
- `pubspec.yaml`:
  - Updated dev dependencies:
    - `lints` from ^6.0.0 to ^6.1.0
    - `test` from ^1.25.6 to ^1.31.1
    - `huggingface_downloader` from ^1.0.0 to ^1.0.1
    - `path` from ^1.9.0 to ^1.9.1
- Added new example files:
  - `example/smollm2_completion_example.dart`: Basic text completion example using 360m BF16 model.
  - `example/smollm2_chat_example.dart`: Interactive multi-turn chat session example updated to use 360m BF16 model.
  - `example/smollm2_rs_in_strawberry_example.dart`: Prompt formatting and reasoning example using 360m BF16 model.
  - Added `example/example.md` with detailed usage instructions and example code snippets.
  - Removed deprecated example file `example/smollm2_example.dart`.
- Tests:
  - Added `test/smollm2_vocab.dart` containing encoded tokenizer vocabulary and merges.
  - Added `test/tokenizer_test.dart` with comprehensive tokenizer unit tests covering added tokens, BPE merges, and chat template encoding.
  - Updated `test/smollm2_test.dart` integration tests:
    - Added tests for 135M and 360M models including download, export, load, and deterministic generation.
    - Added test coverage for jittered model loading and generation.
    - Cleaned up temporary directories after tests.
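
This release adds a BF16 quantization type with FP32 conversion. For readers unfamiliar with the format, the sketch below shows the general idea: a BF16 value is the upper 16 bits of an IEEE 754 32-bit float, so widening it is a 16-bit left shift. The helper names are hypothetical and chosen for this example; the package's `BF16Tensor` may be implemented differently.

```dart
import 'dart:typed_data';

/// Hypothetical helper (not the package API): widens one BF16 bit
/// pattern to a Dart double by placing it in the top half of a float32.
double bf16ToDouble(int bits) {
  final buffer = ByteData(4);
  // Default endianness is big-endian for both calls, so they agree.
  buffer.setUint32(0, (bits & 0xFFFF) << 16);
  return buffer.getFloat32(0);
}

/// Hypothetical helper: narrows a value to BF16 by truncation.
/// Real exporters often round to nearest instead; this sketch does not.
int doubleToBf16(double value) {
  final buffer = ByteData(4)..setFloat32(0, value);
  return buffer.getUint32(0) >> 16;
}

/// Converts a whole buffer of BF16 bit patterns to FP32.
Float32List bf16ListToFp32(Uint16List raw) {
  final out = Float32List(raw.length);
  for (var i = 0; i < raw.length; i++) {
    out[i] = bf16ToDouble(raw[i]);
  }
  return out;
}
```

For example, `bf16ToDouble(0x3F80)` yields `1.0`, because `0x3F800000` is the float32 encoding of one. BF16 keeps the full float32 exponent range and drops mantissa bits.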
1.0.5 #
- Documentation (`README.md`):
  - Added detailed TL;DR section for quick start with local LLM chat.
  - Added instructions for installing Dart SDK, Hugging Face model downloader CLI, and SmolLM2 CLI.
  - Added recommended commands to download small and larger SmolLM2 models.
  - Added instructions to export Hugging Face checkpoints to SMOL Q16 format.
  - Added example commands to run interactive chat with exported models.
  - Updated CLI usage examples to use global `smollm2` and `export_smollm2` commands instead of `dart run`.
  - Clarified installation instructions for adding `smollm2` dependency and global activation.
  - Improved formatting and consistency in CLI options and example usage.
- `bin/smollm2.dart`:
  - `_chatSession`: added `seed` parameter to `generate` call to support deterministic generation in chat mode.
- `lib/src/smollm2.dart` (`SmolLM2`):
  - In `generate` method, moved `resetCache()` call before initializing `_fullText` and `_seen` caches to ensure proper cache reset.
- `lib/src/token_generator.dart`:
  - Updated default repeat penalty for chat sessions from `1.02` to `1.0` for less penalization of repeated tokens during chat.
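
To make the repeat penalty change concrete, here is a minimal sketch of one widely used penalty scheme, in which logits of already-seen tokens are divided by the penalty when positive and multiplied by it when negative. This is an assumption about the general technique, not a description of the package's `sample` method, and `applyRepeatPenalty` is a hypothetical name.

```dart
/// Hypothetical helper (not the package API). Lowers the logits of tokens
/// that have already appeared, in place.
///
/// A penalty of 1.0 leaves every logit unchanged; values above 1.0 push
/// seen tokens toward lower probability.
void applyRepeatPenalty(
  List<double> logits,
  Iterable<int> seenTokenIds,
  double penalty,
) {
  // Deduplicate so each seen token is penalized once in this sketch.
  for (final id in seenTokenIds.toSet()) {
    final logit = logits[id];
    // Dividing a positive logit and multiplying a negative one both
    // move the value in the "less likely" direction.
    logits[id] = logit > 0 ? logit / penalty : logit * penalty;
  }
}
```

Under this scheme a penalty of `1.0` is a no-op, which matches the stated intent of penalizing repeated tokens less during chat.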
1.0.4 #
- Added chat mode support with interactive prompt-response loop in `bin/smollm2.dart`.
- `bin/smollm2.dart`:
  - Added command line options `-c` for chat mode and `-nc`/`--no-colored` to disable colored output.
  - Added colored output for tokens with distinct colors for prompt, generated tokens, EOS, and max tokens reached.
  - Added `_chatSession` function for interactive chat with system, user, and assistant roles.
  - Added `_promptComplete` function for single prompt completion with optional colored output.
- `lib/src/chat.dart`:
  - Added `ChatSession` and `ChatMessage` classes to manage chat history and build formatted prompts.
  - `ChatSession` enhancements:
    - Added optional `seed` parameter and internal `random` generator for deterministic sampling.
    - Added static `generateSeed()` method for secure random seed generation.
    - Added configurable chat template tokens `imStart` and `imEnd` with defaults `<|im_start|>` and `<|im_end|>`.
    - Updated `buildPrompt` to use configurable tokens and append assistant prompt.
    - Added `endsWithImEndToken` method to check if a response ends with the termination token.
- `lib/src/smollm2.dart`:
  - Added optional `logger` callback to `SmolLM2` for logging model loading and status messages.
  - Added detailed logging during model loading steps.
  - Changed `forward` method to track total and context tokens internally.
  - Added `totalTokens` and `contextTokens` getters to track tokens processed and cached.
  - Added `resetCache` method to reset KV caches and token counters.
  - Added incremental prompt ingestion with `ingest` method supporting partial prompt feeding and token emission.
  - Refactored `generate` method to use incremental prompt ingestion and track full generated text.
  - Added internal `_fullText` buffer to accumulate all decoded tokens.
  - Added internal `_seen` map to track token repetition counts across prompt and generation.
  - Updated `sample` method to use internal logits and repeat penalty logic.
- `lib/src/token_generator.dart`:
  - Added `isTerminal` property to `TokenOrigin` enum to identify terminal token emission events.
  - Added `random` field to `TokenGenerationResult` to expose RNG used during sampling.
  - Added default chat-specific temperature and repeat penalty constants.
  - Added `emmitPromptTokens` parameter to `generate` method to control prompt token emission callbacks.
- `lib/smollm2.dart`:
  - Exported new `chat.dart` module for chat session support.
- `example/smollm2_example.dart`:
  - Added logger callback to example `SmolLM2` instance for verbose output.
- Example:
  - Added `example/smollm2_chat_example.dart` demonstrating interactive chat session usage with token streaming, seed control, and proper prompt management.
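
The chat template tokens introduced in this release follow the ChatML convention. The sketch below illustrates how such a prompt is typically assembled from a message history. The `Message` record, both function names, and the `appendAssistantStart` flag are hypothetical stand-ins for this example and should not be read as the signatures of `ChatSession` or `ChatMessage`.

```dart
/// Hypothetical stand-in for a chat message (not the package's ChatMessage).
typedef Message = ({String role, String content});

/// Builds a ChatML-style prompt: each turn is wrapped as
/// `<imStart>role\ncontent<imEnd>\n`.
String buildChatMlPrompt(
  List<Message> messages, {
  String imStart = '<|im_start|>',
  String imEnd = '<|im_end|>',
  bool appendAssistantStart = true, // assumed flag for this sketch
}) {
  final sb = StringBuffer();
  for (final m in messages) {
    sb
      ..write(imStart)
      ..write(m.role)
      ..write('\n')
      ..write(m.content)
      ..write(imEnd)
      ..write('\n');
  }
  // Leaving the assistant turn open cues the model to write the reply.
  if (appendAssistantStart) {
    sb
      ..write(imStart)
      ..write('assistant\n');
  }
  return sb.toString();
}

/// Reports whether a generated response closed its turn with [imEnd].
bool endsWithImEnd(String response, {String imEnd = '<|im_end|>'}) =>
    response.trimRight().endsWith(imEnd);
```

A caller would pass records such as `(role: 'user', content: 'Hi')`. If the model stops without emitting the end token, for instance because the token limit was reached, a check like `endsWithImEnd` lets the caller decide whether to close the turn itself.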
1.0.3 #
- Added streaming token emission support to `SmolLM2.generate`:
  - Added `onTokenEmitted` callback parameter to receive tokens as they are generated.
  - Emitted tokens during prompt ingestion and generation with associated `TokenOrigin`.
  - Emitted special terminal tokens for EOS and max tokens reached.
- Introduced `TokenGenerator` interface and related types in `token_generator.dart`:
  - `TokenOrigin` enum to identify token source (prompt, generated, eos, maxTokensReached).
  - `OnTokenEmitted` callback typedef for streaming tokens.
  - `TokenGenerationStopReason` enum for generation stop reasons.
  - `TokenGenerationResult` class encapsulating generation output, parameters, token counts, timings, throughput, and stop reason.
  - `TokenGenerator` abstract class defining the `generate` method contract.
- Updated `SmolLM2` to implement `TokenGenerator`:
  - `generate` now returns `Future<TokenGenerationResult>` instead of raw string.
  - Added detailed timing and throughput measurements.
  - Supports streaming tokens via `onTokenEmitted`.
- Updated example CLI (`bin/smollm2.dart`) to:
  - Use `onTokenEmitted` callback to print tokens as they are generated.
  - Print generation statistics summary after completion.
- Added comprehensive integration test in `smollm2_test.dart`:
  - Tests full export, load, and deterministic generation workflow.
  - Captures and verifies emitted tokens and their origins.
  - Validates `TokenGenerationResult` fields and stop reason.
  - Prints emitted tokens and origins for inspection.
1.0.2 #
- `HFTokenizer`:
  - Updated `merges` field type to `List<(String, String)>`.
  - Improved `load` method to parse `merges` entries from either list pairs or space-separated strings.
- `TensorRepositoryLoader`:
  - Enhanced shard index detection to check multiple possible index file names (`.safetensors.index.json` and `.index.json`).
- `SmolLM2Exporter`:
  - Updated tokenizer merges serialization to write each merge as two separate strings.
- `SmolLM2`:
  - Updated tokenizer merges deserialization to read pairs of strings instead of single strings.
- `Tokenizer`:
  - Updated `merges` field type to `List<(String, String)>`.
  - Updated `_buildMergePairs` to use tuple elements directly instead of parsing strings.
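
The two `merges` layouts mentioned above, list pairs and space-separated strings, can be normalized into `List<(String, String)>` along the following lines. This is a minimal sketch under the assumption that the input is the decoded JSON `merges` array; `parseMerges` is a hypothetical name and is not the package's `HFTokenizer.load`.

```dart
/// Hypothetical helper (not the package API). Accepts merges written either
/// as two-element lists (`["a", "b"]`) or as single strings (`"a b"`).
List<(String, String)> parseMerges(List<dynamic> rawMerges) {
  final merges = <(String, String)>[];
  for (final entry in rawMerges) {
    if (entry is List && entry.length == 2) {
      // Pair layout: both halves are already separate strings.
      merges.add((entry[0] as String, entry[1] as String));
    } else if (entry is String) {
      // String layout: split on the first space only.
      final space = entry.indexOf(' ');
      if (space <= 0 || space == entry.length - 1) {
        throw FormatException('Malformed merge entry', entry);
      }
      merges.add((entry.substring(0, space), entry.substring(space + 1)));
    } else {
      throw FormatException('Unsupported merge entry: $entry');
    }
  }
  return merges;
}
```

Keeping each merge as a record avoids re-splitting strings later, which is the benefit the `_buildMergePairs` entry describes.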
1.0.1 #
- `pubspec.yaml`:
  - Updated SDK constraint from `^3.10.9` to `^3.10.0`.
  - Added `executables` section with `smollm2` and `export_smollm2`.
1.0.0 #
- Initial version.