smollm2 1.0.7

Pure Dart inference engine for SmolLM2 language models, delivering surprisingly capable local LLM results without requiring CUDA.

1.0.7

  • Added LLMTokenGenerator class implementing token generation backed by LLMRuntime.
  • Refactored SmolLM2 to extend LLMTokenGenerator and delegate runtime logic.
  • runtime.dart:
    • Added LLMRuntime class handling model loading, inference, KV cache, and sampling.
    • Supports BF16, Q8, Q16 quantized models with optional jitter during dequantization.
    • Implements transformer forward pass, RMS normalization, RoPE, attention, MLP, and sampling.
  • token_generator.dart:
    • Updated TokenGenerationResult to store prompt and generated tokens as lists of token IDs.
    • Added getters for token counts and updated toString formatting.
  • tokenizer.dart:
    • Added support for special tokens via AddedTokenInfo, which includes a special flag.
    • Added isSpecialTok and isEOSTok methods.
    • Updated tokenizer loading and token matching to handle special tokens.
  • export_smollm2.dart:
    • Exporter now writes added tokens with special flag.
  • bin/smollm2.dart:
    • Added SpinnerStyle enum and TokenSpinner class for spinner UI during prompt loading.
    • Updated _chatSession to ingest system prompt with spinner and improved chat prompt building.
    • Improved assistant response handling based on generation stop reason.
  • example/smollm2_chat_example.dart:
    • Added system prompt ingestion before chat loop.
    • Updated assistant response handling with stop reason logic.
  • example/smollm2_completion_example.dart and example/smollm2_rs_in_strawberry_example.dart:
    • Updated to use new TokenGenerationResult output and ingest system prompt before generation.
  • lib/src/chat.dart:
    • Improved ChatSession.buildPrompt to optionally append assistant start token.
    • Clarified chat prompt format with ChatML-style tokens.
  • lib/src/data.dart:
    • Added readU8 and writeU8 methods for byte-level I/O.
  • test/smollm2_vocab.dart:
    • Updated added tokens map to include special flag for each token.
  • test/tokenizer_test.dart:
    • Added tests verifying special token recognition and EOS token detection in tokenized output.
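
BF16 weights (supported since 1.0.6 and listed again for the 1.0.7 runtime) keep the upper 16 bits of an IEEE 754 float32: the sign, the full 8-bit exponent, and the top 7 mantissa bits. Widening to FP32 is therefore a 16-bit left shift of each value's bit pattern. The sketch below only illustrates that idea. `bf16ToFp32` and `fp32ToBf16Truncate` are hypothetical helpers, not smollm2's BF16Tensor API, and the narrowing direction truncates rather than rounding to nearest even.

```dart
import 'dart:typed_data';

/// Widens raw BF16 bit patterns to FP32 (hypothetical helper).
///
/// A BF16 value is the upper half of an IEEE 754 float32, so shifting its
/// 16 bits left by 16 rebuilds the float32 bit pattern; the discarded low
/// mantissa bits come back as zeros.
Float32List bf16ToFp32(Uint16List bf16) {
  final bits = Uint32List(bf16.length);
  for (var i = 0; i < bf16.length; i++) {
    bits[i] = bf16[i] << 16;
  }
  // Reinterpret the same bytes as float32 values (no numeric conversion).
  return bits.buffer.asFloat32List();
}

/// Narrows FP32 to BF16 by dropping the low 16 bits (hypothetical helper).
///
/// Truncation keeps the sketch short; exporters commonly round to nearest
/// even instead.
Uint16List fp32ToBf16Truncate(Float32List values) {
  // View the float32 storage as raw 32-bit integers.
  final bits =
      Uint32List.view(values.buffer, values.offsetInBytes, values.length);
  final out = Uint16List(values.length);
  for (var i = 0; i < values.length; i++) {
    out[i] = bits[i] >> 16;
  }
  return out;
}
```

For example, 1.0 is 0x3F800000 as float32, so its BF16 form is 0x3F80, and the round trip is exact for any value whose low 16 mantissa bits are already zero.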

1.0.6

  • bin/smollm2.dart:
    • Added command line options -j, -js, and -jc to enable jitter, set the jitter seed, and set the jitter scale for model loading.
    • Updated model loading to pass jitter parameters to SmolLM2.load.
    • Added jitter-related parameters to startup logs.
  • bin/export_smollm2.dart:
    • Added support for the -BF16 quantization flag.
  • lib/src/data.dart:
    • Added readU16 and writeU16 methods to DataReader and DataWriter.
    • Added read/write byte count tracking fields.
    • Improved hashing extensions with two-hash variant and bit rotations.
    • Added RandomExtension with methods for generating jittered floats and noise.
    • Added Hash64 class for incremental 64-bit hashing.
    • Added DurationFormatting extension for human-readable and seconds formatting.
  • lib/src/kv_cache.dart:
    • Added offset method to KVCache for computing buffer offsets.
  • lib/src/quant_type.dart:
    • Added new quant types: q16PerBlock, bf16.
    • Updated factory to support new quant types.
  • lib/src/smollm2.dart:
    • Added detailed documentation for SmolLM2 class and load method.
    • Added support for optional jitter during dequantization in model loading.
    • Added jitter parameters (jitterSeed, jitterRandom, jitterScale) to load.
    • Added timing and logging for loading phases (header, config, tokenizer, weights).
    • Updated _loadWeights to accept jitter parameters and pass them to weights loader.
    • Added _loaded flag and isLoaded getter.
    • Refactored model loading to support jitter injection during dequantization.
    • Improved rmsNorm and applyRope implementations.
    • Updated forward method to fix KV cache writes and optimize attention computation.
    • Updated tokenize and decode to use new TokenizerEngine.
    • Updated sample method to support eager greedy sampling and improved repeat penalty logic.
  • lib/src/tensor.dart:
    • Introduced abstract QTensor base class for quantized tensors with multiple dequantization methods.
    • Added jittered and adaptive dequantization methods with optional stochastic jitter.
    • Updated Q8Tensor and Q16Tensor to support jittered dequantization via toFP32Tensor method.
    • Added default jitter scale constants for Q8 and Q16 tensors.
    • Added new BF16Tensor class with FP32 conversion and dot product support.
  • lib/src/token_generator.dart:
    • Updated TokenGenerationResult.statsSummary to use new duration formatting extensions for human-readable timing.
  • lib/src/tokenizer.dart:
    • Added support for added/special tokens in tokenizer.
    • Refactored tokenizer to use TokenizerEngine for tokenization and decoding.
    • TokenizerEngine:
      • Supports matching added tokens first (longest match).
      • Handles space and newline tokens explicitly.
      • Applies BPE merges using merge rank map.
      • Decodes tokens with whitespace and newline replacements.
  • lib/src/weights.dart:
    • Updated ModelWeights.load and internal loading methods to accept jitter parameters.
    • Passed jitter parameters through to tensor reading methods.
    • Updated _readQ8 and _readQ16 to support jittered FP32 tensor conversion.
    • Added _readBF16.
    • Added support for BF16 quantization type.
  • pubspec.yaml:
    • Updated dev dependencies:
      • lints from ^6.0.0 to ^6.1.0
      • test from ^1.25.6 to ^1.31.1
      • huggingface_downloader from ^1.0.0 to ^1.0.1
      • path from ^1.9.0 to ^1.9.1
  • Examples (new and updated):
    • example/smollm2_completion_example.dart: Basic text completion example using 360m BF16 model.
    • example/smollm2_chat_example.dart: Interactive multi-turn chat session example updated to use 360m BF16 model.
    • example/smollm2_rs_in_strawberry_example.dart: Prompt formatting and reasoning example using 360m BF16 model.
    • Added example/example.md with detailed usage instructions and example code snippets.
    • Removed deprecated example file example/smollm2_example.dart.
  • Tests:
    • Added test/smollm2_vocab.dart containing encoded tokenizer vocabulary and merges.
    • Added test/tokenizer_test.dart with comprehensive tokenizer unit tests covering added tokens, BPE merges, and chat template encoding.
    • Updated test/smollm2_test.dart integration tests:
      • Added tests for 135M and 360M models including download, export, load, and deterministic generation.
      • Added test coverage for jittered model loading and generation.
      • Cleaned up temporary directories after tests.
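
The 1.0.6 TokenizerEngine entry says BPE merges are applied through a merge rank map. In the usual formulation of byte-pair encoding, a merge's rank is its index in the merges list, and encoding a word repeatedly fuses the adjacent symbol pair with the lowest rank until no pair has a rule. A minimal sketch of that loop follows. `buildMergeRanks` and `applyBpe` are hypothetical names, and the sketch leaves out the added-token matching and space/newline handling that the entry also lists.

```dart
/// Maps each merge pair to its rank, i.e. its index in the merges list
/// (hypothetical helper). Duplicate pairs keep their first, lowest rank.
Map<(String, String), int> buildMergeRanks(List<(String, String)> merges) {
  final ranks = <(String, String), int>{};
  for (var i = 0; i < merges.length; i++) {
    ranks.putIfAbsent(merges[i], () => i);
  }
  return ranks;
}

/// Repeatedly merges the adjacent pair with the lowest rank (hypothetical).
List<String> applyBpe(List<String> symbols, Map<(String, String), int> ranks) {
  final parts = List<String>.of(symbols);
  while (parts.length > 1) {
    int? bestRank;
    var bestIndex = -1;
    // Scan all adjacent pairs; records compare structurally as map keys.
    for (var i = 0; i < parts.length - 1; i++) {
      final rank = ranks[(parts[i], parts[i + 1])];
      if (rank != null && (bestRank == null || rank < bestRank)) {
        bestRank = rank;
        bestIndex = i;
      }
    }
    if (bestIndex < 0) break; // no adjacent pair has a merge rule
    // Replace the winning pair with its concatenation.
    final merged = parts[bestIndex] + parts[bestIndex + 1];
    parts.replaceRange(bestIndex, bestIndex + 2, [merged]);
  }
  return parts;
}
```

With the merges (h, e), (l, l), (he, ll), (hell, o), the symbols h e l l o collapse step by step into the single symbol hello, while l o l stays unchanged because none of its adjacent pairs has a rule.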

1.0.5

  • Documentation (README.md):
    • Added detailed TL;DR section for quick start with local LLM chat.
    • Added instructions for installing Dart SDK, Hugging Face model downloader CLI, and SmolLM2 CLI.
    • Added recommended commands to download small and larger SmolLM2 models.
    • Added instructions to export Hugging Face checkpoints to SMOL Q16 format.
    • Added example commands to run interactive chat with exported models.
    • Updated CLI usage examples to use global smollm2 and export_smollm2 commands instead of dart run.
    • Clarified installation instructions for adding smollm2 dependency and global activation.
    • Improved formatting and consistency in CLI options and example usage.
  • bin/smollm2.dart:
    • _chatSession: added seed parameter to generate call to support deterministic generation in chat mode.
  • lib/src/smollm2.dart (SmolLM2):
    • In generate method, moved resetCache() call before initializing _fullText and _seen caches to ensure proper cache reset.
  • lib/src/token_generator.dart:
    • Updated the default repeat penalty for chat sessions from 1.02 to 1.0, so repeated tokens are penalized less during chat.
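
To see what moving the penalty from 1.02 to 1.0 means, it helps to look at the widely used CTRL-style repeat penalty. Treating smollm2 as following this formulation is an assumption, since the changelog does not spell out its exact logic. Under it, the logit of every already-seen token is divided by the penalty when positive and multiplied by it when negative, so a penalty of 1.0 leaves the logits untouched. `applyRepeatPenalty` is a hypothetical helper, and it uses a plain set of seen ids rather than the per-token counts the 1.0.4 `_seen` map tracks.

```dart
/// CTRL-style repeat penalty (hypothetical helper, not smollm2's code).
///
/// Logits of tokens in [seen] are pushed toward "less likely": positive
/// logits are divided by [penalty], negative ones multiplied by it.
/// A [penalty] of 1.0 is a no-op.
List<double> applyRepeatPenalty(
    List<double> logits, Set<int> seen, double penalty) {
  final out = List<double>.of(logits);
  for (final id in seen) {
    if (id < 0 || id >= out.length) continue; // ignore out-of-range ids
    final logit = out[id];
    out[id] = logit > 0 ? logit / penalty : logit * penalty;
  }
  return out;
}
```

With a penalty of 2.0, a seen token's logit of 2.0 drops to 1.0 and a logit of -1.0 drops to -2.0. With 1.0, both stay where they were.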

1.0.4

  • Added chat mode support with interactive prompt-response loop in bin/smollm2.dart.
  • bin/smollm2.dart:
    • Added command line options -c for chat mode and -nc/--no-colored to disable colored output.
    • Added colored output for tokens with distinct colors for prompt, generated tokens, EOS, and max tokens reached.
    • Added _chatSession function for interactive chat with system, user, and assistant roles.
    • Added _promptComplete function for single prompt completion with optional colored output.
  • lib/src/chat.dart:
    • Added ChatSession and ChatMessage classes to manage chat history and build formatted prompts.
    • ChatSession enhancements:
      • Added optional seed parameter and internal random generator for deterministic sampling.
      • Added static generateSeed() method for secure random seed generation.
      • Added configurable chat template tokens imStart and imEnd with defaults <|im_start|> and <|im_end|>.
      • Updated buildPrompt to use configurable tokens and append assistant prompt.
      • Added endsWithImEndToken method to check if a response ends with the termination token.
  • lib/src/smollm2.dart:
    • Added optional logger callback to SmolLM2 for logging model loading and status messages.
    • Added detailed logging during model loading steps.
    • Changed forward method to track total and context tokens internally.
    • Added totalTokens and contextTokens getters to track tokens processed and cached.
    • Added resetCache method to reset KV caches and token counters.
    • Added incremental prompt ingestion with ingest method supporting partial prompt feeding and token emission.
    • Refactored generate method to use incremental prompt ingestion and track full generated text.
    • Added internal _fullText buffer to accumulate all decoded tokens.
    • Added internal _seen map to track token repetition counts across prompt and generation.
    • Updated sample method to use internal logits and repeat penalty logic.
  • lib/src/token_generator.dart:
    • Added isTerminal property to TokenOrigin enum to identify terminal token emission events.
    • Added random field to TokenGenerationResult to expose RNG used during sampling.
    • Added default chat-specific temperature and repeat penalty constants.
    • Added emmitPromptTokens parameter to generate method to control prompt token emission callbacks.
  • lib/smollm2.dart:
    • Exported new chat.dart module for chat session support.
  • example/smollm2_example.dart:
    • Added logger callback to example SmolLM2 instance for verbose output.
  • Example:
    • Added example/smollm2_chat_example.dart demonstrating interactive chat session usage with token streaming, seed control, and proper prompt management.
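
1.0.4 introduced ChatSession.buildPrompt with the ChatML delimiters <|im_start|> and <|im_end|>, and 1.0.7 made appending the assistant start token optional. The sketch below shows the general shape of such a prompt. `buildChatMlPrompt` and its parameters are hypothetical, and the newline placement is an assumption based on the common ChatML layout, not a copy of smollm2's actual output.

```dart
/// Builds a ChatML-style prompt (hypothetical sketch, not smollm2's API).
///
/// Assumed layout per message: <|im_start|>{role}\n{content}<|im_end|>\n
String buildChatMlPrompt(
  List<({String role, String content})> messages, {
  bool appendAssistantStart = true,
  String imStart = '<|im_start|>',
  String imEnd = '<|im_end|>',
}) {
  final sb = StringBuffer();
  for (final m in messages) {
    sb.write('$imStart${m.role}\n${m.content}$imEnd\n');
  }
  // Open an assistant turn so the model continues as the assistant.
  if (appendAssistantStart) sb.write('${imStart}assistant\n');
  return sb.toString();
}
```

Leaving the assistant turn open is what lets generation start directly with the reply. A response can then be considered complete once the model emits <|im_end|>.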

1.0.3

  • Added streaming token emission support to SmolLM2.generate:
    • Added onTokenEmitted callback parameter to receive tokens as they are generated.
    • Emitted tokens during prompt ingestion and generation with associated TokenOrigin.
    • Emitted special terminal tokens for EOS and max tokens reached.
  • Introduced TokenGenerator interface and related types in token_generator.dart:
    • TokenOrigin enum to identify token source (prompt, generated, eos, maxTokensReached).
    • OnTokenEmitted callback typedef for streaming tokens.
    • TokenGenerationStopReason enum for generation stop reasons.
    • TokenGenerationResult class encapsulating generation output, parameters, token counts, timings, throughput, and stop reason.
    • TokenGenerator abstract class defining the generate method contract.
  • Updated SmolLM2 to implement TokenGenerator:
    • generate now returns Future<TokenGenerationResult> instead of raw string.
    • Added detailed timing and throughput measurements.
    • Supports streaming tokens via onTokenEmitted.
  • Updated example CLI (bin/smollm2.dart) to:
    • Use onTokenEmitted callback to print tokens as they are generated.
    • Print generation statistics summary after completion.
  • Added comprehensive integration test in smollm2_test.dart:
    • Tests full export, load, and deterministic generation workflow.
    • Captures and verifies emitted tokens and their origins.
    • Validates TokenGenerationResult fields and stop reason.
    • Prints emitted tokens and origins for inspection.
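
The 1.0.3 streaming API reports every token together with its origin and ends with a terminal event for EOS or for reaching the token limit. The following sketch models that control flow with hypothetical types (`Origin`, `StopReason`, `OnToken`, `streamGenerate`) whose names only mirror the changelog. `nextToken` stands in for the forward pass plus sampling. Reporting the token-limit event with a -1 sentinel is a choice made for this sketch only.

```dart
/// Hypothetical origins mirroring the TokenOrigin values listed above.
enum Origin { prompt, generated, eos, maxTokensReached }

/// Hypothetical stop reasons for this sketch.
enum StopReason { eos, maxTokens }

/// Hypothetical streaming callback: receives each token id with its origin.
typedef OnToken = void Function(int token, Origin origin);

/// Generation loop that streams tokens as they are produced (sketch).
StopReason streamGenerate(
  List<int> promptTokens,
  int Function(List<int> context) nextToken, {
  required int eosToken,
  required int maxTokens,
  OnToken? onToken,
}) {
  final context = List<int>.of(promptTokens);
  for (final t in promptTokens) {
    onToken?.call(t, Origin.prompt); // echo prompt tokens during ingestion
  }
  for (var i = 0; i < maxTokens; i++) {
    final t = nextToken(context);
    if (t == eosToken) {
      onToken?.call(t, Origin.eos); // terminal event: end of sequence
      return StopReason.eos;
    }
    context.add(t);
    onToken?.call(t, Origin.generated);
  }
  // Terminal event with no real token; -1 is this sketch's sentinel.
  onToken?.call(-1, Origin.maxTokensReached);
  return StopReason.maxTokens;
}
```

Because the callback fires inside the loop, a CLI can print each token immediately instead of waiting for the whole completion.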

1.0.2

  • HFTokenizer:
    • Updated merges field type to List<(String, String)>.
    • Improved load method to parse merges entries from either list pairs or space-separated strings.
  • TensorRepositoryLoader:
    • Enhanced shard index detection to check multiple possible index file names (.safetensors.index.json and .index.json).
  • SmolLM2Exporter:
    • Updated tokenizer merges serialization to write each merge as two separate strings.
  • SmolLM2:
    • Updated tokenizer merges deserialization to read pairs of strings instead of single strings.
  • Tokenizer:
    • Updated merges field type to List<(String, String)>.
    • Updated _buildMergePairs to use tuple elements directly instead of parsing strings.
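
1.0.2 taught HFTokenizer to read merges written either as two-element lists or as space-separated strings, and switched the in-memory type to List<(String, String)>. A minimal sketch of that normalization is below. `parseMerges` and `parseMergesFromTokenizerJson` are hypothetical helpers, and the `model.merges` path assumes the standard Hugging Face tokenizer.json layout for BPE models. String entries are split on their first space.

```dart
import 'dart:convert';

/// Normalizes raw "merges" entries into (left, right) pairs (hypothetical).
///
/// Accepts both ["a", "b"] list entries and "a b" string entries.
List<(String, String)> parseMerges(List<dynamic> raw) {
  final merges = <(String, String)>[];
  for (final entry in raw) {
    if (entry is List && entry.length == 2) {
      merges.add((entry[0] as String, entry[1] as String));
    } else if (entry is String) {
      final space = entry.indexOf(' ');
      if (space <= 0) {
        throw FormatException('Malformed merge entry', entry);
      }
      merges.add((entry.substring(0, space), entry.substring(space + 1)));
    } else {
      throw FormatException('Unsupported merge entry: $entry');
    }
  }
  return merges;
}

/// Extracts model.merges from a tokenizer.json document (hypothetical).
List<(String, String)> parseMergesFromTokenizerJson(String source) {
  final root = jsonDecode(source) as Map<String, dynamic>;
  final model = root['model'] as Map<String, dynamic>;
  return parseMerges(model['merges'] as List<dynamic>);
}
```

Storing pairs instead of joined strings removes the need to re-split each merge later, which matches the 1.0.2 change to _buildMergePairs.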

1.0.1

  • pubspec.yaml:
    • Updated SDK constraint from ^3.10.9 to ^3.10.0.
    • Added executables section with smollm2 and export_smollm2.

1.0.0

  • Initial version.

Publisher: unverified uploader

License: Apache-2.0

Dependencies: collection
