edge_veda 2.0.0 copy "edge_veda: ^2.0.0" to clipboard
edge_veda: ^2.0.0 copied to clipboard

On-device AI runtime for Flutter — text generation, vision, speech-to-text, embeddings, RAG, structured output, and function calling. Runs Llama, Phi, Gemma, SmolVLM, and Whisper models locally with M [...]

Changelog #

All notable changes to the Edge Veda Flutter SDK will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

2.0.0 - 2026-02-13 #

Added #

  • RAG Demo Apps: Document Q&A with two-model pipeline (embedder + generator), detective investigation screen
  • Tool Calling Demo: Native iOS data providers (contacts, calendar, location), detective reasoning screen
  • Performance Benchmarks: BENCHMARKS.md with normalized device metrics across all modalities

Changed #

  • Memory Optimization: KV cache Q8_0 halves cache memory (~64 MB → ~32 MB), flash attention wired through config
  • getMemoryStats() Fix: Routed through StreamingWorker — eliminates ~600 MB spike from double model load
  • Pub.dev Metadata: Added SPDX license identifier, topics for discoverability, updated description

Fixed #

  • Batched Prompt Evaluation: Conversations exceeding 512 tokens no longer crash — prompt eval now chunks in n_batch-sized batches
  • Streaming Response Persistence: Assistant messages reliably saved after streaming completes (async* generator cancellation fix)

Performance #

  • Text generation: 42–43 tok/s sustained (Llama 3.2 1B, Metal GPU)
  • Vector search: <1 ms (HNSW, cosine similarity)
  • Steady-state memory: 400–550 MB (down from ~1,200 MB peaks)
  • Vision soak test: 12.6 min, 254 frames, 0 crashes

1.3.1 - 2026-02-12 #

Fixed #

  • README updated with complete v1.3.0 feature documentation (STT, function calling, embeddings, RAG, confidence scoring)
  • Supported Models table now includes Whisper and embedding models

1.3.0 - 2026-02-12 #

Added #

  • Whisper STT: On-device speech-to-text via whisper.cpp with streaming transcription API
  • WhisperWorker: Persistent isolate for speech recognition (model loads once)
  • WhisperSession: High-level streaming API with 3-second chunk processing at 16kHz
  • iOS Audio Capture: AVAudioEngine + AVAudioConverter for native 48kHz to 16kHz mono conversion
  • Structured Output: Grammar-constrained generation via GBNF sampler for valid JSON output
  • Function Calling: sendWithTools() for multi-round tool chains with ToolDefinition, ToolCall, ToolResult
  • Tool Registry: Register tools with JSON schema validation, model selects and invokes relevant tools
  • Embeddings API: embed() returns L2-normalized float vectors via ev_embed() C API — works with any GGUF embedding model
  • Confidence Scoring: Per-token confidence (0.0-1.0) from softmax entropy of logits, zero overhead when disabled
  • Cloud Handoff: needsCloudHandoff flag when average confidence drops below confidenceThreshold
  • VectorIndex: Pure Dart HNSW vector search (via local_hnsw) with cosine similarity and JSON persistence
  • RagPipeline: End-to-end retrieval-augmented generation — embed query, search index, inject context, generate
  • STT Demo Screen: Live microphone transcription with pulsing recording indicator
  • Chat Tools Demo: Toggle function calling (get_time, calculate) in chat screen

Changed #

  • XCFramework rebuilt with whisper, grammar, embedding, and confidence symbols
  • Podspec symbol whitelist expanded for all new ev_* functions
  • EvGenerationParams struct layout fixed (grammar_str/grammar_root fields added)
  • TokenChunk and GenerateResponse now include confidence fields
  • Chat templates extended with Qwen3/Hermes-style tool message support
  • Android builds use 16KB page alignment for Android 15+ compliance

1.2.0 - 2026-02-09 #

Added #

  • Compute Budget Contracts: Declare p95 latency, battery drain, thermal level, and memory ceiling constraints via EdgeVedaBudget
  • Adaptive Budget Profiles: BudgetProfile.conservative / .balanced / .performance auto-calibrate to measured device performance after warm-up
  • MeasuredBaseline: Inspect actual device metrics (p95, drain rate, thermal, RSS) via Scheduler.measuredBaseline
  • Central Scheduler: Arbitrates concurrent workloads (vision + text) with priority-based degradation every 2 seconds
  • Budget Violation Events: Scheduler.onBudgetViolation stream with constraint details, mitigation status, and observeOnly classification
  • Two-Phase Resolution: Latency constraints resolve at ~40s, battery constraints resolve when drain data arrives (~2min)
  • Experiment Tracking: analyze_trace.py supports 6 testable hypotheses with versioned experiment runs
  • Trace Export: Share JSONL trace files from soak test via native iOS share sheet
  • Adaptive Budget UI: Soak test screen shows measured baseline, resolved budget, and resolution status live

Changed #

  • Soak test uses EdgeVedaBudget.adaptive(BudgetProfile.balanced) instead of hardcoded values
  • RuntimePolicy is now display-only; Scheduler is sole authority for inference gating
  • PerfTrace captures scheduler_decision, budget_check, budget_violation, and budget_resolved entries

1.1.1 - 2026-02-09 #

Fixed #

  • License corrected to Apache 2.0 (was incorrectly MIT in pub.dev package)
  • README rewritten with accurate capabilities and real soak test metrics
  • CHANGELOG cleaned up to reflect only shipped features

1.1.0 - 2026-02-08 #

Added #

  • Vision (VLM): SmolVLM2-500M support for real-time camera-to-text inference
  • Chat Session API: Multi-turn conversation management with context overflow summarization
  • Chat Templates: Llama 3 Instruct, ChatML, and generic template formats
  • System Prompt Presets: Built-in assistant, coder, and creative personas
  • VisionWorker: Persistent isolate for vision inference (model loads once, reused across frames)
  • FrameQueue: Drop-newest backpressure for camera frame processing
  • RuntimePolicy: Adaptive QoS with thermal/battery/memory-aware hysteresis
  • TelemetryService: iOS thermal state, battery level, memory polling via MethodChannel
  • PerfTrace: JSONL performance trace logger for soak test analysis
  • Soak Test Screen: 15-minute automated vision benchmark in demo app
  • initVision() and describeImage() APIs
  • CameraUtils for BGRA/YUV420 to RGB conversion
  • Context indicator (turn count + usage bar) in demo Chat tab
  • New Chat button and persona picker in demo app

Changed #

  • Upgraded llama.cpp from b4658 to b7952
  • XCFramework rebuilt with all symbols including ev_vision_get_last_timings
  • Demo app redesigned with dark theme, 3-tab navigation (Chat, Vision, Settings)
  • Chat tab rewritten to use ChatSession API (no direct generate() calls)
  • All FFI bindings now eager (removed lazy workaround for missing symbols)
  • Constrained ffi to <2.1.0 (avoids objective_c simulator crash)

Fixed #

  • Xcode 26 debug blank executor: export _main in podspec symbol whitelist
  • RuntimePolicy evaluate() de-escalation when pressure improves but persists

1.0.0 - 2026-02-04 #

Added #

  • Core SDK: On-device LLM inference via llama.cpp with Metal GPU on iOS
  • Dart FFI: 37 native function bindings via DynamicLibrary.process()
  • Streaming: Token-by-token generation with CancelToken cancellation
  • Model Management: Download, cache, SHA-256 verify, delete
  • Memory Monitoring: RSS tracking, pressure callbacks, configurable limits
  • Isolate Safety: All FFI calls in Isolate.run(), persistent StreamingWorker
  • XCFramework: Device arm64 + simulator arm64 static library packaging
  • Demo App: Chat screen with streaming, model selection, benchmark mode
  • Exception Hierarchy: 10 typed exceptions mapped from native error codes
17
likes
0
points
156
downloads

Publisher

verified publisherkurioai.co

Weekly Downloads

On-device AI runtime for Flutter — text generation, vision, speech-to-text, embeddings, RAG, structured output, and function calling. Runs Llama, Phi, Gemma, SmolVLM, and Whisper models locally with Metal GPU acceleration. Compute budget contracts enforce p95 latency, battery, thermal, and memory guarantees. Private by default — zero cloud dependencies during inference.

Repository (GitHub)
View/report issues

Topics

#ai #machine-learning #on-device #llm #inference

License

unknown (license)

Dependencies

crypto, ffi, flutter, http, json_schema, local_hnsw, path, path_provider

More

Packages that depend on edge_veda

Packages that implement edge_veda