edge_veda 2.4.2 copy "edge_veda: ^2.4.2" to clipboard
edge_veda: ^2.4.2 copied to clipboard

[pending analysis]

On-device AI runtime for Flutter. Text, vision, speech-to-text, text-to-speech, image generation, embeddings, RAG, and function calling. Metal GPU. Zero cloud.

Changelog #

All notable changes to the Edge Veda Flutter SDK will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

2.4.1 - 2026-02-22 #

Added #

  • Seamless XCFramework distribution: Native C engines (llama.cpp, whisper.cpp, stable-diffusion.cpp) now auto-fetched via CocoaPods trunk — zero manual steps for consumers. Just flutter pub add edge_veda and flutter build ios.
  • EdgeVedaCore pod: Separate CocoaPods pod for binary XCFramework distribution via GitHub Releases
  • publish-pod.sh: Release automation script for CocoaPods trunk publishing
  • 119 unit tests: TextCleaner, JsonRecovery, SchemaValidator, RuntimePolicy, Budget, LatencyTracker, MemoryEstimator

Changed #

  • XCFramework delivery: from manual curl download to automatic CocoaPods dependency resolution
  • edge_veda.podspec now declares s.dependency 'EdgeVedaCore', '~> 2.3' instead of bundling the binary

2.3.1 - 2026-02-22 #

Fixed #

  • TTS hang prevention: TtsService.speak() now includes a 30-second safety timeout — if iOS never fires the finish/cancel delegate (phone call, Bluetooth disconnect, memory pressure), the voice pipeline recovers instead of freezing forever
  • TTS resource cleanup: Subscription and timer properly cleaned up via try/finally on all exit paths, preventing leaks on unexpected exceptions
  • Voice pipeline dispose crash: All event emissions now guarded against closed StreamController, preventing StateError crash when dispose() races with async turn processing
  • Mic re-enable after stop(): _micListening flag now correctly stays false if stop() or pause() is called during the post-TTS cooldown delay
  • sendNow() error state: Push-to-talk error handler no longer incorrectly recovers from error state back to listening
  • Cooldown skip on early returns: Empty transcript, QoS paused, and LLM error paths no longer wait 800ms before resuming the microphone

2.3.0 - 2026-02-20 #

Added #

  • Text-to-Speech: TtsService wrapping iOS AVSpeechSynthesizer via platform channels — zero binary size increase
  • Speak, stop, pause, resume with voice/rate/pitch/volume control
  • Real-time word boundary events via EventChannel for text highlighting
  • Neural/enhanced voice filtering (iOS 16+ high-quality voices)
  • TtsVoice, TtsEvent, TtsEventType, TtsState types exported
  • TTS demo screen with voice picker, rate/pitch sliders, and live word highlighting
  • Completes the voice pipeline: STT → LLM → TTS

2.2.0 - 2026-02-20 #

Added #

  • Image generation Scheduler integration: WorkloadId.image registered with central Scheduler for QoS-gated generation, latency tracking, and thermal/battery awareness
  • Cross-worker memory eviction: Scheduler auto-disposes lowest-priority idle workers when RSS exceeds memory ceiling by >10% — prevents OOM on constrained devices
  • Image idle auto-disposal: SD model (2.3 GB) automatically freed after 60 seconds of inactivity
  • Image progress callbacks fixed: NativeCallable.isolateLocal replaces .listener — per-step progress now fires in real-time during generation
  • Strict schema validation: SchemaValidator.validateStrict() rejects extra keys at any nesting depth
  • JSON recovery: JsonRecovery.tryRepair() auto-fixes truncated/malformed model output (unclosed brackets, trailing garbage, unterminated strings)
  • Validation telemetry: ValidationEvent emitted on every sendStructured() call with pass/fail, recovery details, and timing

Fixed #

  • Image generation: keep SD model weights alive across multiple generations (was crashing on 2nd image with GGML_ASSERT(buft) due to free_params_immediately freeing CLIP/UNet/VAE buffers)

2.1.2 - 2026-02-19 #

Fixed #

  • Image generation: keep SD model weights alive across multiple generations (was crashing on 2nd image with GGML_ASSERT(buft) due to free_params_immediately freeing CLIP/UNet/VAE buffers)

2.1.1 - 2026-02-19 #

Fixed #

  • LICENSE file: added APPENDIX section for pub.dev pana license recognition
  • Example app READMEs: replaced placeholder with actual repo clone URL

2.1.0 - 2026-02-15 #

Added #

  • Smart Model Advisor: Device-aware model recommendations with 4D scoring (fit, quality, speed, context)
  • DeviceProfile: iPhone model, RAM, chip, tier detection via sysctl FFI
  • MemoryEstimator: Calibrated bytes-per-parameter formulas with KV cache + overhead
  • ModelAdvisor.recommend(): Ranked model list with optimal EdgeVedaConfig per model+device
  • ModelAdvisor.canRun(): Quick fit check before download
  • Storage availability: getFreeDiskSpace() via MethodChannel
  • Qwen3 0.6B in ModelRegistry (tool-calling capable, Q4_K_M, 397 MB)
  • All MiniLM L6 v2 in ModelRegistry (embedding model, F16, 46 MB)

Changed #

  • KV cache quantization: Q8_0 by default (halves cache from ~64MB to ~32MB)
  • Flash attention AUTO enabled by default
  • getMemoryStats() routed through StreamingWorker (eliminates ~600MB spike)

Fixed #

  • Batched prompt evaluation: chunk in n_batch-sized batches (fixes 3rd+ multi-turn assertion)
  • Streaming persistence: assistant message saved after natural stream close
  • GBNF grammar-constrained generation reliability

2.0.1 - 2026-02-13 #

Fixed #

  • Podspec license corrected to Apache-2.0 (was MIT), repo URLs updated
  • Pub.dev license detection now resolves correctly

2.0.0 - 2026-02-13 #

Added #

  • RAG Demo Apps: Document Q&A with two-model pipeline (embedder + generator), detective investigation screen
  • Tool Calling Demo: Native iOS data providers (contacts, calendar, location), detective reasoning screen
  • Performance Benchmarks: BENCHMARKS.md with normalized device metrics across all modalities

Changed #

  • Memory Optimization: KV cache Q8_0 halves cache memory (~64 MB → ~32 MB), flash attention wired through config
  • getMemoryStats() Fix: Routed through StreamingWorker — eliminates ~600 MB spike from double model load
  • Pub.dev Metadata: Added SPDX license identifier, topics for discoverability, updated description

Fixed #

  • Batched Prompt Evaluation: Conversations exceeding 512 tokens no longer crash — prompt eval now chunks in n_batch-sized batches
  • Streaming Response Persistence: Assistant messages reliably saved after streaming completes (async* generator cancellation fix)

Performance #

  • Text generation: 42–43 tok/s sustained (Llama 3.2 1B, Metal GPU)
  • Vector search: <1 ms (HNSW, cosine similarity)
  • Steady-state memory: 400–550 MB (down from ~1,200 MB peaks)
  • Vision soak test: 12.6 min, 254 frames, 0 crashes

1.3.1 - 2026-02-12 #

Fixed #

  • README updated with complete v1.3.0 feature documentation (STT, function calling, embeddings, RAG, confidence scoring)
  • Supported Models table now includes Whisper and embedding models

1.3.0 - 2026-02-12 #

Added #

  • Whisper STT: On-device speech-to-text via whisper.cpp with streaming transcription API
  • WhisperWorker: Persistent isolate for speech recognition (model loads once)
  • WhisperSession: High-level streaming API with 3-second chunk processing at 16kHz
  • iOS Audio Capture: AVAudioEngine + AVAudioConverter for native 48kHz to 16kHz mono conversion
  • Structured Output: Grammar-constrained generation via GBNF sampler for valid JSON output
  • Function Calling: sendWithTools() for multi-round tool chains with ToolDefinition, ToolCall, ToolResult
  • Tool Registry: Register tools with JSON schema validation, model selects and invokes relevant tools
  • Embeddings API: embed() returns L2-normalized float vectors via ev_embed() C API — works with any GGUF embedding model
  • Confidence Scoring: Per-token confidence (0.0-1.0) from softmax entropy of logits, zero overhead when disabled
  • Cloud Handoff: needsCloudHandoff flag when average confidence drops below confidenceThreshold
  • VectorIndex: Pure Dart HNSW vector search (via local_hnsw) with cosine similarity and JSON persistence
  • RagPipeline: End-to-end retrieval-augmented generation — embed query, search index, inject context, generate
  • STT Demo Screen: Live microphone transcription with pulsing recording indicator
  • Chat Tools Demo: Toggle function calling (get_time, calculate) in chat screen

Changed #

  • XCFramework rebuilt with whisper, grammar, embedding, and confidence symbols
  • Podspec symbol whitelist expanded for all new ev_* functions
  • EvGenerationParams struct layout fixed (grammar_str/grammar_root fields added)
  • TokenChunk and GenerateResponse now include confidence fields
  • Chat templates extended with Qwen3/Hermes-style tool message support
  • Android builds use 16KB page alignment for Android 15+ compliance

1.2.0 - 2026-02-09 #

Added #

  • Compute Budget Contracts: Declare p95 latency, battery drain, thermal level, and memory ceiling constraints via EdgeVedaBudget
  • Adaptive Budget Profiles: BudgetProfile.conservative / .balanced / .performance auto-calibrate to measured device performance after warm-up
  • MeasuredBaseline: Inspect actual device metrics (p95, drain rate, thermal, RSS) via Scheduler.measuredBaseline
  • Central Scheduler: Arbitrates concurrent workloads (vision + text) with priority-based degradation every 2 seconds
  • Budget Violation Events: Scheduler.onBudgetViolation stream with constraint details, mitigation status, and observeOnly classification
  • Two-Phase Resolution: Latency constraints resolve at ~40s, battery constraints resolve when drain data arrives (~2min)
  • Experiment Tracking: analyze_trace.py supports 6 testable hypotheses with versioned experiment runs
  • Trace Export: Share JSONL trace files from soak test via native iOS share sheet
  • Adaptive Budget UI: Soak test screen shows measured baseline, resolved budget, and resolution status live

Changed #

  • Soak test uses EdgeVedaBudget.adaptive(BudgetProfile.balanced) instead of hardcoded values
  • RuntimePolicy is now display-only; Scheduler is sole authority for inference gating
  • PerfTrace captures scheduler_decision, budget_check, budget_violation, and budget_resolved entries

1.1.1 - 2026-02-09 #

Fixed #

  • License corrected to Apache 2.0 (was incorrectly MIT in pub.dev package)
  • README rewritten with accurate capabilities and real soak test metrics
  • CHANGELOG cleaned up to reflect only shipped features

1.1.0 - 2026-02-08 #

Added #

  • Vision (VLM): SmolVLM2-500M support for real-time camera-to-text inference
  • Chat Session API: Multi-turn conversation management with context overflow summarization
  • Chat Templates: Llama 3 Instruct, ChatML, and generic template formats
  • System Prompt Presets: Built-in assistant, coder, and creative personas
  • VisionWorker: Persistent isolate for vision inference (model loads once, reused across frames)
  • FrameQueue: Drop-newest backpressure for camera frame processing
  • RuntimePolicy: Adaptive QoS with thermal/battery/memory-aware hysteresis
  • TelemetryService: iOS thermal state, battery level, memory polling via MethodChannel
  • PerfTrace: JSONL performance trace logger for soak test analysis
  • Soak Test Screen: 15-minute automated vision benchmark in demo app
  • initVision() and describeImage() APIs
  • CameraUtils for BGRA/YUV420 to RGB conversion
  • Context indicator (turn count + usage bar) in demo Chat tab
  • New Chat button and persona picker in demo app

Changed #

  • Upgraded llama.cpp from b4658 to b7952
  • XCFramework rebuilt with all symbols including ev_vision_get_last_timings
  • Demo app redesigned with dark theme, 3-tab navigation (Chat, Vision, Settings)
  • Chat tab rewritten to use ChatSession API (no direct generate() calls)
  • All FFI bindings now eager (removed lazy workaround for missing symbols)
  • Constrained ffi to <2.1.0 (avoids objective_c simulator crash)

Fixed #

  • Xcode 26 debug blank executor: export _main in podspec symbol whitelist
  • RuntimePolicy evaluate() de-escalation when pressure improves but persists

1.0.0 - 2026-02-04 #

Added #

  • Core SDK: On-device LLM inference via llama.cpp with Metal GPU on iOS
  • Dart FFI: 37 native function bindings via DynamicLibrary.process()
  • Streaming: Token-by-token generation with CancelToken cancellation
  • Model Management: Download, cache, SHA-256 verify, delete
  • Memory Monitoring: RSS tracking, pressure callbacks, configurable limits
  • Isolate Safety: All FFI calls in Isolate.run(), persistent StreamingWorker
  • XCFramework: Device arm64 + simulator arm64 static library packaging
  • Demo App: Chat screen with streaming, model selection, benchmark mode
  • Exception Hierarchy: 10 typed exceptions mapped from native error codes
13
likes
0
points
852
downloads

Publisher

verified publisherkurioai.co

Weekly Downloads

On-device AI runtime for Flutter. Text, vision, speech-to-text, text-to-speech, image generation, embeddings, RAG, and function calling. Metal GPU. Zero cloud.

Repository (GitHub)
View/report issues

Topics

#ai #machine-learning #on-device #llm #inference

License

(pending) (license)

Dependencies

crypto, ffi, flutter, http, image, json_schema, local_hnsw, path, path_provider

More

Packages that depend on edge_veda

Packages that implement edge_veda