edge_veda library

Edge Veda SDK - On-device LLM inference for Flutter

Example usage:

import 'package:edge_veda/edge_veda.dart';

final edgeVeda = EdgeVeda();
await edgeVeda.init(EdgeVedaConfig(modelPath: '/path/to/model.gguf'));
final response = await edgeVeda.generate('Hello, world!');
print(response.text);
await edgeVeda.dispose();
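
Generation can presumably be tuned per request via GenerateOptions (listed under Classes below). A hedged sketch only: the `options` parameter name and the `maxTokens`/`temperature` fields are illustrative assumptions, not confirmed API; check the GenerateOptions reference for the actual fields.

// Hypothetical field names, for illustration only.
final tuned = await edgeVeda.generate(
  'Summarize the text above in one sentence.',
  options: GenerateOptions(
    maxTokens: 128,
    temperature: 0.7,
  ),
);
print(tuned.text);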

Features

  • On-device LLM inference with llama.cpp and Metal acceleration
  • Streaming token-by-token generation with cancellation support
  • Model download with progress tracking and caching
  • Memory-safe operations with configurable limits
  • Zero server costs and 100% offline operation

Streaming Generation

final cancelToken = CancelToken();
final stream = edgeVeda.generateStream(
  'Tell me a story',
  cancelToken: cancelToken,
);

await for (final chunk in stream) {
  stdout.write(chunk.token); // stdout comes from dart:io
  if (chunk.isFinal) break;
}

// To cancel mid-stream:
cancelToken.cancel();
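
The same token can be wired to a timeout so a runaway generation stops itself. A minimal sketch using only the calls shown above; once the token is cancelled, the stream is expected to stop yielding chunks.

final cancelToken = CancelToken();

// Cancel the generation if it runs longer than 10 seconds.
Future.delayed(const Duration(seconds: 10), cancelToken.cancel);

final buffer = StringBuffer();
await for (final chunk in edgeVeda.generateStream(
  'Write a long essay',
  cancelToken: cancelToken,
)) {
  buffer.write(chunk.token);
}
print(buffer.toString());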

Model Management

final modelManager = ModelManager();

// Subscribe to progress updates before starting the download,
// otherwise the early events are missed.
modelManager.downloadProgress.listen((progress) {
  print('Progress: ${progress.progressPercent}%');
});

// Download a pre-configured model (cached after the first download)
final modelPath = await modelManager.downloadModel(
  ModelRegistry.llama32_1b,
);

// Check downloaded models
final models = await modelManager.getDownloadedModels();
print('Downloaded: $models');
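
Model management and inference connect naturally: the path returned by downloadModel can feed straight into EdgeVedaConfig. A sketch using only the calls documented above; since downloads are cached, repeated calls should be cheap.

final modelManager = ModelManager();
final modelPath = await modelManager.downloadModel(
  ModelRegistry.llama32_1b,
);

// Initialize the SDK with the downloaded (or cached) model.
final edgeVeda = EdgeVeda();
await edgeVeda.init(EdgeVedaConfig(modelPath: modelPath));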

Memory Monitoring

// Check memory usage
final stats = await edgeVeda.getMemoryStats();
print('Memory: ${(stats.usagePercent * 100).toStringAsFixed(1)}%');

// Quick pressure check
if (await edgeVeda.isMemoryPressure()) {
  print('High memory usage!');
}
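
A common pattern is to guard heavy work behind the quick pressure check. A minimal sketch, again using only the calls shown above:

// Defer generation while memory pressure is high.
if (await edgeVeda.isMemoryPressure()) {
  print('High memory usage, skipping generation for now');
} else {
  final response = await edgeVeda.generate('Hello, world!');
  print(response.text);
}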

Classes

BatteryDrainTracker
Rolling battery drain rate estimator.
BudgetViolation
Emitted when the Scheduler cannot satisfy a declared budget constraint even after attempting mitigation.
CameraUtils
Utility class for converting camera image formats to RGB888.
CancelToken
Token for cancelling ongoing operations (downloads, streaming generation).
ChatMessage
A single message in a conversation.
ChatSession
Manages multi-turn conversation state on top of EdgeVeda.
ConfidenceInfo
Confidence information for a generated token or response.
DownloadProgress
Model download progress information.
EdgeVeda
Main Edge Veda SDK class for on-device AI inference.
EdgeVedaBudget
Declarative resource budget for on-device inference.
EdgeVedaConfig
Configuration for initializing the Edge Veda SDK.
EmbeddingResult
Result of a text embedding operation.
FrameQueue
Bounded frame queue with a drop-newest backpressure policy.
GbnfBuilder
Builds GBNF grammar strings from JSON Schema definitions.
GenerateOptions
Options for text generation.
GenerateResponse
Response from text generation.
LatencyTracker
Rolling-window percentile tracker for inference latencies.
MeasuredBaseline
Snapshot of actual device performance measured during warm-up.
MemoryPressureEvent
Memory pressure event from the native layer.
MemoryStats
Memory usage statistics from the native layer.
ModelInfo
Model information.
ModelManager
Manages model downloads, caching, and verification.
ModelRegistry
Pre-configured model registry with popular models.
PerfTrace
JSONL performance trace logger for vision inference benchmarking.
QoSKnobs
QoS knob values for a given QoSLevel.
RagConfig
Configuration for the RAG pipeline.
RagPipeline
End-to-end RAG pipeline: embed query -> search index -> inject context -> generate.
RuntimePolicy
Runtime policy that adapts vision inference QoS based on thermal, battery, and memory pressure signals.
Scheduler
Central coordinator that enforces EdgeVedaBudget constraints across concurrent on-device inference workloads.
SchemaValidationResult
Result of validating JSON data against a JSON Schema.
SchemaValidator
Validates JSON data against JSON Schema Draft 7 schemas.
SearchResult
Result from a vector similarity search.
TelemetryService
Service for querying iOS thermal, battery, and memory telemetry.
TelemetrySnapshot
A point-in-time snapshot of all telemetry values.
TokenChunk
Token chunk in a streaming response.
ToolCall
A parsed tool call extracted from model output.
ToolDefinition
Immutable definition of a tool that a model can invoke.
ToolRegistry
Manages a collection of ToolDefinitions with budget-aware filtering.
ToolResult
Result of executing a tool call, provided by the developer.
ToolTemplate
Formats tool definitions for model prompts and parses tool calls from model output.
VectorIndex
HNSW-backed vector index for on-device similarity search.
VisionConfig
Configuration for initializing vision inference.
VisionInitSuccessResponse
Vision context initialized successfully.
VisionResultResponse
Vision inference completed with description and timing data.
VisionWorker
Worker isolate manager for persistent vision inference.
WhisperSegment
A single transcription segment with timing information.
WhisperSession
High-level streaming transcription session with Scheduler integration.
WhisperTranscribeResponse
Transcription completed with segments and timing.
WhisperWorker
Worker isolate manager for persistent whisper inference.

Enums

BudgetConstraint
Which budget constraint was violated.
BudgetProfile
Adaptive budget profile expressing intent as multipliers on the measured device baseline.
ChatRole
Role of a message in a conversation.
ChatTemplateFormat
Supported chat template formats.
QoSLevel
Quality-of-Service levels for the vision inference runtime.
SystemPromptPreset
Built-in system prompt presets for common use cases.
ToolPriority
Priority level for a tool in the registry.
WorkloadId
Unique identifier for each workload type managed by the scheduler.
WorkloadPriority
Priority level for a registered workload.

Exceptions / Errors

ChecksumException
Thrown when checksum verification fails.
ConfigurationException
Thrown when an invalid configuration is provided.
DownloadException
Thrown when a model download fails.
EdgeVedaException
Base exception class for Edge Veda errors.
EdgeVedaGenericException
Generic exception for unknown native errors.
EmbeddingException
Thrown when an embedding operation fails.
GenerationException
Thrown when text generation fails.
InitializationException
Thrown when SDK initialization fails.
MemoryException
Thrown when the memory limit is exceeded.
ModelLoadException
Thrown when model loading fails.
ModelValidationException
Thrown when a model file fails validation (checksum mismatch, corrupted file).
ToolCallParseException
Thrown when model output cannot be parsed as a valid tool call.
VisionException
Thrown when vision inference fails.