# edge_veda library

Edge Veda SDK: on-device LLM inference for Flutter.
Example usage:

```dart
import 'package:edge_veda/edge_veda.dart';

final edgeVeda = EdgeVeda();
await edgeVeda.init(EdgeVedaConfig(modelPath: '/path/to/model.gguf'));

final response = await edgeVeda.generate('Hello, world!');
print(response.text);

await edgeVeda.dispose();
```
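Generation can typically be tuned per call via `GenerateOptions` (listed under Classes below). A minimal sketch; the `options` parameter and the `maxTokens` / `temperature` field names are assumptions, not confirmed by this page:

```dart
// Sketch only: the `options` parameter and the `maxTokens` /
// `temperature` field names are assumptions.
final response = await edgeVeda.generate(
  'Summarize the plot of Hamlet in one sentence.',
  options: GenerateOptions(
    maxTokens: 64,    // hypothetical: cap on generated tokens
    temperature: 0.7, // hypothetical: sampling temperature
  ),
);
print(response.text);
```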
## Features
- On-device LLM inference with llama.cpp and Metal acceleration
- Streaming token-by-token generation with cancellation support
- Model download with progress tracking and caching
- Memory-safe operations with configurable limits
- Zero server costs and 100% offline operation
## Streaming Generation

```dart
final cancelToken = CancelToken();
final stream = edgeVeda.generateStream(
  'Tell me a story',
  cancelToken: cancelToken,
);

await for (final chunk in stream) {
  stdout.write(chunk.token);
  if (chunk.isFinal) break;
}

// To cancel mid-stream:
cancelToken.cancel();
```
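Cancellation can be wired to any trigger, such as a deadline. A sketch building on the example above; the `Timer`-based timeout is illustrative glue code, not an SDK feature:

```dart
import 'dart:async';
import 'dart:io';

final cancelToken = CancelToken();

// Illustrative timeout: cancel the stream after 10 seconds.
final timeout = Timer(const Duration(seconds: 10), cancelToken.cancel);

await for (final chunk in edgeVeda.generateStream(
  'Tell me a story',
  cancelToken: cancelToken,
)) {
  stdout.write(chunk.token);
}
timeout.cancel(); // generation finished before the deadline
```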
## Model Management

```dart
final modelManager = ModelManager();

// Subscribe to progress before starting the download,
// otherwise early progress events are missed.
modelManager.downloadProgress.listen((progress) {
  print('Progress: ${progress.progressPercent}%');
});

// Download a pre-configured model.
final modelPath = await modelManager.downloadModel(
  ModelRegistry.llama32_1b,
);

// Check downloaded models.
final models = await modelManager.getDownloadedModels();
print('Downloaded: $models');
```
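Downloads can fail on flaky networks or corrupted files. A sketch of defensive handling, assuming `downloadModel` surfaces failures as the `DownloadException` and `ChecksumException` types listed under Exceptions below:

```dart
try {
  final modelPath = await modelManager.downloadModel(
    ModelRegistry.llama32_1b,
  );
  print('Model ready at $modelPath');
} on ChecksumException {
  // The downloaded file failed verification; a re-download may fix it.
  print('Checksum mismatch, retrying download...');
} on DownloadException catch (e) {
  // Network or storage failure.
  print('Download failed: $e');
}
```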
## Memory Monitoring

```dart
// Check memory usage.
final stats = await edgeVeda.getMemoryStats();
print('Memory: ${(stats.usagePercent * 100).toStringAsFixed(1)}%');

// Quick pressure check.
if (await edgeVeda.isMemoryPressure()) {
  print('High memory usage!');
}
```
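When pressure is detected, a typical mitigation is to cancel or throttle in-flight work. A sketch reusing the `cancelToken` from the streaming example; the polling loop is illustrative, and only `isMemoryPressure` comes from this page (the SDK also lists a `MemoryPressureEvent` type, whose delivery mechanism is not shown here):

```dart
import 'dart:async';

// Illustrative polling loop, not an SDK API.
Timer.periodic(const Duration(seconds: 5), (timer) async {
  if (await edgeVeda.isMemoryPressure()) {
    // Hypothetical mitigation: stop streaming generation.
    cancelToken.cancel();
    timer.cancel();
  }
});
```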
## Classes

- **BatteryDrainTracker**: Rolling battery drain rate estimator.
- **BudgetViolation**: Emitted when the Scheduler cannot satisfy a declared budget constraint even after attempting mitigation.
- **CameraUtils**: Utility class for converting camera image formats to RGB888.
- **CancelToken**: Token for cancelling ongoing operations (downloads, streaming generation).
- **ChatMessage**: A single message in a conversation.
- **ChatSession**: Manages multi-turn conversation state on top of EdgeVeda.
- **ConfidenceInfo**: Confidence information for a generated token or response.
- **DownloadProgress**: Model download progress information.
- **EdgeVeda**: Main Edge Veda SDK class for on-device AI inference.
- **EdgeVedaBudget**: Declarative resource budget for on-device inference.
- **EdgeVedaConfig**: Configuration for initializing the Edge Veda SDK.
- **EmbeddingResult**: Result of a text embedding operation.
- **FrameQueue**: Bounded frame queue with drop-newest backpressure policy.
- **GbnfBuilder**: Builds GBNF grammar strings from JSON Schema definitions.
- **GenerateOptions**: Options for text generation.
- **GenerateResponse**: Response from text generation.
- **LatencyTracker**: Rolling-window percentile tracker for inference latencies.
- **MeasuredBaseline**: Snapshot of actual device performance measured during warm-up.
- **MemoryPressureEvent**: Memory pressure event from the native layer.
- **MemoryStats**: Memory usage statistics from the native layer.
- **ModelInfo**: Model information.
- **ModelManager**: Manages model downloads, caching, and verification.
- **ModelRegistry**: Pre-configured model registry with popular models.
- **PerfTrace**: JSONL performance trace logger for vision inference benchmarking.
- **QoSKnobs**: QoS knob values for a given QoSLevel.
- **RagConfig**: Configuration for the RAG pipeline.
- **RagPipeline**: End-to-end RAG pipeline: embed query -> search index -> inject context -> generate.
- **RuntimePolicy**: Runtime policy that adapts vision inference QoS based on thermal, battery, and memory pressure signals.
- **Scheduler**: Central coordinator that enforces EdgeVedaBudget constraints across concurrent on-device inference workloads.
- **SchemaValidationResult**: Result of validating JSON data against a JSON Schema.
- **SchemaValidator**: Validates JSON data against JSON Schema Draft 7 schemas.
- **SearchResult**: Result from a vector similarity search.
- **TelemetryService**: Service for querying iOS thermal, battery, and memory telemetry.
- **TelemetrySnapshot**: A point-in-time snapshot of all telemetry values.
- **TokenChunk**: Token chunk in a streaming response.
- **ToolCall**: A parsed tool call extracted from model output.
- **ToolDefinition**: Immutable definition of a tool that a model can invoke.
- **ToolRegistry**: Manages a collection of ToolDefinitions with budget-aware filtering.
- **ToolResult**: Result of executing a tool call, provided by the developer.
- **ToolTemplate**: Formats tool definitions for model prompts and parses tool calls from model output.
- **VectorIndex**: HNSW-backed vector index for on-device similarity search.
- **VisionConfig**: Configuration for initializing vision inference.
- **VisionInitSuccessResponse**: Vision context initialized successfully.
- **VisionResultResponse**: Vision inference completed with description and timing data.
- **VisionWorker**: Worker isolate manager for persistent vision inference.
- **WhisperSegment**: A single transcription segment with timing information.
- **WhisperSession**: High-level streaming transcription session with Scheduler integration.
- **WhisperTranscribeResponse**: Transcription completed with segments and timing.
- **WhisperWorker**: Worker isolate manager for persistent Whisper inference.
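Several of these classes compose. For instance, ChatSession manages multi-turn conversation state on top of EdgeVeda; a sketch of how that might look, where the constructor shape and the `send` method name are assumptions not documented on this page:

```dart
// Sketch only: the ChatSession API names below are assumptions.
final session = ChatSession(edgeVeda);

final first = await session.send('Who wrote Hamlet?');
print(first.text);

// The session is assumed to carry prior turns as context,
// so follow-up questions can use pronouns.
final second = await session.send('When was he born?');
print(second.text);
```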
## Enums

- **BudgetConstraint**: Which budget constraint was violated.
- **BudgetProfile**: Adaptive budget profile expressing intent as multipliers on a measured device baseline.
- **ChatRole**: Role of a message in a conversation.
- **ChatTemplateFormat**: Supported chat template formats.
- **QoSLevel**: Quality-of-service levels for the vision inference runtime.
- **SystemPromptPreset**: Built-in system prompt presets for common use cases.
- **ToolPriority**: Priority level for a tool in the registry.
- **WorkloadId**: Unique identifier for each workload type managed by the scheduler.
- **WorkloadPriority**: Priority level for a registered workload.
## Exceptions / Errors

- **ChecksumException**: Thrown when checksum verification fails.
- **ConfigurationException**: Thrown when an invalid configuration is provided.
- **DownloadException**: Thrown when a model download fails.
- **EdgeVedaException**: Base exception class for Edge Veda errors.
- **EdgeVedaGenericException**: Generic exception for unknown native errors.
- **EmbeddingException**: Thrown when an embedding operation fails.
- **GenerationException**: Thrown when text generation fails.
- **InitializationException**: Thrown when SDK initialization fails.
- **MemoryException**: Thrown when the memory limit is exceeded.
- **ModelLoadException**: Thrown when model loading fails.
- **ModelValidationException**: Thrown when a model file fails validation (checksum mismatch, corrupted file).
- **ToolCallParseException**: Thrown when model output cannot be parsed as a valid tool call.
- **VisionException**: Thrown when vision inference fails.
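Since these all extend EdgeVedaException, callers can catch specific failures first and fall back to the base class. A sketch:

```dart
try {
  final response = await edgeVeda.generate('Hello');
  print(response.text);
} on MemoryException {
  // Memory limit exceeded; consider a smaller context or model.
  print('Out of memory budget');
} on GenerationException catch (e) {
  print('Generation failed: $e');
} on EdgeVedaException catch (e) {
  // Catch-all for any other SDK error.
  print('Edge Veda error: $e');
}
```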