# edge_veda library

Edge Veda SDK: on-device LLM inference for Flutter.
Example usage:

```dart
import 'package:edge_veda/edge_veda.dart';

final edgeVeda = EdgeVeda();
await edgeVeda.init(EdgeVedaConfig(modelPath: '/path/to/model.gguf'));

final response = await edgeVeda.generate('Hello, world!');
print(response.text);

await edgeVeda.dispose();
```
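Generation can typically be tuned per call via `GenerateOptions` (listed under Classes below). A minimal sketch; the `options` parameter and the `maxTokens` / `temperature` field names are assumptions, not confirmed by this page:

```dart
// Sketch only: the `options` parameter and the `maxTokens` /
// `temperature` field names are assumptions.
final response = await edgeVeda.generate(
  'Summarize the plot of Hamlet in one sentence.',
  options: GenerateOptions(
    maxTokens: 64,    // hypothetical: cap on generated tokens
    temperature: 0.7, // hypothetical: sampling temperature
  ),
);
print(response.text);
```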
## Features
- On-device LLM inference with llama.cpp and Metal acceleration
- Streaming token-by-token generation with cancellation support
- Model download with progress tracking and caching
- Memory-safe operations with configurable limits
- Zero server costs and 100% offline operation
## Streaming Generation

```dart
final cancelToken = CancelToken();
final stream = edgeVeda.generateStream(
  'Tell me a story',
  cancelToken: cancelToken,
);

await for (final chunk in stream) {
  stdout.write(chunk.token);
  if (chunk.isFinal) break;
}

// To cancel mid-stream:
cancelToken.cancel();
```
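Cancellation can be wired to any trigger, such as a deadline. A sketch building on the example above; the `Timer`-based timeout is illustrative glue code, not an SDK feature:

```dart
import 'dart:async';
import 'dart:io';

final cancelToken = CancelToken();

// Illustrative timeout: cancel the stream after 10 seconds.
final timeout = Timer(const Duration(seconds: 10), cancelToken.cancel);

await for (final chunk in edgeVeda.generateStream(
  'Tell me a story',
  cancelToken: cancelToken,
)) {
  stdout.write(chunk.token);
}
timeout.cancel(); // generation finished before the deadline
```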
## Model Management

```dart
final modelManager = ModelManager();

// Subscribe to progress before starting the download,
// otherwise early progress events are missed.
modelManager.downloadProgress.listen((progress) {
  print('Progress: ${progress.progressPercent}%');
});

// Download a pre-configured model.
final modelPath = await modelManager.downloadModel(
  ModelRegistry.llama32_1b,
);

// Check downloaded models.
final models = await modelManager.getDownloadedModels();
print('Downloaded: $models');
```
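Downloads can fail on flaky networks or corrupted files. A sketch of defensive handling, assuming `downloadModel` surfaces failures as the `DownloadException` and `ChecksumException` types listed under Exceptions below:

```dart
try {
  final modelPath = await modelManager.downloadModel(
    ModelRegistry.llama32_1b,
  );
  print('Model ready at $modelPath');
} on ChecksumException {
  // The downloaded file failed verification; a re-download may fix it.
  print('Checksum mismatch, retrying download...');
} on DownloadException catch (e) {
  // Network or storage failure.
  print('Download failed: $e');
}
```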
## Memory Monitoring

```dart
// Check memory usage.
final stats = await edgeVeda.getMemoryStats();
print('Memory: ${(stats.usagePercent * 100).toStringAsFixed(1)}%');

// Quick pressure check.
if (await edgeVeda.isMemoryPressure()) {
  print('High memory usage!');
}
```
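When pressure is detected, a typical mitigation is to cancel or throttle in-flight work. A sketch reusing the `cancelToken` from the streaming example; the polling loop is illustrative, and only `isMemoryPressure` comes from this page (the SDK also lists a `MemoryPressureEvent` type, whose delivery mechanism is not shown here):

```dart
import 'dart:async';

// Illustrative polling loop, not an SDK API.
Timer.periodic(const Duration(seconds: 5), (timer) async {
  if (await edgeVeda.isMemoryPressure()) {
    // Hypothetical mitigation: stop streaming generation.
    cancelToken.cancel();
    timer.cancel();
  }
});
```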
## Classes

- **BatteryDrainTracker**: Rolling battery drain rate estimator.
- **BudgetViolation**: Emitted when the Scheduler cannot satisfy a declared budget constraint even after attempting mitigation.
- **CameraUtils**: Utility class for converting camera image formats to RGB888.
- **CancelToken**: Token for cancelling ongoing operations (downloads, streaming generation).
- **ChatMessage**: A single message in a conversation.
- **ChatSession**: Manages multi-turn conversation state on top of EdgeVeda.
- **ConfidenceInfo**: Confidence information for a generated token or response.
- **DownloadProgress**: Model download progress information.
- **EdgeVeda**: Main Edge Veda SDK class for on-device AI inference.
- **EdgeVedaBudget**: Declarative resource budget for on-device inference.
- **EdgeVedaConfig**: Configuration for initializing the Edge Veda SDK.
- **EmbeddingResult**: Result of a text embedding operation.
- **FrameQueue**: Bounded frame queue with drop-newest backpressure policy.
- **GbnfBuilder**: Builds GBNF grammar strings from JSON Schema definitions.
- **GenerateOptions**: Options for text generation.
- **GenerateResponse**: Response from text generation.
- **LatencyTracker**: Rolling-window percentile tracker for inference latencies.
- **MeasuredBaseline**: Snapshot of actual device performance measured during warm-up.
- **MemoryPressureEvent**: Memory pressure event from the native layer.
- **MemoryStats**: Memory usage statistics from the native layer.
- **ModelInfo**: Model information.
- **ModelManager**: Manages model downloads, caching, and verification.
- **ModelRegistry**: Pre-configured model registry with popular models.
- **PerfTrace**: JSONL performance trace logger for vision inference benchmarking.
- **QoSKnobs**: QoS knob values for a given QoSLevel.
- **RagConfig**: Configuration for the RAG pipeline.
- **RagPipeline**: End-to-end RAG pipeline: embed query -> search index -> inject context -> generate.
- **RuntimePolicy**: Runtime policy that adapts vision inference QoS based on thermal, battery, and memory pressure signals.
- **Scheduler**: Central coordinator that enforces EdgeVedaBudget constraints across concurrent on-device inference workloads.
- **SchemaValidationResult**: Result of validating JSON data against a JSON Schema.
- **SchemaValidator**: Validates JSON data against JSON Schema Draft 7 schemas.
- **SearchResult**: Result from a vector similarity search.
- **TelemetryService**: Service for querying iOS thermal, battery, and memory telemetry.
- **TelemetrySnapshot**: A point-in-time snapshot of all telemetry values.
- **TokenChunk**: Token chunk in a streaming response.
- **ToolCall**: A parsed tool call extracted from model output.
- **ToolDefinition**: Immutable definition of a tool that a model can invoke.
- **ToolRegistry**: Manages a collection of ToolDefinitions with budget-aware filtering.
- **ToolResult**: Result of executing a tool call, provided by the developer.
- **ToolTemplate**: Formats tool definitions for model prompts and parses tool calls from model output.
- **VectorIndex**: HNSW-backed vector index for on-device similarity search.
- **VisionConfig**: Configuration for initializing vision inference.
- **VisionInitSuccessResponse**: Vision context initialized successfully.
- **VisionResultResponse**: Vision inference completed with description and timing data.
- **VisionWorker**: Worker isolate manager for persistent vision inference.
- **WhisperSegment**: A single transcription segment with timing information.
- **WhisperSession**: High-level streaming transcription session with Scheduler integration.
- **WhisperTranscribeResponse**: Transcription completed with segments and timing.
- **WhisperWorker**: Worker isolate manager for persistent Whisper inference.
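Several of these classes compose. For instance, ChatSession manages multi-turn conversation state on top of EdgeVeda; a sketch of how that might look, where the constructor shape and the `send` method name are assumptions not documented on this page:

```dart
// Sketch only: the ChatSession API names below are assumptions.
final session = ChatSession(edgeVeda);

final first = await session.send('Who wrote Hamlet?');
print(first.text);

// The session is assumed to carry prior turns as context,
// so follow-up questions can use pronouns.
final second = await session.send('When was he born?');
print(second.text);
```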
## Enums

- **BudgetConstraint**: Which budget constraint was violated.
- **BudgetProfile**: Adaptive budget profile expressing intent as multipliers on a measured device baseline.
- **ChatRole**: Role of a message in a conversation.
- **ChatTemplateFormat**: Supported chat template formats.
- **QoSLevel**: Quality-of-service levels for the vision inference runtime.
- **SystemPromptPreset**: Built-in system prompt presets for common use cases.
- **ToolPriority**: Priority level for a tool in the registry.
- **WorkloadId**: Unique identifier for each workload type managed by the scheduler.
- **WorkloadPriority**: Priority level for a registered workload.
## Exceptions / Errors

- **ChecksumException**: Thrown when checksum verification fails.
- **ConfigurationException**: Thrown when an invalid configuration is provided.
- **DownloadException**: Thrown when a model download fails.
- **EdgeVedaException**: Base exception class for Edge Veda errors.
- **EdgeVedaGenericException**: Generic exception for unknown native errors.
- **EmbeddingException**: Thrown when an embedding operation fails.
- **GenerationException**: Thrown when text generation fails.
- **InitializationException**: Thrown when SDK initialization fails.
- **MemoryException**: Thrown when the memory limit is exceeded.
- **ModelLoadException**: Thrown when model loading fails.
- **ModelValidationException**: Thrown when a model file fails validation (checksum mismatch, corrupted file).
- **ToolCallParseException**: Thrown when model output cannot be parsed as a valid tool call.
- **VisionException**: Thrown when vision inference fails.
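Since these all extend EdgeVedaException, callers can catch specific failures first and fall back to the base class. A sketch:

```dart
try {
  final response = await edgeVeda.generate('Hello');
  print(response.text);
} on MemoryException {
  // Memory limit exceeded; consider a smaller context or model.
  print('Out of memory budget');
} on GenerationException catch (e) {
  print('Generation failed: $e');
} on EdgeVedaException catch (e) {
  // Catch-all for any other SDK error.
  print('Edge Veda error: $e');
}
```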