xybrid_flutter 0.1.2
xybrid_flutter: ^0.1.2 copied to clipboard
Xybrid Flutter SDK — run ML models on-device or in the cloud with intelligent hybrid routing and streaming inference.
Changelog #
0.1.2 #
- Audio inputs now detect MP3, OGG, and FLAC in addition to WAV, and mono audio is upmixed to stereo when a model expects two channels (xybrid-ai/xybrid#132, #141)
- Robustness: the underlying SDK/core no longer panics on poisoned locks, unchecked length headers, or non-contiguous ONNX output tensors — these are recovered or handled gracefully (xybrid-ai/xybrid#233, #234, #235, #231, #232, #237)
- The Xybrid API key is no longer placed in the process environment (xybrid-ai/xybrid#214)
- Registry requests now honor
Retry-Afteron429responses (xybrid-ai/xybrid#134)
0.1.1 #
- New bundled
init()entry point starts anonymous-by-default telemetry from an API key; the standaloneinitTelemetryis now legacy (xybrid-ai/xybrid#188, #195) PlatformEventpayloads now carrysdk_versionandbinding, so telemetry is attributable to the SDK build and the Flutter binding that emitted it (xybrid-ai/xybrid#183)- Fixed: the SDK no longer leaks the leading bytes of its own API key into emitted telemetry (xybrid-ai/xybrid#209)
- Fixed: cache TTL handling is panic-safe — a backwards system clock no longer panics the cache layer (xybrid-ai/xybrid#203)
- Example app now reads
XYBRID_API_KEYfrom the environment at init (xybrid-ai/xybrid#207)
0.1.0 #
Production release of the 0.1.0 line. No Flutter-binding code changes since rc4 — closes the rc series.
Cumulative since the last published-to-pub.dev release (rc3):
XybridResultnow exposes typedInferenceMetrics(CPU / memory / GPU / wall-clock per inference); the underlying telemetry is also surfaced in the bundled Flutter demos- Streaming-LLM cloud fallback now routes off live device pressure signals (CPU / memory / thermal) instead of static thresholds
ModelWarmupevents emit fromXybridModel.warmupand arrive in the binding's telemetry stream, so first-token latency is attributable to warmup vs. inferencestreamingis now a top-level field onPlatformEventpayloads instead of nested under metadata- GGUF bundles without an explicit backend annotation now report
llamacppin telemetry instead ofunknown - New
Denormalizepostprocessing step in the SDK core (mirror ofNormalize), useful for round-tripping model output back into input-space coordinates - Fixed:
ModelCompleteevents were dropped on streaming fast-path inference; now emitted on every code path - Fixed: internal orchestrator pipeline-frame events no longer leak to the binding as opaque payloads
0.1.0-rc4 #
XybridResultnow exposes typedInferenceMetrics(CPU / memory / GPU / wall-clock per inference); the underlying telemetry is also surfaced in the bundled Flutter demos- Streaming-LLM cloud fallback now routes off live device pressure signals (CPU / memory / thermal) instead of static thresholds
ModelWarmupevents emit fromXybridModel.warmupand arrive in the binding's telemetry stream, so first-token latency is attributable to warmup vs. inferencestreamingis now a top-level field onPlatformEventpayloads instead of nested under metadata- GGUF bundles without an explicit backend annotation now report
llamacppin telemetry instead ofunknown - New
Denormalizepostprocessing step in the SDK core (mirror ofNormalize), useful for round-tripping model output back into input-space coordinates - Fixed:
ModelCompleteevents were dropped on streaming fast-path inference; now emitted on every code path - Fixed: internal orchestrator pipeline-frame events no longer leak to the binding as opaque payloads
0.1.0-rc3 #
- Adaptive cloud fallback for streaming LLM: pipelines can now transparently fall back to a cloud runtime when on-device streaming generation stalls or errors mid-stream; configurable via new run options on the underlying SDK
- Streaming and chat-context LLM telemetry spans now include backend and quantization tags (previously dropped on these code paths)
- Hybrid LLM architectures (Mamba / SSM-style) now load and run cleanly through the bundled llama.cpp runtime
0.1.0-rc2 #
- Republishes 0.1.0-rc1 — the rc1 pub.dev publish was skipped due to an upstream compile failure in
xybrid-coreonaarch64-linux-android(fixed in xybrid-ai/xybrid#112). No API or behavior changes in the Flutter binding itself.
0.1.0-rc1 #
- Registry calls now send the
X-Xybrid-Clienttelemetry header identifying the Flutter binding, SDK / core versions, platform, and enabled backends; respects theXYBRID_TELEMETRY_OPTOUTenv var - Per-inference resource telemetry: CPU / memory / GPU pressure metrics now flow into telemetry events from the underlying SDK
- Cloud LLM telemetry exposes provider-agnostic prompt-cache token counts (
cache_creation/cache_read)
0.1.0-beta12 #
- LLM telemetry expansion: swim-lane spans, device profile metadata, and Pipeline::run hardening on top of beta11's streaming telemetry
- Fixed Windows precompile path mangling that was blocking native binaries from publishing to pub.dev
0.1.0-beta11 #
- Added LLM streaming telemetry: TTFT, decode/prefill TPS, and ITL now exposed via the SDK for both
llama_cppandmistralbackends - Added
Devicestruct with a stable cross-platform device identifier - Added NeuTTS codec TTS support
- Improved offline behavior: actionable errors and cached-models fallback when the registry is unreachable
0.1.0-beta10 #
- Version bump to track core release. No Flutter API changes.
0.1.0-beta9 #
- Added
fromDirectory()for loading custom local models - Added
fromHuggingFace()for loading models directly from HuggingFace Hub - Fixed cargokit version hash not triggering rebuilds across releases
0.1.0-beta8 #
- Fixed LLM model loading failing with "Unknown frame descriptor" on all platforms — passthrough GGUF models now load correctly (#16)
0.1.0-beta7 #
- Fixed
libc++_shared.somissing from Android APK — replaced symlinks with NDK copy task - Fixed Android 16KB page alignment for newer devices
0.1.0-beta6 #
- Version bump to track core release. No Flutter API changes.
0.1.0-beta5 #
- Qwen 3.5 model support via updated llama.cpp backend
- Automatic
<think>tag stripping for reasoning models
0.1.0-beta4 #
- Version bump to track core release. No Flutter API changes.
0.1.0-beta3 #
- Version bump to track core release. No Flutter API changes.
0.1.0-beta2 #
- Version bump — core runtime fix (reverted ORT to
2.0.0-rc.11). No Flutter API changes.
0.1.0-beta1 #
- Version bump to track core release. No Flutter API changes.
0.1.0-alpha8 #
- Version bump to track core release. No Flutter API changes.
0.1.0-alpha7 #
Features #
- GenerationConfig: Control LLM generation parameters (temperature, top_p, max_tokens, etc.) via optional
configparameter on allXybridModelrun and streaming methods - GenerationConfig presets:
GenerationConfig.greedy()andGenerationConfig.creative()named constructors for common configurations
0.1.0-alpha5 #
Features #
- Registry model loading: Load models directly from the xybrid registry with
Xybrid.model(modelId: '...') - LLM chat streaming: Real-time token-by-token streaming for LLM inference
- Conversation context: Multi-turn conversation memory with
ConversationContext - Pipeline execution: Run multi-stage ML pipelines from YAML definitions
- 5-platform support: macOS, iOS, Android, Linux, Windows
Improvements #
- Remote model usage example added to Flutter example app
- Updated LLM demo screen in Flutter example app
- Kotlin SDK published to Maven Central (
ai.xybrid:xybrid-kotlin:0.1.0-alpha3)
0.1.0-alpha4 #
Features #
- TTS quality improvements: Silence token handling, center-break chunking, voice mixing, CJK punctuation, inter-chunk crossfading, configurable speed
- Composable model system: Metadata-driven TTS input mapping, voice selection strategy
- KittenTTS phonemizer fix: Switched from CmuDict to MisakiDictionary for correct phoneme output
Improvements #
- Model naming convention standardized (e.g.,
kitten-tts-nano-0.2) - TTS registry cleaned up with proper model versioning
0.1.0-alpha3 #
Features #
- LLM hardening: Thread-safe llama.cpp wrapper, multi-token EOG, min_p sampling
- Windows support: MSVC CRT mismatch resolved, Git Bash CFLAGS fix
- Unity iOS build: C FFI library building for iOS targets
Improvements #
- Release CI fixes across all platforms
- Test CI and release workflow updates
- Metadata generation tooling for automated model config
0.1.0-alpha2 #
Features #
- Conversation memory:
ConversationContextwith configurable FIFO pruning,ChatTemplateFormatter(ChatML, Llama 2) - Unified ORT iOS: Shared
vendor/ort-ios/xcframework across all build paths - xtask auto-detection: Build commands automatically select platform features based on target triple
Breaking Changes #
- Feature flag cascade fix:
ort-download+ort-dynamicnow caught at compile time - Platform presets renamed for clarity
0.1.0-alpha1 #
Features #
- Platform SDK restructure: UniFFI bindings (Swift/Kotlin), xybrid-ffi (C API)
- Thin Flutter FFI: ~150 LOC Dart bridge via flutter_rust_bridge
- xtask build commands:
cargo xtask build-ffi,build-uniffi,build-xcframework,build-android,build-flutter - GitHub Actions CI: Automated builds for all platforms
Breaking Changes #
xybrid_core::llmmodule renamed toxybrid_core::cloudPipelineLoaderrenamed toPipelineRefXybridPipelinerenamed toPipeline- Direct TTS API removed (use pipeline execution instead)
Platform Support #
| Platform | ONNX Runtime | Candle | LLM |
|---|---|---|---|
| macOS | download-binaries | Metal | llama.cpp |
| iOS | vendor/ort-ios/ | Metal | llama.cpp |
| Android | load-dynamic | - | llama.cpp |
| Linux | download-binaries | CPU | llama.cpp |
| Windows | download-binaries | CPU | llama.cpp |