bitnet_flutter_ai
Beta —
0.1.0-beta.1.
The public API is stable in shape but may change before1.0.0. Native binary distribution and the Web WASM artefact are not yet published to pub.dev — see Building Native Libraries below.
Run Microsoft's BitNet b1.58 2B-4T large language model fully on-device — no server, no API key.
| Platform | Status |
|---|---|
| Android (arm64-v8a) | Planned — native build required |
| iOS (arm64) | Planned — native build required |
| macOS (arm64 / x86_64) | Planned — native build required |
| Linux (x86_64) | Planned — native build required |
| Windows (x86_64) | Planned — native build required |
| Web (Chrome / Firefox) | Planned — WASM build required |
Table of Contents
- Requirements
- Installation
- Quick Start
- API Reference
- Web Setup
- Building Native Libraries
- Architecture
- Beta Limitations
- Contributing
- License
Requirements
| Requirement | Minimum |
|---|---|
| Flutter | 3.24.0 |
| Dart SDK | 3.5.0 |
| Physical RAM | 3072 MB (enforced at runtime) |
| Disk space | ~750 MB (GGUF model file) |
| Android | API 24 (arm64-v8a) |
| iOS | 14.0 (arm64) |
| macOS | 12.0 |
| Web | Chrome 92+ / Firefox 90+ with COI headers |
Installation
Add to your pubspec.yaml:
dependencies:
bitnet_flutter_ai: ^0.1.0-beta.1
Then run:
flutter pub get
Quick Start
import 'package:bitnet_flutter_ai/bitnet_flutter_ai.dart';
Future<void> main() async {
final engine = BitNetEngine();
// 1. Load — downloads the model on first run (~745 MiB), then loads from cache.
await engine.load(
onProgress: (progress) {
final pct = (progress * 100).toInt();
print('Downloading model: $pct%');
},
);
// 2. Generate — returns a Stream<String> of token pieces.
final buffer = StringBuffer();
await for (final token in engine.generate('Explain ternary quantization in one paragraph')) {
buffer.write(token);
print(token); // stream tokens as they arrive
}
print('\n--- Full response ---\n$buffer');
// 3. Dispose — frees native memory and kills the inference isolate.
await engine.dispose();
}
Streaming to a Flutter widget
class ChatPage extends StatefulWidget { ... }
class _ChatPageState extends State<ChatPage> {
final _engine = BitNetEngine();
final _response = ValueNotifier('');
bool _loading = false;
@override
void initState() {
super.initState();
_loadEngine();
}
Future<void> _loadEngine() async {
setState(() => _loading = true);
await _engine.load(onProgress: (_) {});
setState(() => _loading = false);
}
Future<void> _send(String prompt) async {
_response.value = '';
await for (final token in _engine.generate(prompt)) {
_response.value += token;
}
}
@override
void dispose() {
_engine.dispose();
super.dispose();
}
@override
Widget build(BuildContext context) {
if (_loading) return const Center(child: CircularProgressIndicator());
return Column(
children: [
ValueListenableBuilder<String>(
valueListenable: _response,
builder: (_, text, __) => SelectableText(text),
),
ElevatedButton(
onPressed: () => _send('Hello, who are you?'),
child: const Text('Ask'),
),
],
);
}
}
Cancellation
// Start generation in the background.
final sub = engine.generate('Write a long essay').listen((token) {
print(token);
});
// Cancel after 2 seconds.
await Future.delayed(const Duration(seconds: 2));
await engine.cancelGeneration();
await sub.cancel();
API Reference
BitNetEngine
factory BitNetEngine({BitNetModel model = BitNetModel.bitnet2B4T})
Creates an engine for model. Uses a background Isolate on native platforms and
dart:js_interop on Web.
| Method / getter | Description |
|---|---|
Future<void> load({void Function(double)? onProgress}) |
Downloads (if needed), verifies SHA-256, and loads the model. onProgress receives [0.0, 1.0]. |
bool get isLoaded |
true after a successful load(). |
Stream<String> generate(String prompt, {int maxNewTokens = 512, bool addBos = true}) |
Streams token pieces until EOS or maxNewTokens. Throws BitNetNotLoadedException if not loaded. |
Future<void> cancelGeneration() |
Signals the worker to stop at the next token boundary. |
Future<void> dispose() |
Frees all native resources. Do not use the engine after calling this. |
BitNetModel
static const BitNetModel bitnet2B4T
The only supported model in 0.1.0-beta.1. Immutable const with all metadata:
| Field | Value |
|---|---|
id |
microsoft/bitnet-b1.58-2B-4T |
ggufFileName |
ggml-model-i2_s.gguf |
ggufDownloadUrl |
HuggingFace direct link |
minimumRamBytes |
3072 MB |
contextLength |
4096 tokens |
ModelCache
Static utility — direct use is optional (the engine calls it internally).
// Check if model is already cached.
final path = await ModelCache.modelPath(BitNetModel.bitnet2B4T);
// Delete model + any partial download.
await ModelCache.clearModel(BitNetModel.bitnet2B4T);
DeviceInspector
final bool capable = await DeviceInspector.instance.meetsMinimumRam();
final int ramBytes = await DeviceInspector.instance.totalPhysicalRamBytes();
Exceptions
All exceptions extend BitNetException implements Exception.
| Exception | Thrown when |
|---|---|
BitNetUnsupportedDeviceException |
Physical RAM < 3072 MB |
BitNetUnsupportedPlatformException |
Platform not supported (e.g. Safari without WASM threads) |
BitNetLibraryLoadException |
Native .so / .dylib / .dll could not be opened |
BitNetInitException |
bn_init returned NULL (model file corrupt or wrong path) |
BitNetInferenceException |
bn_prompt or bn_next_token returned an error |
BitNetHashMismatchException |
SHA-256 of downloaded file does not match expected |
BitNetDownloadException |
Network error or unexpected HTTP status |
BitNetNotLoadedException |
generate() called before load() |
BitNetIsolateException |
Inference isolate exited unexpectedly |
try {
await engine.load();
} on BitNetUnsupportedDeviceException catch (e) {
print('Not enough RAM: ${e.availableRamBytes ~/ (1024 * 1024)} MB available');
} on BitNetDownloadException catch (e) {
print('Download failed (HTTP ${e.httpStatusCode}): ${e.message}');
} on BitNetException catch (e) {
print('Engine error: $e');
}
Web Setup
WASM inference requires SharedArrayBuffer, which is gated behind Cross-Origin
Isolation (COI) headers. The bundled coi_service_worker.js handles this automatically.
1. Register the service worker
Add this snippet before any other <script> tags in web/index.html:
<script>
if (typeof SharedArrayBuffer === 'undefined') {
const reloaded = sessionStorage.getItem('coi-reload');
if (!reloaded) {
sessionStorage.setItem('coi-reload', '1');
navigator.serviceWorker.register('/coi_service_worker.js')
.then(() => location.reload());
} else {
sessionStorage.removeItem('coi-reload');
}
}
</script>
2. Load the WASM glue script
<!-- Emscripten-generated glue file — exposes window.BitNetWasm -->
<script src="bitnet_bridge.js"></script>
The
bitnet_bridge.js+bitnet_bridge.wasmartefacts are not yet published. See Building Native Libraries for WASM build instructions.
Building Native Libraries
Note: Pre-built binaries will be distributed via GitHub Releases in a future beta. For now, you must compile the C++ bridge yourself.
Prerequisites
- CMake 3.22+
- A C++17 compiler (Clang on Apple, MSVC or Clang on Windows, GCC/Clang on Linux)
- llama.cpp source (or the BitNet fork: bitnet.cpp)
Steps (Desktop)
# 1. Clone llama.cpp alongside this package.
git clone https://github.com/ggerganov/llama.cpp ../llama.cpp
# 2. Build the bridge as a shared library.
mkdir build && cd build
cmake .. \
-DLLAMA_DIR=../../llama.cpp \
-DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
# 3. Copy the output to the right location.
# macOS: cp libbitnet_bridge.dylib <your_app>/macos/
# Linux: cp libbitnet_bridge.so <your_app>/linux/
# Windows: copy bitnet_bridge.dll <your_app>\windows\
Android
Cross-compile with the Android NDK using CMakeLists.txt in native/.
The output libbitnet_bridge.so must be placed in
android/app/src/main/jniLibs/arm64-v8a/.
iOS
Use Xcode to build a static library (libbitnet_bridge.a) and add it as a
FRAMEWORK_SEARCH_PATHS entry, or use the CocoaPods vendored_libraries approach.
Web (WASM)
# Requires Emscripten (emsdk).
emcmake cmake .. -DCMAKE_BUILD_TYPE=Release
emmake make
# Produces: bitnet_bridge.js + bitnet_bridge.wasm
Copy both files into your Flutter app's web/ folder and reference
bitnet_bridge.js from index.html (see Web Setup).
Architecture
BitNetEngine (interface)
├── NativeEngine — runs on native (Android/iOS/Desktop)
│ ├── Isolate dart:isolate worker (non-blocking UI)
│ └── NativeLibrary dart:ffi → libbitnet_bridge
│ └── BitNetBindings (ffigen-generated)
│ └── bitnet_bridge.cc (llama.cpp C API wrapper)
│
└── WebEngine — runs on Web
└── dart:js_interop → window.BitNetWasm (Emscripten WASM)
ModelCache — HTTP download, SHA-256 verify, disk cache
DeviceInspector — platform RAM detection (fail-closed gate)
Inference flow:
engine.load()— device check → model download/verify → spawn isolate →bn_initengine.generate(prompt)—bn_prompt(prefill) →bn_next_tokenloop → stream- Each
bn_next_tokencall:llama_sampler_sample→llama_token_to_piece→llama_decode
Beta Limitations
- Pre-built native libraries are not distributed. You must build from source.
- SHA-256 hash for the GGUF file is not yet pinned (
PLACEHOLDER_PIN_AFTER_FIRST_DOWNLOAD). Hash verification is skipped with a warning until it is pinned in a future release. - Web WASM artefact is not published. Web support requires a custom Emscripten build.
- Windows and Linux desktop are untested in this beta.
cancelGeneration()on Web cancels only atyieldboundaries — there is no signal to interrupt a synchronous WASM loop mid-token.- No sampling parameter control yet (temperature, top-p). Greedy sampling is used.
- The model is not instruction-tuned by default; wrap your prompt in the appropriate chat template for best results.
Contributing
Issues and PRs are welcome at github.com/IbrahimElmourchidi/bitnet_flutter_ai.
Please open an issue before sending a large PR so we can align on scope.
License
MIT — see LICENSE.
Published by utanium.org.
Libraries
- bitnet_flutter_ai
- bitnet_flutter_ai — run Microsoft BitNet b1.58 2B-4T locally on device.