bitnet_flutter_ai

Beta — 0.1.0-beta.1.
The shape of the public API is settled but may still change before 1.0.0. Pre-built native binaries and the Web WASM artefact are not yet published; see Building Native Libraries below.

Run Microsoft's BitNet b1.58 2B-4T large language model fully on-device — no server, no API key.

Platform	Status
Android (arm64-v8a)	Planned — native build required
iOS (arm64)	Planned — native build required
macOS (arm64 / x86_64)	Planned — native build required
Linux (x86_64)	Planned — native build required
Windows (x86_64)	Planned — native build required
Web (Chrome / Firefox)	Planned — WASM build required

Requirements
Installation
Quick Start
API Reference
Web Setup
Building Native Libraries
Architecture
Beta Limitations
Contributing
License

Requirements

Requirement	Minimum
Flutter	3.24.0
Dart SDK	3.5.0
Physical RAM	3072 MB (enforced at runtime)
Disk space	~750 MB (GGUF model file)
Android	API 24 (arm64-v8a)
iOS	14.0 (arm64)
macOS	12.0
Web	Chrome 92+ / Firefox 90+ with cross-origin isolation (COI) headers

Installation

Add to your pubspec.yaml:

dependencies:
  bitnet_flutter_ai: ^0.1.0-beta.1

Then run:

flutter pub get

Quick Start

import 'package:bitnet_flutter_ai/bitnet_flutter_ai.dart';

Future<void> main() async {
  final engine = BitNetEngine();

  // 1. Load — downloads the model on first run (~745 MiB), then loads from cache.
  await engine.load(
    onProgress: (progress) {
      final pct = (progress * 100).toInt();
      print('Downloading model: $pct%');
    },
  );

  // 2. Generate — returns a Stream<String> of token pieces.
  final buffer = StringBuffer();
  await for (final token in engine.generate('Explain ternary quantization in one paragraph')) {
    buffer.write(token);
    print(token); // each piece is printed on its own line as it arrives
  }

  print('\n--- Full response ---\n$buffer');

  // 3. Dispose — frees native memory and kills the inference isolate.
  await engine.dispose();
}

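The same flow inside a Flutter widget:

import 'package:flutter/material.dart';
import 'package:bitnet_flutter_ai/bitnet_flutter_ai.dart';

class ChatPage extends StatefulWidget {
  const ChatPage({super.key});

  @override
  State<ChatPage> createState() => _ChatPageState();
}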

class _ChatPageState extends State<ChatPage> {
  final _engine = BitNetEngine();
  final _response = ValueNotifier('');
  bool _loading = false;

  @override
  void initState() {
    super.initState();
    _loadEngine();
  }

  Future<void> _loadEngine() async {
    setState(() => _loading = true);
    await _engine.load(onProgress: (_) {});
    // The widget may have been removed while the model was loading.
    if (!mounted) return;
    setState(() => _loading = false);
  }

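  Future<void> _send(String prompt) async {
    _response.value = '';
    await for (final token in _engine.generate(prompt)) {
      // Stop consuming once the widget is gone; the notifier is disposed by then.
      if (!mounted) break;
      _response.value += token;
    }
  }

  @override
  void dispose() {
    // The engine's dispose() is async, but State.dispose cannot await it.
    _engine.dispose();
    _response.dispose();
    super.dispose();
  }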

  @override
  Widget build(BuildContext context) {
    if (_loading) return const Center(child: CircularProgressIndicator());
    return Column(
      children: [
        ValueListenableBuilder<String>(
          valueListenable: _response,
          builder: (_, text, __) => SelectableText(text),
        ),
        ElevatedButton(
          onPressed: () => _send('Hello, who are you?'),
          child: const Text('Ask'),
        ),
      ],
    );
  }
}

Cancellation

// Start generation in the background.
final sub = engine.generate('Write a long essay').listen((token) {
  print(token);
});

// Cancel after 2 seconds.
await Future.delayed(const Duration(seconds: 2));
await engine.cancelGeneration();
await sub.cancel();
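
The timed cancellation above can also be folded into a small helper. The following is a minimal sketch in plain Dart: `collectUntil` and its parameters are hypothetical names that are not part of the package, and the deadline is only checked when a token arrives, which matches cancellation at token boundaries.

```dart
/// Collects pieces from [tokens] until the stream ends or [limit] has elapsed.
/// When the deadline is reached, [onDeadline] is awaited once and collection
/// stops. The deadline is checked only when a new piece arrives.
Future<String> collectUntil(
  Stream<String> tokens,
  Duration limit,
  Future<void> Function() onDeadline,
) async {
  final buffer = StringBuffer();
  final clock = Stopwatch()..start();
  await for (final token in tokens) {
    buffer.write(token);
    if (clock.elapsed >= limit) {
      await onDeadline(); // for example, engine.cancelGeneration
      break; // leaving the loop cancels the stream subscription
    }
  }
  return buffer.toString();
}
```

With the engine from Quick Start, a call might look like `await collectUntil(engine.generate('Write a long essay'), const Duration(seconds: 2), engine.cancelGeneration)`. If the stream finishes before the deadline, `onDeadline` is never invoked.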

API Reference

`BitNetEngine`

factory BitNetEngine({BitNetModel model = BitNetModel.bitnet2B4T})

Creates an engine for the given model. Inference runs in a background Isolate on native platforms and through dart:js_interop on Web.

Method / getter	Description
`Future<void> load({void Function(double)? onProgress})`	Downloads the model if needed, verifies its SHA-256 (currently skipped; see Beta Limitations), and loads it. `onProgress` receives a fraction from `0.0` to `1.0`.
`bool get isLoaded`	`true` after a successful `load()`.
`Stream<String> generate(String prompt, {int maxNewTokens = 512, bool addBos = true})`	Streams token pieces until EOS or `maxNewTokens`. Throws `BitNetNotLoadedException` if not loaded.
`Future<void> cancelGeneration()`	Signals the worker to stop at the next token boundary.
`Future<void> dispose()`	Frees all native resources. Do not use the engine after calling this.
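
Because `onProgress` receives a fraction, a UI or log may want to react only when the whole-number percentage changes. The sketch below is one way to do that in plain Dart; `percentReporter` is a hypothetical helper, not part of the package, and how often the engine invokes `onProgress` is not specified here.

```dart
/// Wraps [onPercent] so it is called only when the whole-number percentage
/// changes. The returned function has the shape expected by `onProgress`.
void Function(double) percentReporter(void Function(int percent) onPercent) {
  int? last;
  return (double progress) {
    // Clamp defensively so out-of-range values cannot produce odd percentages.
    final percent = (progress.clamp(0.0, 1.0) * 100).floor();
    if (percent != last) {
      last = percent;
      onPercent(percent);
    }
  };
}
```

Usage might look like `await engine.load(onProgress: percentReporter((p) => print('Downloading model: $p%')))`.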

`BitNetModel`

static const BitNetModel bitnet2B4T

The only supported model in 0.1.0-beta.1. An immutable constant that carries the model's metadata:

Field	Value
`id`	`microsoft/bitnet-b1.58-2B-4T`
`ggufFileName`	`ggml-model-i2_s.gguf`
`ggufDownloadUrl`	HuggingFace direct link
`minimumRamBytes`	3072 MB, expressed in bytes
`contextLength`	4096 tokens

`ModelCache`

Static utility — direct use is optional (the engine calls it internally).

// Check if model is already cached.
final path = await ModelCache.modelPath(BitNetModel.bitnet2B4T);

// Delete model + any partial download.
await ModelCache.clearModel(BitNetModel.bitnet2B4T);

`DeviceInspector`

final bool capable = await DeviceInspector.instance.meetsMinimumRam();
final int ramBytes = await DeviceInspector.instance.totalPhysicalRamBytes();
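
To show the arithmetic behind the RAM gate, here is a small sketch in plain Dart. It assumes the documented 3072 MB means 3072 × 1024 × 1024 bytes, in line with the `~/ (1024 * 1024)` conversion used in the exception example; `describeRam` and the constants are hypothetical names, not package API.

```dart
const int bytesPerMiB = 1024 * 1024;

/// Mirrors the documented 3072 MB minimum, treating MB as MiB (an assumption).
const int minimumRamBytes = 3072 * bytesPerMiB; // 3221225472 bytes

/// Returns a human-readable verdict for a device with [totalRamBytes] of RAM.
String describeRam(int totalRamBytes, {int requiredBytes = minimumRamBytes}) {
  final totalMiB = totalRamBytes ~/ bytesPerMiB;
  final requiredMiB = requiredBytes ~/ bytesPerMiB;
  return totalRamBytes >= requiredBytes
      ? 'OK: $totalMiB MB available, $requiredMiB MB required'
      : 'Insufficient: $totalMiB MB available, $requiredMiB MB required';
}
```

It could be fed directly from the inspector, for example `describeRam(await DeviceInspector.instance.totalPhysicalRamBytes())`.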

Exceptions

All exceptions extend BitNetException, which implements Exception.

Exception	Thrown when
`BitNetUnsupportedDeviceException`	Physical RAM < 3072 MB
`BitNetUnsupportedPlatformException`	Platform not supported (e.g. Safari without WASM threads)
`BitNetLibraryLoadException`	Native `.so` / `.dylib` / `.dll` could not be opened
`BitNetInitException`	`bn_init` returned `NULL` (model file corrupt or wrong path)
`BitNetInferenceException`	`bn_prompt` or `bn_next_token` returned an error
`BitNetHashMismatchException`	SHA-256 of downloaded file does not match expected
`BitNetDownloadException`	Network error or unexpected HTTP status
`BitNetNotLoadedException`	`generate()` called before `load()`
`BitNetIsolateException`	Inference isolate exited unexpectedly

try {
  await engine.load();
} on BitNetUnsupportedDeviceException catch (e) {
  print('Not enough RAM: ${e.availableRamBytes ~/ (1024 * 1024)} MB available');
} on BitNetDownloadException catch (e) {
  print('Download failed (HTTP ${e.httpStatusCode}): ${e.message}');
} on BitNetException catch (e) {
  print('Engine error: $e');
}
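
Download failures are often transient, so callers may want to retry. The following is a generic sketch in plain Dart with exponential backoff; `retry` and its parameters are hypothetical, and it assumes that calling `load()` again after a failed download is safe, which this README does not state.

```dart
/// Runs [action], retrying up to [maxAttempts] times in total.
/// The wait between attempts starts at [initialDelay] and doubles each time.
/// Errors for which [retryIf] returns false are rethrown immediately.
Future<T> retry<T>(
  Future<T> Function() action, {
  int maxAttempts = 3,
  Duration initialDelay = const Duration(seconds: 1),
  bool Function(Object error)? retryIf,
}) async {
  var delay = initialDelay;
  for (var attempt = 1;; attempt++) {
    try {
      return await action();
    } catch (error) {
      final retryable = retryIf == null || retryIf(error);
      if (!retryable || attempt >= maxAttempts) rethrow;
      await Future<void>.delayed(delay);
      delay *= 2; // exponential backoff
    }
  }
}
```

Restricting retries to download errors might look like `await retry(() => engine.load(), retryIf: (e) => e is BitNetDownloadException)`, so that an unsupported device or platform still fails on the first attempt.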

Web Setup

WASM inference requires SharedArrayBuffer, which browsers expose only to cross-origin isolated pages, that is, pages served with Cross-Origin Isolation (COI) headers. The bundled coi_service_worker.js adds those headers once it is registered.

1. Register the service worker

Add this snippet before any other <script> tags in web/index.html:

<script>
  if (typeof SharedArrayBuffer === 'undefined') {
    const reloaded = sessionStorage.getItem('coi-reload');
    if (!reloaded) {
      sessionStorage.setItem('coi-reload', '1');
      navigator.serviceWorker.register('/coi_service_worker.js')
        .then(() => location.reload());
    } else {
      sessionStorage.removeItem('coi-reload');
    }
  }
</script>

2. Load the WASM glue script

<!-- Emscripten-generated glue file — exposes window.BitNetWasm -->
<script src="bitnet_bridge.js"></script>

The bitnet_bridge.js + bitnet_bridge.wasm artefacts are not yet published. See Building Native Libraries for WASM build instructions.
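
For local testing, the two headers behind cross-origin isolation can also be sent by the server itself. The sketch below is a minimal static file server using `dart:io`; `serveWithCoi` is a hypothetical helper, not part of the package, and it is meant for development only (no caching, no directory listing, a very small MIME table).

```dart
import 'dart:io';

/// Response headers that opt a page into cross-origin isolation.
const Map<String, String> coiHeaders = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

/// Content types for the file kinds a web build usually contains.
const Map<String, String> _mimeTypes = {
  '.html': 'text/html; charset=utf-8',
  '.js': 'text/javascript; charset=utf-8',
  '.wasm': 'application/wasm',
  '.json': 'application/json',
};

/// Serves files under [root] on localhost, adding COI headers to each response.
Future<HttpServer> serveWithCoi(Directory root, {int port = 8080}) async {
  final server = await HttpServer.bind(InternetAddress.loopbackIPv4, port);
  server.listen((HttpRequest request) async {
    final response = request.response;
    coiHeaders.forEach((name, value) => response.headers.set(name, value));

    final path = request.uri.path == '/' ? '/index.html' : request.uri.path;
    final file = File('${root.path}$path');
    // Reject parent-directory segments and unknown files with a 404.
    if (path.contains('..') || !await file.exists()) {
      response.statusCode = HttpStatus.notFound;
    } else {
      final dot = path.lastIndexOf('.');
      final type = dot < 0 ? null : _mimeTypes[path.substring(dot)];
      if (type != null) {
        response.headers.set(HttpHeaders.contentTypeHeader, type);
      }
      await response.addStream(file.openRead());
    }
    await response.close();
  });
  return server;
}
```

After `flutter build web`, calling `await serveWithCoi(Directory('build/web'))` from a small Dart script would serve the app at http://127.0.0.1:8080 with SharedArrayBuffer available in supporting browsers.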

Building Native Libraries

Note: Pre-built binaries will be distributed via GitHub Releases in a future beta. For now, you must compile the C++ bridge yourself.

Prerequisites

CMake 3.22+
A C++17 compiler (Clang on Apple, MSVC or Clang on Windows, GCC/Clang on Linux)
llama.cpp source (or the BitNet fork: bitnet.cpp)

Steps (Desktop)

# 1. Clone llama.cpp alongside this package.
git clone https://github.com/ggerganov/llama.cpp ../llama.cpp

# 2. Build the bridge as a shared library.
mkdir build && cd build
cmake .. \
  -DLLAMA_DIR=../../llama.cpp \
  -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release

# 3. Copy the output to the right location.
# macOS:  cp libbitnet_bridge.dylib <your_app>/macos/
# Linux:  cp libbitnet_bridge.so    <your_app>/linux/
# Windows: copy bitnet_bridge.dll   <your_app>\windows\

Android

Cross-compile with the Android NDK using CMakeLists.txt in native/. The output libbitnet_bridge.so must be placed in android/app/src/main/jniLibs/arm64-v8a/.

iOS

Use Xcode to build a static library (libbitnet_bridge.a), then either add its directory to LIBRARY_SEARCH_PATHS and link it in the app target, or ship it through CocoaPods with vendored_libraries.

Web (WASM)

# Requires Emscripten (emsdk).
emcmake cmake .. -DCMAKE_BUILD_TYPE=Release
emmake make
# Produces: bitnet_bridge.js + bitnet_bridge.wasm

Copy both files into your Flutter app's web/ folder and reference bitnet_bridge.js from index.html (see Web Setup).

Architecture

BitNetEngine (interface)
 ├── NativeEngine          — runs on native (Android/iOS/Desktop)
 │    ├── Isolate           dart:isolate worker (non-blocking UI)
 │    └── NativeLibrary     dart:ffi → libbitnet_bridge
 │         └── BitNetBindings  (ffigen-generated)
 │              └── bitnet_bridge.cc  (llama.cpp C API wrapper)
 │
 └── WebEngine             — runs on Web
      └── dart:js_interop  → window.BitNetWasm (Emscripten WASM)

ModelCache                 — HTTP download, SHA-256 verify, disk cache
DeviceInspector            — platform RAM detection (fail-closed gate)

Inference flow:

engine.load() — device check → model download/verify → spawn isolate → bn_init
engine.generate(prompt) — bn_prompt (prefill) → bn_next_token loop → stream
Each bn_next_token call: llama_sampler_sample → llama_token_to_piece → llama_decode
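
The generate loop in the flow above can be illustrated without any native code. The sketch below is plain Dart with a fake bridge that replays fixed pieces; `FakeBridge` and `SketchEngine` are hypothetical stand-ins, not the package's classes, and real sampling and decoding are replaced by a list lookup. It shows the three ways the stream ends: end of sequence, `maxNewTokens`, or a cancel flag checked at each token boundary.

```dart
/// Hypothetical stand-in for the native bridge; replays a fixed list of pieces.
class FakeBridge {
  FakeBridge(this._pieces);
  final List<String> _pieces;
  int _cursor = 0;

  /// Stands in for the prefill step: resets decoding state for a new prompt.
  void prompt(String text) => _cursor = 0;

  /// Returns the next piece, or null once the sequence has ended (EOS).
  String? nextToken() => _cursor < _pieces.length ? _pieces[_cursor++] : null;
}

class SketchEngine {
  SketchEngine(this._bridge);
  final FakeBridge _bridge;
  bool _cancelled = false;

  /// Takes effect at the next token boundary, not in the middle of a token.
  void cancelGeneration() => _cancelled = true;

  Stream<String> generate(String prompt, {int maxNewTokens = 512}) async* {
    _cancelled = false;
    _bridge.prompt(prompt); // prefill
    for (var i = 0; i < maxNewTokens; i++) {
      if (_cancelled) return; // cooperative cancellation
      final piece = _bridge.nextToken();
      if (piece == null) return; // EOS
      yield piece;
    }
  }
}

Future<void> main() async {
  final engine = SketchEngine(FakeBridge(['Ter', 'nary', ' weights']));
  await for (final piece in engine.generate('hi', maxNewTokens: 2)) {
    print(piece);
  }
}
```

Because the cancel flag is only read between tokens, a long-running token step cannot be interrupted part-way, which is the same constraint the Web limitation describes.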

Beta Limitations

Pre-built native libraries are not distributed. You must build from source.
SHA-256 hash for the GGUF file is not yet pinned (PLACEHOLDER_PIN_AFTER_FIRST_DOWNLOAD). Hash verification is skipped with a warning until it is pinned in a future release.
Web WASM artefact is not published. Web support requires a custom Emscripten build.
Windows and Linux desktop are untested in this beta.
cancelGeneration() on Web cancels only at yield boundaries — there is no signal to interrupt a synchronous WASM loop mid-token.
No sampling parameter control yet (temperature, top-p). Greedy sampling is used.
Prompts are passed to the model verbatim; no chat template is applied. Wrap your prompt in the model's chat template for best results.

Contributing

Issues and PRs are welcome at github.com/IbrahimElmourchidi/bitnet_flutter_ai.

Please open an issue before sending a large PR so we can align on scope.

License

MIT — see LICENSE.

Published by utanium.org.