text_sight - Dart API docs

Live, on-device text recognition for Flutter — Apple Vision on iOS, ML Kit on Android. Like mobile_scanner, but for text instead of barcodes.

Figure: live text recognition, with confidence-coloured boxes over the camera feed

Why text_sight?
A quick taste
Platform support
Install
The recognition model
Performance
Going deeper

Why text_sight?

Most cross-platform OCR plugins run Google ML Kit on both platforms. That quietly pulls GoogleMLKit into your iOS build — and with it the arm64 and Swift Package Manager warnings that have been nagging Flutter iOS builds for a while.

text_sight takes the other road. On iOS it uses Apple Vision, a system framework, so your app links zero third-party ML libraries there — no GoogleMLKit, no warnings. Android keeps ML Kit, declared only in its own Gradle file. Nothing recognition-related ever reaches your pubspec.yaml, so the two platforms can't bleed into each other. Clean, native text scanning on both. That's the whole idea.

A quick taste

Point the camera at some text:

final controller = TextSightController();

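TextSightView(
  controller: controller,
  onResult: (capture) => capture.lines.forEach((line) => print(line.text)),
  // Placeholder overlay (assumes the builder returns a Widget);
  // swap in a widget that paints each line.boundingBox.
  overlayBuilder: (context, capture, constraints) => const SizedBox.shrink(),
);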

await controller.requestCameraPermission(); // prompts via the OS — no permission package needed
await controller.start();

Or read a single still — no camera, no permission:

final capture = await TextSight.recognizeImage(bytes); // or .recognizePath('/photo.jpg')

Either way, boxes come back normalized [0, 1] from the top-left, identical on both platforms, so your overlay never has to know which engine drew them.
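To make the normalized coordinates concrete, here is a minimal sketch of scaling one box to the size of the widget that draws the overlay. NormalizedBox, PixelBox, and toPixels are hypothetical names for illustration, not part of text_sight's API, and the sketch assumes the preview fills the widget without cropping or letterboxing.

```dart
// Illustration-only types; text_sight's own box type may look different.
class NormalizedBox {
  const NormalizedBox(this.left, this.top, this.width, this.height);
  final double left, top, width, height; // all in [0, 1], origin top-left
}

class PixelBox {
  const PixelBox(this.left, this.top, this.width, this.height);
  final double left, top, width, height; // logical pixels
}

/// Scales a normalized box to a view that is [viewWidth] x [viewHeight].
/// Assumes the camera preview covers the whole view with no cropping.
PixelBox toPixels(NormalizedBox box, double viewWidth, double viewHeight) {
  return PixelBox(
    box.left * viewWidth,
    box.top * viewHeight,
    box.width * viewWidth,
    box.height * viewHeight,
  );
}

void main() {
  const box = NormalizedBox(0.1, 0.4, 0.8, 0.2);
  final px = toPixels(box, 400, 800);
  print('left=${px.left} top=${px.top} w=${px.width} h=${px.height}');
}
```

Because the scaling depends only on the view size, the same function would serve both platforms.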

Want a scan-box? Hand the controller a region of interest: TextSightController(options: TextSightOptions(roi: Rect.fromLTWH(0.1, 0.4, 0.8, 0.2))). You can also change the region, the recognition level, or the torch while the session runs. The region of interest applies to the live preview and the one-shot alike.

One Android thing worth knowing up front: the model downloads on first use, so call TextSightModel.ensureReady() when the user opens your scanner (see The recognition model). A scan started before the download finishes comes back empty.

The example/ app is where to look next — a live overlay, torch, region-of-interest, permission handling, and the one-shot screen, all wired up and ready to crib from.

Screenshots: Android (ML Kit) · iOS (Apple Vision)

Platform support

Platform	Minimum	Engine
iOS	13.0	Apple Vision: RecognizeTextRequest (iOS 18+) / VNRecognizeTextRequest (iOS 13–17)
Android	API 24	ML Kit Text Recognition v2 (Latin)

A few things worth knowing before you start. On iOS, recognition uses Apple Vision's modern Swift RecognizeTextRequest on iOS 18+ and falls back to the legacy VNRecognizeTextRequest on iOS 13–17 (the same engine, chosen automatically). Android recognizes Latin script only for now. Live scanning needs a real device, since the iOS Simulator has no camera; one-shot recognition runs anywhere.

⚠️ iOS 13–16: the live preview and recognition don't follow device rotation. These versions predate AVCaptureDevice.RotationCoordinator, so live capture isn't rotated to match how the device is held (iOS 17+ is unaffected, and one-shot recognition is fine on every version — it reads the image's own orientation). It's a deliberate, low-maintenance trade-off for a device population we don't expect in practice; if it affects you, please open an issue and a proper rotation fallback will follow.

Install

flutter pub add text_sight

On iOS, add a camera-usage string to ios/Runner/Info.plist — this is required: iOS terminates the app the moment the camera is requested without it.

<key>NSCameraUsageDescription</key>
<string>Used to recognize text from the camera.</string>

Then let text_sight drive the runtime prompt: call controller.requestCameraPermission() (or controller.checkCameraPermission() to gate a priming screen) before controller.start(). It goes straight to the platform APIs — AVFoundation on iOS, the Android permission flow on Android — so no permission package is required. Already using permission_handler or similar? That still works. Android's manifest already has what it needs.

The recognition model

On iOS there's nothing to see here — recognition is Apple Vision, a system framework that's always on hand. No download, no waiting.

Android is the interesting one. The ML Kit model ships unbundled by default: it adds only about 260 KB to your app size, and the model itself gets pulled through Google Play Services the first time you actually use it. We skip the install-time download on purpose: most apps don't need OCR the second they launch, so there's no point making everyone pay for it up front. The one catch: a scan you kick off before the model has landed comes back empty.

So give it a nudge when the user wanders into your scanner:

final state = await TextSightModel.ensureReady();
if (state is ModelUnavailable) {
  // No Play Services, or the download didn't make it. Tell the user, maybe offer a retry.
}
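The retry mentioned in the comment above could be driven by a small backoff helper. This is a sketch, not something the package provides: retryWithBackoff is a hypothetical name, and it takes a plain callback so it runs without the plugin. In an app the callback might wrap ensureReady(), assuming any state other than ModelUnavailable counts as success.

```dart
import 'dart:async';

/// Calls [attempt] up to [maxAttempts] times, doubling the pause between
/// tries. Returns true as soon as one attempt succeeds, false otherwise.
Future<bool> retryWithBackoff(
  Future<bool> Function() attempt, {
  int maxAttempts = 3,
  Duration initialDelay = const Duration(seconds: 1),
}) async {
  var delay = initialDelay;
  for (var i = 1; i <= maxAttempts; i++) {
    if (await attempt()) return true;
    if (i < maxAttempts) {
      await Future<void>.delayed(delay);
      delay *= 2; // 1 s, 2 s, 4 s, ...
    }
  }
  return false;
}

Future<void> main() async {
  var calls = 0;
  // Stand-in for something like:
  //   () async => (await TextSightModel.ensureReady()) is! ModelUnavailable
  // (an assumption about how the result would be interpreted).
  final ok = await retryWithBackoff(
    () async => ++calls >= 2,
    initialDelay: const Duration(milliseconds: 10),
  );
  print('ready: $ok after $calls attempts');
}
```

Capping the attempts keeps a device without Play Services from retrying forever; past the cap, telling the user is the better move.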

Call it as often as you like — it returns right away once the model's around (which is always, on iOS). Want a progress bar in front of the user while it downloads? Listen to the readiness stream and switch over it. It's a sealed type, so the compiler makes sure you've handled every case:

TextSightModel.readiness.listen((state) {
  final label = switch (state) {
    ModelReady() => 'Ready to scan',
    ModelDownloading(:final progress) => 'Downloading… ${((progress ?? 0) * 100).round()}%',
    ModelUnavailable(:final reason) => 'Model unavailable ($reason)',
  };
  // ...show `label`, or feed `progress` straight into a progress indicator
});

The example/ live scanner does exactly this — ensureReady() to gate, the stream for a real download bar.

Or just bundle it

Don't fancy any of that? Ship the model inside your APK — instant, offline, Play Services out of the picture. One line in your app's android/gradle.properties:

com.lahaluhem.text_sight.useBundled=true

Now ensureReady() returns immediately and ModelUnavailable never shows up. You're trading size for it, mind:

Mode	App size	First use	Offline	Needs Play Services
Unbundled (default)	~260 KB	downloads on demand	after first download	yes
Bundled	~4 MB/script/arch	instant	yes	no
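The "per script per architecture" unit in the bundled row multiplies out, as this rough sketch shows. The ~4 MB figure comes from the table; bundledSizeMb is a hypothetical helper, and the architecture counts are assumptions about how an app is packaged (a universal APK carries several ABIs, while per-ABI delivery typically downloads one).

```dart
/// Rough size added by bundling, using the approximate 4 MB per script per
/// architecture figure from the table. Inputs are the counts you ship.
int bundledSizeMb({required int scripts, required int architectures}) {
  const mbPerScriptPerArch = 4; // approximate, taken from the table
  return mbPerScriptPerArch * scripts * architectures;
}

void main() {
  // Latin only, universal APK carrying two ABIs (assumed packaging).
  print(bundledSizeMb(scripts: 1, architectures: 2));
  // Latin only, a single-ABI download.
  print(bundledSizeMb(scripts: 1, architectures: 1));
}
```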

Performance

Recognition results cross from native to Dart as a small per-frame map over an EventChannel. Decoding it on the UI isolate costs microseconds — even a dense ~127-line frame is ~55 µs, well under 1% of a 60 fps budget. The native engine's inference, not the transport, sets the pace.
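The "well under 1%" figure follows from simple arithmetic: at 60 fps a frame lasts about 16,667 µs, so 55 µs is roughly 0.33% of it. A small sketch of that calculation, with budgetFraction as a hypothetical helper name:

```dart
/// Fraction of one frame's time budget consumed by a task.
double budgetFraction({required double taskMicros, required double fps}) {
  final frameMicros = 1e6 / fps; // 60 fps gives about 16,667 µs per frame
  return taskMicros / frameMicros;
}

void main() {
  final share = budgetFraction(taskMicros: 55, fps: 60);
  print('${(share * 100).toStringAsFixed(2)}% of the frame budget');
}
```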

Figures: per-frame decode cost vs frame size; encoded payload size vs frame size; decode cost per realistic OCR profile.

These measure the pure-Dart codec only — not native inference or end-to-end latency, which dominate. Leaner transports (list, Pigeon, packed-binary) win big in percent but stay tiny in absolute µs, so the self-describing Map stays. Full methodology and numbers: benchmark/.
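For a feel of what a decode-only measurement looks like, here is a minimal Stopwatch sketch. It is not the methodology in benchmark/: the payload shape, the Line class, and the fakeFrame and decode helpers are all hypothetical, and it times only turning an already-received Map into Dart objects.

```dart
// Hypothetical payload shape, for illustration only; text_sight's real
// wire format is not described in this section.
Map<String, Object?> fakeFrame(int lineCount) => {
      'lines': [
        for (var i = 0; i < lineCount; i++)
          {
            'text': 'line $i',
            'confidence': 0.9,
            'box': [0.1, 0.01 * (i % 100), 0.8, 0.01],
          },
      ],
    };

class Line {
  const Line(this.text, this.confidence, this.box);
  final String text;
  final double confidence;
  final List<double> box;
}

/// Turns one per-frame map into typed Dart objects.
List<Line> decode(Map<String, Object?> frame) {
  final raw = frame['lines'] as List<Object?>;
  return [
    for (final item in raw.cast<Map<String, Object?>>())
      Line(
        item['text'] as String,
        (item['confidence'] as num).toDouble(),
        (item['box'] as List<Object?>)
            .cast<num>()
            .map((n) => n.toDouble())
            .toList(),
      ),
  ];
}

void main() {
  final frame = fakeFrame(127);
  const iterations = 10000;
  for (var i = 0; i < 1000; i++) {
    decode(frame); // warm-up before timing
  }
  final sw = Stopwatch()..start();
  var total = 0;
  for (var i = 0; i < iterations; i++) {
    total += decode(frame).length; // use the result so the work is kept
  }
  sw.stop();
  final avg = sw.elapsedMicroseconds / iterations;
  print('decoded $total lines, avg ${avg.toStringAsFixed(1)} µs per frame');
}
```

Numbers from a sketch like this vary by device and build mode, so treat them as a sanity check rather than a replacement for the published figures.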

Going deeper

How it all fits together — coordinate handling, the per-line confidence contract, how region-of-interest differs across platforms, and what's next — lives in APPENDIX.md.