pdf_ocr_ondevice
On-device, downloadable OCR for dart_pdf_editor.
Adds a selectable, searchable, invisible text layer over scanned (image-only) PDF pages. It runs entirely on the device, with no per-page network call. A small OCR model (PaddleOCR PP-OCRv5 mobile, ~21 MB) downloads once, is cached under the app-support directory, and then runs locally on ONNX Runtime.
It implements dart_pdf_editor's PdfOcrEngine, so the recognized text is
written by PdfEditor.applyOcr exactly like any other engine. The page looks
unchanged, but its text becomes selectable, searchable, copyable, and
extractable.
Where this fits
dart-pdf has two OCR engines, two tiers:
| Engine | Where it runs | Best for |
|---|---|---|
| pdf_ocr_vlm | A server/cloud you call over HTTP (dots.ocr on vLLM, or any VLM) | Highest accuracy and layout/table parsing when a GPU server or an API is available |
| pdf_ocr_ondevice (this) | The device, offline | Privacy, offline use, and no infrastructure; a plain selectable text layer on every native platform |
The state-of-the-art document-parsing models (dots.ocr 1.7B, PaddleOCR-VL 0.9B) are billion-parameter VLMs that realistically need a GPU. This package instead uses the small, classic detect → recognize PP-OCR pipeline (~5M parameters), so it runs on the CPU of a phone or a laptop.
Supported platforms
Android, iOS, macOS, Windows, and Linux, wherever ONNX Runtime has prebuilt
binaries. Not the web (no local model store / native runtime): on the web,
PdfOcrModelManager.isSupported is false; use pdf_ocr_vlm against an HTTP
service there.
Install
```
flutter pub add dart_pdf_editor pdf_ocr_ondevice
```
No model files need to be bundled in your app. The default bundle downloads
from the ocr-models-v1 GitHub release on first use, verifies each file by
SHA-256, and then runs from the device cache.
Usage
```dart
import 'dart:typed_data';

import 'package:dart_pdf_editor/dart_pdf_editor.dart';
import 'package:pdf_document/pdf_document.dart';
import 'package:pdf_ocr_ondevice/pdf_ocr_ondevice.dart';

Future<Uint8List> addSearchableTextLayer(Uint8List bytes) async {
  if (!PdfOcrModelManager.isSupported) return bytes; // web/fuchsia fallback

  final manager = PdfOcrModelManager();
  final model = PdfOcrModels.ppOcrV5Mobile;
  OnDeviceOcrEngine? engine;
  try {
    // 1. Download the model once (cached afterwards).
    if (!await manager.isDownloaded(model)) {
      await manager.download(model, onProgress: (p) {
        final pct = ((p.fraction ?? 0) * 100).round();
        print('Downloading ${p.fileName}: $pct%');
      });
    }

    // 2. Build an engine from the downloaded files and run it over each page.
    engine = await OnDeviceOcrEngine.fromDownloadedModel(manager, model);
    final editor = PdfEditor(PdfDocument.open(bytes));
    for (var page = 0; page < editor.document.pageCount; page++) {
      await editor.applyOcr(page, engine, pixelRatio: 2);
    }
    return editor.save(); // selectable/searchable text layer added
  } finally {
    await engine?.dispose();
    manager.close();
  }
}
```
After the function returns, open the returned bytes or swap them into
PdfReader / PdfEditorView. For long documents, run this from your app flow
with progress reporting and cancellation; the DartPDF app's
app/lib/ocr_native.dart is the reference orchestration.
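As a rough illustration of per-page progress and cancellation, the loop from the usage example could be wrapped as below. This is only a sketch: `ocrAllPages`, `onPage`, and `isCancelled` are hypothetical names that are not part of either package, and it assumes `PdfOcrEngine` is exported by dart_pdf_editor. It uses only the calls shown in the usage example; the reference orchestration in the app may be structured differently.

```dart
import 'dart:typed_data';

import 'package:dart_pdf_editor/dart_pdf_editor.dart';
import 'package:pdf_document/pdf_document.dart';

/// Hypothetical helper (not part of the package): OCRs one page at a time,
/// reports progress after each page, and checks a caller-supplied cancel
/// flag before starting the next one. Returns null when cancelled.
Future<Uint8List?> ocrAllPages(
  Uint8List bytes,
  PdfOcrEngine engine, {
  void Function(int done, int total)? onPage,
  bool Function()? isCancelled,
}) async {
  final editor = PdfEditor(PdfDocument.open(bytes));
  final total = editor.document.pageCount;
  for (var page = 0; page < total; page++) {
    // Stop between pages; a page that has already started still finishes.
    if (isCancelled?.call() ?? false) return null;
    await editor.applyOcr(page, engine, pixelRatio: 2);
    onPage?.call(page + 1, total);
  }
  return editor.save();
}
```

The caller keeps ownership of the engine and the model manager here, so it should still dispose and close them in a `finally` block as in the usage example.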
The default model bundle
PdfOcrModels.ppOcrV5Mobile downloads its files from the
ocr-models-v1
GitHub release. There is nothing to host, and it works out of the box. Each file's
sha256 is pinned in the descriptor, so a corrupted or tampered download is
rejected.
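To make the integrity check concrete, here is a minimal sketch of comparing a downloaded file against a pinned digest. It assumes the `crypto` package from pub.dev; `matchesSha256` is a hypothetical helper name, and the package's own verification code may be organized differently.

```dart
import 'dart:typed_data';

import 'package:crypto/crypto.dart';

/// Hypothetical helper: true when [bytes] hash to [expectedHex].
/// The comparison ignores the letter case of the pinned hex string.
bool matchesSha256(Uint8List bytes, String expectedHex) {
  final actual = sha256.convert(bytes).toString(); // lowercase hex digest
  return actual == expectedHex.toLowerCase();
}

void main() {
  // Standard SHA-256 test vector for the ASCII string "abc".
  const pinned =
      'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad';
  final data = Uint8List.fromList('abc'.codeUnits);

  print(matchesSha256(data, pinned)); // true

  // Flipping a single byte changes the digest, so the check fails.
  data[0] ^= 0x01;
  print(matchesSha256(data, pinned)); // false
}
```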
The bundle is the official PaddleOCR PP-OCRv5 mobile detection +
recognition models converted to ONNX with
paddle2onnx, plus the
recognizer's character dictionary. See Model license & attribution below.
Rolling your own bundle
To host elsewhere (or ship a different model), reproduce the conversion and
supply your own PdfOcrModel:
- Download PP-OCRv5 mobile detection + recognition inference models from PaddleOCR (and the matching ppocrv5_dict.txt).
- Convert each to ONNX with paddle2onnx:

  ```
  paddle2onnx --model_dir PP-OCRv5_mobile_det \
    --model_filename inference.json --params_filename inference.pdiparams \
    --save_file PP-OCRv5_mobile_det.onnx
  paddle2onnx --model_dir PP-OCRv5_mobile_rec \
    --model_filename inference.json --params_filename inference.pdiparams \
    --save_file PP-OCRv5_mobile_rec.onnx
  ```

- Upload the two .onnx files and ppocrv5_dict.txt as release assets (or anywhere reachable) and point a custom PdfOcrModel at them.
- Set each file's sha256 in your descriptor so downloads are integrity checked.
Model license & attribution
The default bundle is a derivative work of PaddleOCR PP-OCRv5 mobile
(Copyright © PaddlePaddle Authors), redistributed under the Apache License
2.0, the same license as this package. The .onnx files are the official
PaddlePaddle inference models converted to ONNX with paddle2onnx (opset 14;
no weights retrained or altered); ppocrv5_dict.txt is the recognizer's
character dictionary extracted verbatim from the official config. The
ocr-models-v1
release carries the full LICENSE.txt + NOTICE.txt.
Sources: PP-OCRv5_mobile_det · PP-OCRv5_mobile_rec · PaddleOCR.
A custom model / hosting
```dart
final model = PdfOcrModel(
  id: 'my-ocr-en',
  displayName: 'My OCR',
  detection: PdfOcrModelFile(
    name: 'det.onnx',
    url: Uri.parse('https://example.com/det.onnx'),
    sha256: 'a1b2…',
  ),
  recognition: PdfOcrModelFile(
    name: 'rec.onnx',
    url: Uri.parse('https://example.com/rec.onnx'),
    sha256: 'c3d4…',
  ),
  dictionary: PdfOcrModelFile(
    name: 'dict.txt',
    url: Uri.parse('https://example.com/dict.txt'),
  ),
);
```
How it works
OnDeviceOcrEngine reads the page raster into an OcrImage, runs an
OcrModelRunner, and maps each recognized line's pixel box to PDF user space
via PdfOcrPageImage.userSpaceRect. The default OnnxOcrModelRunner:
- resizes the page for detection (longest side ≤ limit, multiples of 32) and normalizes it (toNchwFloat32);
- runs the detection network → a probability map, from which extractDetectionBoxes derives text-line boxes (DB threshold + connected components + unclip), scaled back to the original raster;
- crops each box, normalizes it for recognition (recognitionInput), runs the recognition network, and greedily CTC-decodes (CtcDecoder) the logits against the model's dictionary.
Everything except the two OrtSession.run calls is plain Dart and unit tested.
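Two of the plain-Dart steps can be sketched in a few lines. The code below is an illustration only, not the package's implementation: `detectionSize` and `ctcGreedyDecode` are hypothetical names, the default limit of 960 and the round-down policy are assumptions, and the decoder assumes class 0 is the CTC blank with class i mapping to `dictionary[i - 1]`.

```dart
import 'dart:math' as math;

/// Hypothetical helper: scales (width, height) so the longest side is at
/// most [limit], then rounds each side down to a multiple of 32 (minimum 32).
/// Integer arithmetic keeps the result free of floating-point rounding.
({int width, int height}) detectionSize(int width, int height,
    {int limit = 960}) {
  final longest = math.max(width, height);
  int snap(int side) {
    final scaled = longest > limit ? side * limit ~/ longest : side;
    return math.max(32, (scaled ~/ 32) * 32);
  }

  return (width: snap(width), height: snap(height));
}

/// Greedy CTC decoding: take the arg-max class at each time step, collapse
/// consecutive repeats, then drop the blank class.
/// Assumption: class 0 is the blank; class i maps to dictionary[i - 1].
String ctcGreedyDecode(List<List<double>> logits, List<String> dictionary) {
  final out = StringBuffer();
  var previous = -1;
  for (final step in logits) {
    var best = 0;
    for (var i = 1; i < step.length; i++) {
      if (step[i] > step[best]) best = i;
    }
    final isNew = best != previous && best != 0;
    if (isNew && best - 1 < dictionary.length) {
      out.write(dictionary[best - 1]);
    }
    previous = best;
  }
  return out.toString();
}

void main() {
  // An A4 page rendered at 300 dpi is 2480 x 3508 pixels.
  final size = detectionSize(2480, 3508);
  print('${size.width}x${size.height}'); // 672x960

  // Arg-max sequence is 1,1,0,1,2,2,0: "a", repeat, blank, "a", "b", repeat, blank.
  final logits = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.0, 0.1, 0.9],
    [0.9, 0.1, 0.0],
  ];
  print(ctcGreedyDecode(logits, ['a', 'b'])); // aab
}
```

The blank between the two runs of class 1 is what lets the decoder emit a doubled letter; without it, the repeats would collapse into a single "a".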
Custom backend
OnDeviceOcrEngine takes any OcrModelRunner, so a platform-native recognizer
(Apple Vision, ML Kit, Windows.Media.Ocr) can stand in while reusing the
download lifecycle and the page-geometry mapping. Return RecognizedTextLines
in raster pixels and the engine does the rest.
Native setup
ONNX Runtime is pulled in by the onnxruntime package; follow its platform
notes (it bundles the runtime for mobile/desktop). No extra steps are needed
for the Dart API.
For web apps, use pdf_ocr_vlm or your own browser/JavaScript
PdfOcrEngine. The product app demonstrates a browser-local bridge in
app/lib/ocr_web.dart, but this package intentionally stays native because
ONNX Runtime is FFI-backed.