dart_mupdf_donut
A comprehensive pure Dart PDF library + OCR-free Document Understanding Transformer (Donut).
No native dependencies — works on all platforms (Android, iOS, Web, macOS, Windows, Linux).

Why This Package?
Most Dart/Flutter PDF libraries either need native C bindings (FFI) or can only display PDFs. dart_mupdf_donut is different:
- 100% pure Dart: no C code, no platform channels, no FFI. Runs on every platform Flutter supports, including Web.
- Full PDF engine, not just a viewer: parse, edit, create, merge, and save PDFs entirely in Dart, with an API inspired by the battle-tested PyMuPDF.
- Page rendering: Page.getPixmap() renders a page to an RGB/RGBA pixel buffer that you can save as PNG or JPEG, or feed to any image widget.
- Reactive state: the built-in DocumentController, powered by RxDart, gives you observable streams for page navigation, zoom, loading state, and errors.
- OCR-free AI: the Donut module runs a Swin + BART transformer entirely in Dart to extract structured JSON from receipt/invoice images, with no cloud API, no OCR, and no native ML runtime.
What's Inside

| Module | Description |
|---|---|
| dart_mupdf | Pure Dart PDF engine inspired by PyMuPDF: parse, extract text/images, render pages, annotate, merge, create PDFs |
| Reactive Layer | DocumentController + DocumentState: RxDart-powered state management for page navigation, zoom, rendering, and text extraction |
| Batch Utilities | BatchRenderer: thumbnail generation, full-text search across pages, bulk PNG export |
| donut | Pure Dart Donut implementation: Swin Transformer encoder + mBART decoder for structured document extraction from images (receipts, invoices, forms) |

Installation
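Add the package to your pubspec.yaml:
```yaml
dependencies:
  dart_mupdf_donut: ^0.1.2
```
Then fetch it (use flutter pub get in a Flutter app):
```shell
dart pub get
```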
Quick Start — PDF
```dart
import 'dart:io';
import 'package:dart_mupdf_donut/dart_mupdf.dart';

void main() {
  // ── Open a PDF ──────────────────────────────────────────────
  final bytes = File('invoice.pdf').readAsBytesSync();
  final doc = DartMuPDF.openBytes(bytes);
  print('Pages : ${doc.pageCount}');
  print('Title : ${doc.metadata.title}');

  // ── Extract text ────────────────────────────────────────────
  final page = doc.getPage(0);
  print(page.getText());

  // ── Render to image ─────────────────────────────────────────
  final pix = page.getPixmap(); // ← NEW in 0.1.2
  File('page0.png').writeAsBytesSync(pix.toPng());

  // ── Search ──────────────────────────────────────────────────
  for (final rect in page.searchFor('Total')) {
    print('Found "Total" at $rect');
  }

  // ── Extract images ──────────────────────────────────────────
  for (final info in page.getImages()) {
    final img = doc.extractImage(info.xref);
    if (img != null) print('Image ${info.xref}: ${img.width}×${img.height}');
  }

  // ── Table of contents (indented by level) ───────────────────
  for (final entry in doc.getToc()) {
    print('${"  " * (entry.level - 1)}${entry.title} → p.${entry.pageNumber}');
  }

  doc.close();
}
```
Rendering Pages (getPixmap)
This was the #1 requested feature. In v1.0.0 the Pixmap class existed, but there was no way to obtain one from a Page. As of 0.1.2, Page.getPixmap() provides it.
```dart
final page = doc.getPage(0);

// Default render (72 DPI, RGB, no alpha)
final pix = page.getPixmap();
File('page.png').writeAsBytesSync(pix.toPng());

// 2× zoom (144 DPI equivalent)
final hires = page.getPixmap(matrix: Matrix.scale(2, 2));

// Grayscale
final gray = page.getPixmap(colorspace: Colorspace.csGray);

// With transparency
final rgba = page.getPixmap(alpha: true);

// Convenience helpers
File('page.png').writeAsBytesSync(page.renderToPng());
File('page.jpg').writeAsBytesSync(page.renderToJpeg(quality: 85));
```
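PDF user space is measured in points, 72 per inch, which is why the default render corresponds to 72 DPI and Matrix.scale(2, 2) to 144 DPI. The two helpers below are hypothetical (not part of dart_mupdf_donut); they turn a target DPI into the scale factor and predict the resulting pixel size. Rounding partial pixels up is an assumption about how the renderer sizes the Pixmap.

```dart
/// Hypothetical helper: zoom factor for a target resolution.
/// PDF user space has 72 points per inch, so 72 DPI is a zoom of 1.0.
double zoomForDpi(double dpi) => dpi / 72.0;

/// Hypothetical helper: approximate (width, height) in pixels of a page
/// measured in points (A4 is 595 x 842 pt) rendered at `dpi`.
/// Partial pixels are rounded up (an assumption).
(int, int) pixelSizeAt(double widthPt, double heightPt, double dpi) {
  final zoom = zoomForDpi(dpi);
  return ((widthPt * zoom).ceil(), (heightPt * zoom).ceil());
}
```

For a 300 DPI render you would compute `z = zoomForDpi(300)` and pass `Matrix.scale(z, z)` to getPixmap.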
Pixmap Operations
```dart
final pix = page.getPixmap();

// Crop a region
final cropped = pix.crop(IRect(50, 50, 400, 300));

// Scale (bilinear interpolation)
final thumb = pix.scale(150, 200);

// Convert colorspace
final cmyk = pix.toColorspace(Colorspace.csCmyk);

// Save directly to file (format inferred from extension)
pix.save('output.png');
pix.save('output.jpg');
pix.save('output.bmp');

// Manipulate pixels
pix.invertIRect();
pix.gammaWith(1.5);
pix.tintWith(0, 200);
```
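Pixmap.scale() is documented as bilinear. To illustrate what bilinear resampling does (this is not the package's implementation), here is a minimal sketch over a tightly packed, row-major byte buffer with `n` interleaved channels per pixel; that buffer layout is an assumption.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Minimal bilinear resize sketch (not the package's implementation).
/// `src` holds `srcW * srcH * n` bytes, row-major, with `n` interleaved
/// channels per pixel (for example 3 for RGB, 4 for RGBA).
Uint8List bilinearResize(
    Uint8List src, int srcW, int srcH, int n, int dstW, int dstH) {
  final dst = Uint8List(dstW * dstH * n);
  double clampTo(double v, double hi) => v < 0.0 ? 0.0 : (v > hi ? hi : v);
  int sample(int x, int y, int c) => src[(y * srcW + x) * n + c];
  for (var y = 0; y < dstH; y++) {
    // Map the destination pixel centre back into source coordinates.
    final fy = clampTo((y + 0.5) * srcH / dstH - 0.5, srcH - 1.0);
    final y0 = fy.floor();
    final y1 = math.min(y0 + 1, srcH - 1);
    final wy = fy - y0;
    for (var x = 0; x < dstW; x++) {
      final fx = clampTo((x + 0.5) * srcW / dstW - 0.5, srcW - 1.0);
      final x0 = fx.floor();
      final x1 = math.min(x0 + 1, srcW - 1);
      final wx = fx - x0;
      for (var c = 0; c < n; c++) {
        // Interpolate along x on the two neighbouring rows, then along y.
        final top = sample(x0, y0, c) * (1 - wx) + sample(x1, y0, c) * wx;
        final bottom = sample(x0, y1, c) * (1 - wx) + sample(x1, y1, c) * wx;
        dst[(y * dstW + x) * n + c] = (top * (1 - wy) + bottom * wy).round();
      }
    }
  }
  return dst;
}
```

Each output value is a weighted average of the four nearest source pixels, which is why bilinear scaling looks smoother than nearest-neighbour at a small extra cost.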
Reactive State Management (RxDart)
DocumentController wraps a Document in a stream of immutable state snapshots, which plugs directly into Flutter StreamBuilder, BLoC, or Riverpod.
```dart
import 'package:dart_mupdf_donut/dart_mupdf.dart';

final ctrl = DocumentController();

// Subscribe to state changes
ctrl.state$.listen((s) {
  print('Page ${s.currentPage + 1} / ${s.pageCount} '
      'zoom: ${(s.zoom * 100).toInt()}%');
});

// Open (pdfBytes: the PDF file contents)
ctrl.openBytes(pdfBytes);

// Navigate
ctrl.nextPage();
ctrl.previousPage();
ctrl.goToPage(5);
ctrl.firstPage();
ctrl.lastPage();

// Zoom
ctrl.setZoom(2.0); // 200%
ctrl.zoomIn();     // +25%
ctrl.zoomOut(0.5); // −50%

// Render current page (internally LRU-cached)
final pix = ctrl.renderPage();
final png = ctrl.renderToPng();

// Extract text
final text = ctrl.getText();         // current page
final fullText = ctrl.getFullText(); // all pages

// Fine-grained streams (distinct, no duplicates)
ctrl.currentPage$.listen((p) => print('Page changed: $p'));
ctrl.zoom$.listen((z) => print('Zoom changed: $z'));
ctrl.isLoading$.listen((l) => print('Loading: $l'));
ctrl.errors$.listen((e) => print('Error: $e'));

// Cleanup
ctrl.dispose();
```
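The pattern behind DocumentController (immutable snapshots, one state stream, and distinct per-field streams) can be sketched with dart:async alone. ViewerState and ViewerController below are hypothetical stand-ins; the real controller uses RxDart subjects and exposes more fields. Unlike an RxDart BehaviorSubject, a plain broadcast StreamController does not replay the latest value to new listeners, so the sketch keeps it in a synchronous `state` getter.

```dart
import 'dart:async';
import 'dart:math' as math;

/// Hypothetical, simplified state snapshot: immutable, replaced on change.
class ViewerState {
  final int currentPage;
  final int pageCount;
  final double zoom;

  const ViewerState({this.currentPage = 0, this.pageCount = 0, this.zoom = 1.0});

  ViewerState copyWith({int? currentPage, int? pageCount, double? zoom}) =>
      ViewerState(
        currentPage: currentPage ?? this.currentPage,
        pageCount: pageCount ?? this.pageCount,
        zoom: zoom ?? this.zoom,
      );
}

/// Hypothetical controller: keeps the latest snapshot (like a
/// BehaviorSubject's value) and broadcasts every new one.
class ViewerController {
  final _states = StreamController<ViewerState>.broadcast();
  ViewerState _state;

  ViewerController(int pageCount) : _state = ViewerState(pageCount: pageCount);

  ViewerState get state => _state;
  Stream<ViewerState> get state$ => _states.stream;

  /// Derived stream that fires only when the page number actually changes.
  Stream<int> get currentPage$ => state$.map((s) => s.currentPage).distinct();

  void _emit(ViewerState next) {
    _state = next;
    _states.add(next);
  }

  // Clamp navigation to the valid page range [0, pageCount - 1].
  void goToPage(int page) => _emit(_state.copyWith(
      currentPage: math.max(0, math.min(page, _state.pageCount - 1))));
  void nextPage() => goToPage(_state.currentPage + 1);
  void previousPage() => goToPage(_state.currentPage - 1);
  void setZoom(double zoom) => _emit(_state.copyWith(zoom: zoom));

  Future<void> dispose() => _states.close();
}
```

Because every change produces a new snapshot, widgets can compare old and new states cheaply, and `distinct()` suppresses rebuilds when, for example, only the zoom changed.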
Flutter Example (StreamBuilder)
```dart
StreamBuilder<DocumentState>(
  stream: ctrl.state$,
  builder: (context, snap) {
    final s = snap.data;
    if (s == null || !s.isOpen) return const CircularProgressIndicator();
    final png = ctrl.renderToPng();
    return Column(children: [
      Text('Page ${s.currentPage + 1} / ${s.pageCount}'),
      Image.memory(png),
      Row(children: [
        IconButton(icon: const Icon(Icons.arrow_back), onPressed: ctrl.previousPage),
        IconButton(icon: const Icon(Icons.arrow_forward), onPressed: ctrl.nextPage),
        IconButton(icon: const Icon(Icons.zoom_in), onPressed: ctrl.zoomIn),
        IconButton(icon: const Icon(Icons.zoom_out), onPressed: ctrl.zoomOut),
      ]),
    ]);
  },
);
```
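The StreamBuilder example calls ctrl.renderToPng() on every rebuild, which stays cheap only if rendered pages are cached; the controller's cache is described as LRU. A minimal LRU cache, for example keyed by a (page, zoom) record, could look like this. LruCache is a hypothetical illustration, not the package's internal type.

```dart
import 'dart:collection';

/// Minimal LRU cache. LinkedHashMap iterates in insertion order, so
/// re-inserting a key on access moves it to the "most recently used" end.
class LruCache<K, V> {
  final int capacity;
  final _map = LinkedHashMap<K, V>();

  LruCache(this.capacity) : assert(capacity > 0);

  V? get(K key) {
    final value = _map.remove(key);
    if (value != null) _map[key] = value; // mark as most recently used
    return value;
  }

  void put(K key, V value) {
    _map.remove(key);
    _map[key] = value;
    if (_map.length > capacity) {
      _map.remove(_map.keys.first); // evict the least recently used entry
    }
  }

  /// Returns the cached value, or computes, stores and returns it.
  V getOrPut(K key, V Function() compute) {
    final cached = get(key);
    if (cached != null) return cached;
    final value = compute();
    put(key, value);
    return value;
  }

  int get length => _map.length;
  bool containsKey(K key) => _map.containsKey(key);
}
```

Dart records such as `(3, 1.5)` have value equality and hashing, so they work directly as composite cache keys.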
Batch Operations
For processing many pages at once (thumbnails, search, export):
```dart
import 'dart:io';
import 'package:dart_mupdf_donut/dart_mupdf.dart';

void main() {
  final doc = DartMuPDF.openFile('report.pdf');
  final batch = BatchRenderer(doc);

  // Generate thumbnails (max 150 px)
  final thumbs = batch.renderThumbnails(maxDimension: 150);
  thumbs.forEach((i, pix) => pix.save('thumb_$i.png'));

  // Full-text search across all pages
  final hits = batch.searchAll('revenue');
  hits.forEach((page, rects) {
    print('Page $page: ${rects.length} matches');
  });

  // Extract all text
  final texts = batch.extractAllText();
  File('fulltext.txt').writeAsStringSync(texts.join('\n\n'));

  // Export all pages to PNG
  final pngs = batch.exportAllToPng(matrix: Matrix.scale(2, 2));
  for (var i = 0; i < pngs.length; i++) {
    File('page_$i.png').writeAsBytesSync(pngs[i]);
  }

  doc.close();
}
```
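renderThumbnails(maxDimension: 150) suggests that the longer page edge is scaled to at most 150 px. Assuming that is the rule (the README does not spell it out), the zoom for each page would be computed as below; thumbnailZoom is a hypothetical helper.

```dart
import 'dart:math' as math;

/// Hypothetical helper: zoom that fits a page of `widthPt` x `heightPt`
/// points into a `maxDimension`-pixel square while keeping aspect ratio.
double thumbnailZoom(double widthPt, double heightPt, int maxDimension) =>
    maxDimension / math.max(widthPt, heightPt);
```

For an A4 page (595 x 842 pt) this gives a zoom of about 0.178, that is a thumbnail of roughly 106 x 150 px.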
PDF Features Comparison

| Feature | PyMuPDF (Python) | dart_mupdf_donut (Dart) |
|---|---|---|
| Open PDF from file/bytes | ✅ fitz.open() | ✅ DartMuPDF.openFile() / openBytes() |
| Page count & metadata | ✅ doc.page_count | ✅ doc.pageCount / doc.metadata |
| Render page → Pixmap | ✅ page.get_pixmap() | ✅ page.getPixmap() (new in 0.1.2) |
| Extract plain text | ✅ page.get_text() | ✅ page.getText() |
| Extract text blocks | ✅ page.get_text("blocks") | ✅ page.getTextBlocks() |
| Extract text words | ✅ page.get_text("words") | ✅ page.getTextWords() |
| Extract text as dict | ✅ page.get_text("dict") | ✅ page.getTextDict() |
| Search text | ✅ page.search_for() | ✅ page.searchFor() |
| Get images list | ✅ page.get_images() | ✅ page.getImages() |
| Extract image bytes | ✅ doc.extract_image() | ✅ doc.extractImage() |
| Get links | ✅ page.get_links() | ✅ page.getLinks() |
| Table of contents | ✅ doc.get_toc() | ✅ doc.getToc() |
| Annotations | ✅ page.annots() | ✅ page.getAnnotations() |
| Insert text | ✅ page.insert_text() | ✅ page.insertText() |
| Insert image | ✅ page.insert_image() | ✅ page.insertImage() |
| Merge PDFs | ✅ doc.insert_pdf() | ✅ doc.insertPdf() |
| Delete / rotate / copy pages | ✅ | ✅ |
| Create new PDF | ✅ fitz.open() | ✅ DartMuPDF.createPdf() |
| Save to bytes / file | ✅ doc.tobytes() | ✅ doc.toBytes() / doc.save() |
| Pixmap crop / scale | ✅ | ✅ pix.crop() / pix.scale() (new) |
| Pixmap save to file | ✅ | ✅ pix.save() (fixed) |
| Reactive state management | ❌ | ✅ DocumentController (new) |
| Batch rendering | ❌ | ✅ BatchRenderer (new) |
| Encryption & auth | ✅ | ✅ doc.isEncrypted / doc.authenticate() |
| Embedded files | ✅ doc.embfile_* | ✅ doc.embeddedFiles |
| Form fields | ✅ page.widgets() | ✅ page.getWidgets() |
| Page labels | ✅ | ✅ doc.getPageLabels() |

PDF Advanced Usage
```dart
final text = page.getText();          // plain text
final blocks = page.getTextBlocks();  // blocks with position
final words = page.getTextWords();    // individual words
final dict = page.getTextDict();      // full structure
final html = page.getText(format: TextFormat.html);
```
Page Manipulation
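```dart
doc.deletePage(2);                  // remove the third page (0-based index)
doc.getPage(0).setRotation(90);     // set the first page's rotation to 90°
doc.movePage(from: 5, to: 0);       // move page 5 to the front
doc.select([0, 2, 4]);              // keep only pages 0, 2 and 4
doc.copyPage(0, to: doc.pageCount); // append a copy of the first page
```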
Annotations
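```dart
// List existing annotations
for (final a in page.getAnnotations()) {
  print('${a.type}: ${a.content}');
}

// Add new ones (quads: the Quad regions covering the text to highlight)
page.addHighlightAnnot(quads);
page.addTextAnnot(Point(100, 100), 'Note');
```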
Drawing with Shape
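```dart
// A4 page size in points
final shape = Shape(pageWidth: 595, pageHeight: 842);

// Rectangle: red outline, pale blue fill (RGB components in 0..1)
shape.drawRect(Rect(50, 50, 300, 200));
shape.finish(color: [1, 0, 0], fill: [0.9, 0.9, 1.0], width: 1);

// Circle: blue outline, radius 50
shape.drawCircle(Point(200, 400), 50);
shape.finish(color: [0, 0, 1], width: 1.5);

// commit() returns the accumulated drawing commands
final stream = shape.commit();
```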
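Shape.commit() returns the drawing as a content stream. The package's exact output is not documented here, but PDF content-stream operators are standardized in ISO 32000: `q`/`Q` save and restore the graphics state, `RG`/`rg` set the stroke/fill RGB colour, `w` sets the line width, `re` appends a rectangle given as x, y, width, height, and `B` fills and then strokes the path. The hypothetical function below shows what the first drawRect/finish pair could translate to, including the flip from a top-left origin (as in PyMuPDF's Rect; assumed here for this package) to PDF's bottom-left origin.

```dart
/// Hypothetical sketch: PDF operators for a stroked and filled rectangle
/// given in top-left-origin coordinates (x0, y0, x1, y1) on a page of
/// height `pageHeight` points.
String rectOps({
  required double x0,
  required double y0,
  required double x1,
  required double y1,
  required double pageHeight,
  required List<double> stroke, // RGB components in 0..1
  required List<double> fill, // RGB components in 0..1
  double lineWidth = 1,
}) {
  // Print whole numbers without a trailing ".0" to keep the stream compact.
  String n(double v) =>
      v == v.roundToDouble() ? v.toInt().toString() : v.toString();
  String rgb(List<double> c) => c.map(n).join(' ');
  final width = x1 - x0;
  final height = y1 - y0;
  // PDF y grows upwards, so the lower-left corner sits at pageHeight - y1.
  final bottom = pageHeight - y1;
  return [
    'q', // save graphics state
    '${rgb(stroke)} RG', // stroke colour
    '${rgb(fill)} rg', // fill colour
    '${n(lineWidth)} w', // line width
    '${n(x0)} ${n(bottom)} ${n(width)} ${n(height)} re', // rectangle path
    'B', // fill, then stroke
    'Q', // restore graphics state
  ].join('\n');
}
```

Circles have no dedicated PDF operator; they are usually approximated with four cubic Bézier curves (`c` operator), which is why a drawCircle call produces noticeably more output than a rectangle.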
Quick Start — Donut
```dart
import 'dart:io';
import 'package:dart_mupdf_donut/donut.dart';

Future<void> main() async {
  // 1. Configure & build model
  final config = DonutConfig.base();
  final model = DonutModel(config);

  // 2. Load pretrained weights (from a HuggingFace export)
  await model.loadWeights('path/to/donut-model/');
  model.loadTokenizerFromFile('path/to/tokenizer.json');

  // 3. Run inference on a receipt image
  final bytes = File('receipt.jpg').readAsBytesSync();
  final result = model.inferenceFromBytes(
    imageBytes: bytes,
    prompt: '<s_cord-v2>',
  );

  // 4. Structured JSON output
  print(result.json);
  // {
  //   "menu": [
  //     {"nm": "Cappuccino", "price": "4.50"},
  //     {"nm": "Croissant", "price": "3.00"}
  //   ],
  //   "total": {"total_price": "7.50"}
  // }
}
```
Combine PDF + Donut
```dart
import 'dart:io';
import 'package:dart_mupdf_donut/dart_mupdf.dart';
import 'package:dart_mupdf_donut/donut.dart';

Future<void> main() async {
  final pdfBytes = File('invoice.pdf').readAsBytesSync();
  final doc = DartMuPDF.openBytes(pdfBytes);
  final page = doc.getPage(0);

  // Render the page at 2× (144 DPI), then feed the PNG to Donut
  final png = page.getPixmap(matrix: Matrix.scale(2, 2)).toPng();

  final model = DonutModel(DonutConfig.base());
  await model.loadWeights('model/');
  model.loadTokenizerFromFile('tokenizer.json');

  final result = model.inferenceFromBytes(
    imageBytes: png,
    prompt: '<s_cord-v2>',
  );
  print(result.json);

  doc.close();
}
```
Donut Architecture
```text
      Document Image
             │
             ▼
┌─────────────────────────┐
│ Swin Encoder            │  Hierarchical vision transformer
│ Patch Embed → Stages    │  Window attention + patch merging
│ [2, 2, 14, 2] layers    │  Output: (1, N, 1024) features
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ BART Decoder            │  Auto-regressive text decoder
│ Cross-attention to      │  4 layers × 16 heads
│ encoder features        │  Generates structured tokens
└───────────┬─────────────┘
            │
            ▼
  Structured JSON Output
```
Key insight: Donut skips OCR entirely — it learns to read documents end-to-end from pixels to structured data.
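The (1, N, 1024) shape in the diagram follows from the Swin layout: a 4×4 patch embedding, then three 2× patch-merging steps, with the channel width doubling at each merge starting from encoderEmbedDim (128 for donut-base). The patch size of 4 is the standard Swin/Donut value and an assumption here, since it is not a DonutConfig field shown in this README. swinStageShapes is a hypothetical helper.

```dart
/// Hypothetical helper: per-stage (height, width, channels) of a Swin
/// encoder with `numStages` stages, patch size `patch`, base dim `embedDim`.
List<(int, int, int)> swinStageShapes(int inputH, int inputW,
    {int patch = 4, int embedDim = 128, int numStages = 4}) {
  final shapes = <(int, int, int)>[];
  var h = inputH ~/ patch; // patch embedding: one token per 4x4 patch
  var w = inputW ~/ patch;
  var c = embedDim;
  for (var s = 0; s < numStages; s++) {
    shapes.add((h, w, c));
    // PatchMerging halves the token grid and doubles the channels.
    h ~/= 2;
    w ~/= 2;
    c *= 2;
  }
  return shapes;
}
```

For a 2560 × 1920 input the last stage is an 80 × 60 grid of 1024-dim features, so N = 4800; for the 1280 × 960 input used by the CORD-v2 checkpoint, N = 40 × 30 = 1200.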
Donut Supported Tasks

| Task | Prompt | Output |
|---|---|---|
| Receipt Parsing (CORD-v2) | `<s_cord-v2>` | Menu items, prices, totals |
| Document Classification (RVL-CDIP) | `<s_rvlcdip>` | "letter", "invoice", etc. |
| Visual QA (DocVQA) | `<s_docvqa><s_question>…</s_question><s_answer>` | Free-text answer |
| Text Reading (SynthDoG) | `<s_synthdog>` | OCR-free text |

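For DocVQA the question is spliced into the prompt and the decoder continues after `<s_answer>`. The two hypothetical helpers below build that prompt and pull the answer out of a decoded sequence; the closing `</s_answer>` tag is assumed to follow Donut's usual `<s_key>…</s_key>` convention.

```dart
/// Hypothetical: builds the DocVQA task prompt for a question.
String docVqaPrompt(String question) =>
    '<s_docvqa><s_question>$question</s_question><s_answer>';

/// Hypothetical: extracts the text between <s_answer> and </s_answer>
/// from a decoded sequence, or returns null if no answer tag is present.
String? extractAnswer(String decoded) {
  final match = RegExp(r'<s_answer>(.*?)</s_answer>', dotAll: true)
      .firstMatch(decoded);
  return match?.group(1)?.trim();
}
```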
Donut Configuration
```dart
import 'dart:convert';
import 'dart:io';
import 'package:dart_mupdf_donut/donut.dart';

// Default (matches HuggingFace donut-base)
final baseConfig = DonutConfig.base();

// Custom
final customConfig = DonutConfig(
  inputSize: [1280, 960],
  windowSize: 10,
  encoderLayer: [2, 2, 14, 2],
  decoderLayer: 4,
  maxLength: 1536,
  encoderEmbedDim: 128,
  encoderNumHeads: [4, 8, 16, 32],
  decoderEmbedDim: 1024,
  decoderFfnDim: 4096,
  decoderNumHeads: 16,
  vocabSize: 57522,
);

// Small (for testing / dev)
final smallConfig = DonutConfig.small();

// From a HuggingFace config.json
final hfConfig = DonutConfig.fromJson(
    jsonDecode(File('path/to/donut-model/config.json').readAsStringSync()));
```
Donut Image Preprocessing
```dart
// From file bytes (PNG, JPEG, etc.)
final tensor = DonutImageUtils.preprocessBytes(imageBytes, config);
// → Tensor shape [1, 3, H, W] with ImageNet normalization

// Pipeline description
print(DonutImageUtils.describePipeline(config));
// 1. Decode → RGB
// 2. Resize to fit [H, W] (aspect ratio preserved)
// 3. Pad with white
// 4. Normalize: (px/255 − μ) / σ (ImageNet stats)
// 5. → Tensor [1, 3, H, W]

// Debug: tensor back to image
final debugImg = DonutImageUtils.tensorToImage(tensor);
```
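Steps 2 and 4 of the pipeline reduce to two small formulas. The sketch below assumes the standard ImageNet statistics (timm's IMAGENET_DEFAULT_MEAN / IMAGENET_DEFAULT_STD); Hugging Face's DonutImageProcessor defaults to a mean and std of 0.5 instead, so it is worth checking which statistics your checkpoint expects. fitInside and normalizeChannel are hypothetical helpers, not DonutImageUtils methods.

```dart
import 'dart:math' as math;

// Assumed ImageNet statistics per RGB channel.
const imagenetMean = [0.485, 0.456, 0.406];
const imagenetStd = [0.229, 0.224, 0.225];

/// Largest (height, width) that fits inside (targetH, targetW) while keeping
/// the source aspect ratio; the remaining area is filled by padding.
(int, int) fitInside(int srcH, int srcW, int targetH, int targetW) {
  final scale = math.min(targetH / srcH, targetW / srcW);
  return ((srcH * scale).round(), (srcW * scale).round());
}

/// (px / 255 - mean) / std for one 8-bit value of RGB channel `channel`.
double normalizeChannel(int px, int channel) =>
    (px / 255 - imagenetMean[channel]) / imagenetStd[channel];
```

For example, a 500 × 1000 (H × W) scan fitted into [1280, 960] becomes 480 × 960 and is padded to full height, and a white pixel's red channel normalizes to about 2.25.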
Donut JSON ↔ Token Conversion
Donut uses XML-like tokens for structured output:
```dart
// JSON → Donut tokens
final tokens = DonutModel.json2token({
  'menu': [
    {'nm': 'Latte', 'price': '5.0'},
    {'nm': 'Muffin', 'price': '3.5'},
  ],
  'total': {'total_price': '8.5'},
});
// → '<s_menu><s_nm>Latte</s_nm><s_price>5.0</s_price><sep/>
//    <s_nm>Muffin</s_nm><s_price>3.5</s_price></s_menu>
//    <s_total><s_total_price>8.5</s_total_price></s_total>'

// Donut tokens → JSON
final json = DonutModel.token2json(tokens);
// → {'menu': [...], 'total': {'total_price': '8.5'}}
```
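The conversion shown above is a simple recursion: maps become `<s_key>…</s_key>` pairs, list items are joined with `<sep/>`, and leaves are written as text. A minimal sketch of that rule (a hypothetical top-level function, not DonutModel.json2token itself) that keeps keys in insertion order; note that the reference Python implementation can sort keys in reverse order (its sort_json_key option), so token order may differ between implementations.

```dart
/// Minimal json2token sketch: maps -> <s_key>...</s_key>,
/// lists -> items joined by <sep/>, anything else -> its string form.
/// Keys keep insertion order here (an assumption).
String json2token(Object? obj) {
  if (obj is Map) {
    return obj.entries
        .map((e) => '<s_${e.key}>${json2token(e.value)}</s_${e.key}>')
        .join();
  }
  if (obj is List) {
    return obj.map(json2token).join('<sep/>');
  }
  return '$obj';
}
```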
Tensor Operations
```dart
import 'package:dart_mupdf_donut/donut.dart';

final a = Tensor.zeros([2, 3]);
final b = Tensor.ones([2, 3]);
final c = a + b; // element-wise add
final d = c * Tensor.full([2, 3], 2.0);

// Matrix multiply: [2, 4] x [4, 3] → [2, 3]
final y = Tensor.ones([2, 4]).matmul(Tensor.ones([4, 3]));

// Reshape, permute, transpose
final r = y.reshape([1, 2, 3]);
final p = r.permute([0, 2, 1]);

// Reductions
final s = y.sum(1);
final m = y.mean(0);

// Activations
final g = y.gelu();
final sm = y.softmax(1);
```
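To make the semantics of matmul and softmax(1) concrete, here is a sketch over flat, row-major Float64List buffers. It illustrates the math only; the package's Tensor storage layout is not documented in this README, so these functions are hypothetical stand-ins.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Row-major matmul sketch: `a` is [m, k], `b` is [k, n], result is [m, n].
Float64List matmul(Float64List a, Float64List b, int m, int k, int n) {
  final out = Float64List(m * n);
  for (var i = 0; i < m; i++) {
    for (var p = 0; p < k; p++) {
      final aip = a[i * k + p];
      for (var j = 0; j < n; j++) {
        out[i * n + j] += aip * b[p * n + j];
      }
    }
  }
  return out;
}

/// Numerically stable softmax over the last axis of a [rows, cols] buffer:
/// subtracting each row's maximum before exp() avoids overflow.
Float64List softmaxRows(Float64List x, int rows, int cols) {
  final out = Float64List(rows * cols);
  for (var r = 0; r < rows; r++) {
    var maxV = double.negativeInfinity;
    for (var c = 0; c < cols; c++) {
      maxV = math.max(maxV, x[r * cols + c]);
    }
    var sum = 0.0;
    for (var c = 0; c < cols; c++) {
      final e = math.exp(x[r * cols + c] - maxV);
      out[r * cols + c] = e;
      sum += e;
    }
    for (var c = 0; c < cols; c++) {
      out[r * cols + c] /= sum;
    }
  }
  return out;
}
```

The i-p-j loop order walks both inputs sequentially in memory, which is noticeably faster in pure Dart than the textbook i-j-p order for large matrices.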
Neural Network Layers
```dart
// input, query, key, value and imageTensor are Tensors of compatible shapes
final linear = Linear(512, 256);
final output = linear.forward(input);

final norm = LayerNorm(256);
final normalized = norm.forward(output);

final embed = Embedding(50000, 1024);
final embedded = embed.forward([1, 42, 100]); // token ids

final attn = MultiHeadAttention(embedDim: 1024, numHeads: 16);
final attended = attn.forward(query, key, value);

final conv = Conv2d(3, 64, 4, stride: 4); // 4×4 patches, stride 4
final features = conv.forward(imageTensor);
```
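Linear and LayerNorm compute standard formulas (Linear is y = xW^T + b). The per-vector sketch below assumes PyTorch's weight layout, [outFeatures][inFeatures], which is the layout of weights exported from Hugging Face checkpoints. linearForward and layerNorm are illustrative functions, not the package's classes.

```dart
import 'dart:math' as math;

/// y = xW^T + b for one input vector; `weight` is [outFeatures][inFeatures].
List<double> linearForward(
    List<double> x, List<List<double>> weight, List<double> bias) {
  return List<double>.generate(weight.length, (o) {
    var acc = bias[o];
    for (var i = 0; i < x.length; i++) {
      acc += weight[o][i] * x[i];
    }
    return acc;
  });
}

/// LayerNorm over one vector: (x - mean) / sqrt(var + eps) * gamma + beta,
/// using the biased (population) variance as PyTorch does.
List<double> layerNorm(List<double> x,
    {List<double>? gamma, List<double>? beta, double eps = 1e-5}) {
  final mean = x.reduce((a, b) => a + b) / x.length;
  final variance =
      x.map((v) => (v - mean) * (v - mean)).reduce((a, b) => a + b) / x.length;
  final inv = 1 / math.sqrt(variance + eps);
  return List<double>.generate(x.length,
      (i) => (x[i] - mean) * inv * (gamma?[i] ?? 1.0) + (beta?[i] ?? 0.0));
}
```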
Exporting Weights from Python
Safetensors (recommended)
```shell
pip install huggingface_hub
python -c "
from huggingface_hub import snapshot_download
snapshot_download('naver-clova-ix/donut-base-finetuned-cord-v2',
                  local_dir='./donut-cord-v2/')
"
```
JSON (portable)
```python
import torch, json, base64
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-cord-v2"
)

weights = {}
for name, param in model.named_parameters():
    t = param.detach().cpu().float().numpy()
    weights[name] = {
        "shape": list(t.shape),
        "dtype": "float32",
        "data": base64.b64encode(t.tobytes()).decode("ascii"),
    }

with open("weights.json", "w") as f:
    json.dump(weights, f)
```
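On the Dart side, each entry of that weights.json can be decoded with dart:convert and dart:typed_data. This is a hypothetical sketch of what a loader has to do, not DonutWeightLoader's code. It assumes little-endian float32, which is what numpy's tobytes() produces on x86 and typical ARM machines because tobytes() uses native byte order.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical sketch: decodes one weights.json entry of the form
/// {"shape": [...], "dtype": "float32", "data": "<base64>"}.
Float32List decodeWeight(Map<String, dynamic> entry) {
  if (entry['dtype'] != 'float32') {
    throw FormatException('Unsupported dtype: ${entry['dtype']}');
  }
  final shape = (entry['shape'] as List).cast<int>();
  final bytes = base64Decode(entry['data'] as String);
  final data = ByteData.sublistView(bytes);
  final count = bytes.lengthInBytes ~/ 4;
  // The element count must match the product of the shape dimensions.
  final expected = shape.fold<int>(1, (a, b) => a * b);
  if (count != expected) {
    throw FormatException('Shape $shape needs $expected values, got $count');
  }
  return Float32List.fromList(List<double>.generate(
      count, (i) => data.getFloat32(i * 4, Endian.little)));
}
```

Base64 inflates the payload by about a third, which is one reason the safetensors route is recommended for full-size models.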
API Reference
PDF Core

| Class | Description |
|---|---|
| DartMuPDF | Entry point: openFile(), openBytes(), createPdf() |
| Document | PDF document: pages, metadata, TOC, merge, save |
| Page | Single page: text, images, links, annotations, getPixmap() |
| Pixmap | Pixel buffer: crop, scale, convert, export PNG/JPEG/BMP |
| Shape | Drawing API: lines, rects, circles, text |
| TextPage | Parsed text layer: blocks, words, search |

Reactive & Utilities (new in 0.1.2)

| Class | Description |
|---|---|
| DocumentController | RxDart-powered state management: navigation, zoom, render cache |
| DocumentState | Immutable state snapshot: currentPage, pageCount, zoom, metadata, toc |
| BatchRenderer | Bulk operations: thumbnails, full-text, search-all, export-all |

Donut Core

| Class | Description |
|---|---|
| DonutModel | Main model: encoder + decoder + inference |
| DonutConfig | All hyperparameters (input size, layers, dims) |
| DonutResult | Output: .tokens, .text, .json |
| DonutTokenizer | SentencePiece BPE tokenizer |
| DonutImageUtils | Resize, normalize, pad images |
| DonutWeightLoader | Load safetensors / JSON weights |

Donut Encoder / Decoder

| Class | Description |
|---|---|
| SwinEncoder | Swin Transformer visual encoder |
| BartDecoder | mBART auto-regressive decoder |
| SwinTransformerBlock | Window attention block |
| BartDecoderLayer | Self-attn → cross-attn → FFN |
| WindowAttention | Shifted window attention |
| PatchEmbed | Conv2d patch embedding |
| PatchMerging | 2× spatial downsample |

Tensor & NN

| Class | Description |
|---|---|
| Tensor | N-dim array: matmul, softmax, GELU, broadcasting |
| Linear | y = xW^T + b |
| LayerNorm | Layer normalization |
| Embedding | Token / position lookup |
| Conv2d | 2D convolution |
| MultiHeadAttention | Scaled dot-product attention |
| FeedForward | 2-layer MLP + GELU |

Geometry

| Class | Description |
|---|---|
| Point | 2D point |
| Rect | Bounding rectangle |
| IRect | Integer rectangle |
| Matrix | 3×3 transformation matrix |
| Quad | Quadrilateral (4 points) |

Compatible Pretrained Models

| Model | Task | HuggingFace ID |
|---|---|---|
| Donut Base | General | naver-clova-ix/donut-base |
| CORD v2 | Receipt parsing | naver-clova-ix/donut-base-finetuned-cord-v2 |
| RVL-CDIP | Doc classification | naver-clova-ix/donut-base-finetuned-rvlcdip |
| DocVQA | Visual QA | naver-clova-ix/donut-base-finetuned-docvqa |

Platform Support

| Platform | Status |
|---|---|
| Android | ✅ |
| iOS | ✅ |
| Web | ✅ |
| macOS | ✅ |
| Windows | ✅ |
| Linux | ✅ |

Migration from 1.0.0 → 0.1.2
Version 0.1.2 is fully backward-compatible. There are no breaking changes, only additions and one fix:

| What changed | Details |
|---|---|
| Page.getPixmap() added | Renders a page to a Pixmap. Was not available in 1.0.0. |
| Page.renderToPng() / renderToJpeg() added | Convenience wrappers. |
| Pixmap.crop() / scale() added | Sub-region extraction and bilinear resize. |
| Pixmap.save() fixed | Was a no-op stub; now writes to disk. |
| DocumentController added | RxDart-based reactive state management. |
| BatchRenderer added | Bulk page operations. |
| rxdart dependency added | Required by DocumentController. |

License
MIT — see LICENSE for details.
References
- PyMuPDF: github.com/pymupdf/PyMuPDF
- flutter_mupdf: pub.dev/packages/flutter_mupdf, a native MuPDF wrapper (inspiration for the getPixmap API)
- Donut: Kim et al., "OCR-free Document Understanding Transformer", ECCV 2022, arXiv:2111.15664
- Swin Transformer: Liu et al., "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows", ICCV 2021
- BART: Lewis et al., "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension", ACL 2020