dart_mupdf library
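dart_mupdf: a comprehensive pure Dart PDF library inspired by PyMuPDF.
It provides PDF parsing, text extraction, image extraction, annotations, metadata, table of contents, page manipulation, and PDF creation, and runs on all platforms with no native dependencies. The package also includes a pure Dart implementation of the Donut document-understanding model (a Swin Transformer encoder with a BART decoder), together with its tokenizer, image preprocessing, and weight-loading utilities.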
Quick Start
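import 'dart:io';
import 'package:dart_mupdf_donut/dart_mupdf.dart';
void main() {
  // Read the whole PDF into memory ('example.pdf' is a placeholder path).
  final pdfBytes = File('example.pdf').readAsBytesSync();
  final doc = DartMuPDF.openBytes(pdfBytes);
  print('Pages: ${doc.pageCount}');
  // Extract the plain text of the first page (index 0).
  final text = doc.getPage(0).getText();
  print(text);
  doc.close();
}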
Classes
- BartAttention
- Scaled dot-product attention used in the BART decoder.
- BartDecoder
- Complete BART decoder for Donut.
- BartDecoderLayer
- A single BART decoder layer.
- Colorspace
- PDF Colorspace, equivalent to PyMuPDF's fitz.Colorspace.
- Conv2d
- 2D Convolution layer.
- CrossRefEntry
- A single cross-reference entry.
- DartMuPDF
- Main entry point for the dart_mupdf library.
- Document
- PDF Document class, equivalent to PyMuPDF's fitz.Document.
- DonutConfig
- Configuration class for the Donut model.
- DonutImageUtils
- Image preprocessing pipeline for Donut.
- DonutModel
- The complete Donut model combining Swin Transformer encoder and BART decoder.
- DonutResult
- Result of a Donut inference pass.
- DonutTokenizer
- Tokenizer for Donut model.
- DonutWeightLoader
- Loads pretrained weights into a Donut model.
- Dropout
- Dropout layer — identity in inference mode.
- EmbeddedFile
- An embedded file within a PDF, equivalent to PyMuPDF's embedded file methods.
- Embedding
- Lookup table embedding layer.
- ExtractedImage
- Extracted image data.
- FeedForward
- Position-wise feed-forward network: Linear → GELU → Linear
- GELU
- GELU activation function (Gaussian Error Linear Unit).
- IRect
- An integer rectangle, equivalent to PyMuPDF's fitz.IRect.
- KVCache
- Key-value cache for auto-regressive decoding.
- LayerNorm
- Layer normalization over the last dimension.
- Linear
- Fully connected (linear) layer: y = x @ W^T + b
- LinkInfo
- Link information from a PDF page.
- Matrix
- A 3x3 transformation matrix (stored as its six entries a, b, c, d, e, f), equivalent to PyMuPDF's fitz.Matrix.
- MultiHeadAttention
- Scaled dot-product multi-head attention.
- OutlineItem
- An outline (bookmark) item, equivalent to PyMuPDF's Outline class.
- Page
- PDF page class, equivalent to PyMuPDF's fitz.Page.
- PageLabel
- A page label entry, equivalent to PyMuPDF's page label support.
- PatchEmbed
- Splits the input image into non-overlapping patches and projects them to an embedding space using a convolution.
- PatchMerging
- Downsamples the spatial resolution by 2x and doubles the channel dimension.
- PdfAnnotation
- A PDF annotation, equivalent to PyMuPDF's Annot class.
- PdfArray
- PDF array.
- PdfBool
- PDF boolean.
- PdfCrossRefTable
- Represents the cross-reference table of a PDF.
- PdfDict
- PDF dictionary.
- PdfEncryption
- PDF encryption handler.
- PdfImageInfo
- Information about an image found on a PDF page.
- PdfIndirectObject
- An indirect object (objectNumber generation obj ... endobj).
- PdfInt
- PDF integer.
- PdfMetadata
- PDF document metadata, equivalent to PyMuPDF's doc.metadata.
- PdfName
- PDF name object (e.g., /Type, /Page).
- PdfNull
- PDF null object.
- PdfNumber
- PDF number (can be int or real).
- PdfObject
- Base class for all PDF objects.
- PdfParser
- Low-level PDF parser — reads PDF bytes and produces PDF objects.
- PdfReal
- PDF real number.
- PdfRef
- PDF indirect reference (e.g., "5 0 R").
- PdfStream
- A PDF stream (dictionary + binary data).
- PdfStreamCodec
- PDF stream decompression and compression utilities.
- PdfString
- PDF string (literal or hex).
- PdfWriter
- PDF writer — serializes PDF objects back to bytes.
- Pixmap
- Image pixel map, equivalent to PyMuPDF's fitz.Pixmap.
- Point
- A 2D point, equivalent to PyMuPDF's fitz.Point.
- Point2D
- Writing mode / direction helper.
- Quad
- A quadrilateral defined by four corner points, equivalent to PyMuPDF's fitz.Quad.
- Rect
- A rectangle defined by two corner points, equivalent to PyMuPDF's fitz.Rect.
- ReLU
- ReLU activation function.
- Shape
- Drawing helper class, equivalent to PyMuPDF's Shape.
- Softmax
- Softmax activation along a dimension.
- SwinEncoder
- Complete Swin Transformer encoder as used in Donut.
- SwinLayer
- A single Swin Transformer stage consisting of multiple blocks and an optional patch merging downsample layer.
- SwinTransformerBlock
- A single Swin Transformer block with window attention.
- Tensor
- An N-dimensional tensor backed by a flat Float32List.
- TextBlock
- A text block extracted from a PDF page.
- TextDict
- Full text page dictionary, equivalent to PyMuPDF's page.get_text("dict").
- TextDictBlock
- A block within a TextDict.
- TextDictChar
- A character within a TextDictSpan (for rawdict mode).
- TextDictLine
- A line within a TextDictBlock.
- TextDictSpan
- A span within a TextDictLine.
- TextPage
- Represents a parsed text page, equivalent to PyMuPDF's fitz.TextPage.
- TextWord
- A single word extracted from a PDF page with position info.
- TocEntry
- A table of contents entry, equivalent to PyMuPDF's TOC list items.
- WeightExportGuide
- Utility for converting PyTorch/HuggingFace model weights to a format suitable for the Dart Donut model.
- WidgetInfo
- A PDF form widget (form field), equivalent to PyMuPDF's Widget class.
- WindowAttention
- Window-based multi-head self-attention (W-MSA / SW-MSA).
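Several of the classes above (BartAttention, MultiHeadAttention, WindowAttention, Softmax) are built around scaled dot-product attention, softmax(Q·Kᵀ / √d_k)·V. As a rough illustration of that formula only, here is a single-head, unmasked sketch on plain nested lists. The function names softmax and scaledDotProductAttention are hypothetical; this is not the library's Tensor-based API, whose signatures are not listed on this page.

```dart
import 'dart:math' as math;

/// Hypothetical helper: numerically stable softmax over one row
/// (the row maximum is subtracted before exponentiating).
List<double> softmax(List<double> row) {
  final maxVal = row.reduce(math.max);
  final exps = row.map((v) => math.exp(v - maxVal)).toList();
  final sum = exps.reduce((a, b) => a + b);
  return exps.map((e) => e / sum).toList();
}

/// Hypothetical single-head attention: softmax(Q K^T / sqrt(d_k)) V.
/// Shapes: q is [n][d_k], k is [m][d_k], v is [m][d_v]; result is [n][d_v].
/// No causal mask and no KV cache, unlike a real decoder.
List<List<double>> scaledDotProductAttention(
    List<List<double>> q, List<List<double>> k, List<List<double>> v) {
  final dK = q.first.length;
  final scale = 1.0 / math.sqrt(dK);
  final output = <List<double>>[];
  for (final qRow in q) {
    // Scaled dot product of this query with every key.
    final scores = <double>[];
    for (final kRow in k) {
      var dot = 0.0;
      for (var i = 0; i < dK; i++) {
        dot += qRow[i] * kRow[i];
      }
      scores.add(dot * scale);
    }
    final weights = softmax(scores);
    // Output row = attention-weighted sum of the value rows.
    final out = List<double>.filled(v.first.length, 0.0);
    for (var j = 0; j < v.length; j++) {
      for (var c = 0; c < out.length; c++) {
        out[c] += weights[j] * v[j][c];
      }
    }
    output.add(out);
  }
  return output;
}
```

A multi-head layer runs several such heads on projected slices of Q, K, and V and concatenates the results; a decoder additionally masks future positions and can reuse earlier keys and values through a cache, both of which this sketch leaves out.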
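The Matrix entry stores a transformation as six numbers a, b, c, d, e, f, which is the PDF convention for the 3x3 matrix [[a, b, 0], [c, d, 0], [e, f, 1]] applied to row vectors [x y 1]. A point (x, y) therefore maps to (a·x + c·y + e, b·x + d·y + f). The sketch below shows only that arithmetic, using a hypothetical AffineMatrix class; it is not the library's Matrix API, whose constructors and operators are not documented on this page.

```dart
/// Hypothetical affine matrix in PDF's six-number form [a b c d e f],
/// i.e. [[a, b, 0], [c, d, 0], [e, f, 1]] used with row vectors [x y 1].
class AffineMatrix {
  final double a, b, c, d, e, f;
  const AffineMatrix(this.a, this.b, this.c, this.d, this.e, this.f);

  const AffineMatrix.identity() : this(1, 0, 0, 1, 0, 0);
  const AffineMatrix.translate(double tx, double ty) : this(1, 0, 0, 1, tx, ty);
  const AffineMatrix.scale(double sx, double sy) : this(sx, 0, 0, sy, 0, 0);

  /// Maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
  List<double> apply(double x, double y) =>
      [a * x + c * y + e, b * x + d * y + f];

  /// Row-vector product: the result applies `this` first, then [other].
  AffineMatrix operator *(AffineMatrix other) => AffineMatrix(
        a * other.a + b * other.c,
        a * other.b + b * other.d,
        c * other.a + d * other.c,
        c * other.b + d * other.d,
        e * other.a + f * other.c + other.e,
        e * other.b + f * other.d + other.f,
      );

  @override
  String toString() => 'AffineMatrix($a, $b, $c, $d, $e, $f)';
}
```

With this convention the left operand acts first: AffineMatrix.translate(10, 20) * AffineMatrix.scale(2, 3) moves (1, 1) to (11, 21) and then scales it to (22, 63).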
Enums
- AnnotationType
- Annotation types matching PyMuPDF constants.
- TextFormat
- Text output format options, matching PyMuPDF's format strings.
Functions
- annotationTypeFromName(String name) → AnnotationType
- Maps a PDF annotation subtype name to the corresponding AnnotationType value.
- md5Convert(Uint8List data) → List<int>
- Minimal MD5 implementation for PDF encryption.
- parseCrossRefStream(PdfDict dict, Uint8List decodedData) → PdfCrossRefTable
- Parse a cross-reference stream.
- parseCrossRefTable(Uint8List data, int startOffset) → PdfCrossRefTable
- Parse a traditional cross-reference table from bytes.
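A traditional cross-reference table, as defined by the PDF specification, starts with the keyword xref, followed by one or more subsections. Each subsection header gives the first object number and the entry count, and each entry is a fixed 20-byte line: a 10-digit byte offset, a 5-digit generation number, and n (in use) or f (free). The sketch below parses that layout under simplifying assumptions: the hypothetical XrefEntry class and parseXrefSection function work on already-decoded text, split on whitespace, and stop at the trailer keyword. The library's parseCrossRefTable instead takes raw Uint8List data and a start offset, and its internals are not shown here.

```dart
/// Hypothetical record for one entry of a classic xref table.
class XrefEntry {
  /// Byte offset for in-use entries; next free object number for free ones.
  final int offset;
  final int generation;

  /// True for 'n' (in use), false for 'f' (free).
  final bool inUse;
  const XrefEntry(this.offset, this.generation, this.inUse);
}

/// Hypothetical parser for the text of an "xref ... trailer" section.
/// Returns object number -> entry. Malformed input throws a
/// FormatException (or a RangeError if the section is truncated).
Map<int, XrefEntry> parseXrefSection(String text) {
  final tokens =
      text.split(RegExp(r'\s+')).where((t) => t.isNotEmpty).toList();
  if (tokens.isEmpty || tokens.first != 'xref') {
    throw FormatException('Expected "xref" keyword');
  }
  final entries = <int, XrefEntry>{};
  var i = 1;
  while (i < tokens.length && tokens[i] != 'trailer') {
    // Subsection header: first object number and number of entries.
    final start = int.parse(tokens[i]);
    final count = int.parse(tokens[i + 1]);
    i += 2;
    for (var n = 0; n < count; n++) {
      final offset = int.parse(tokens[i]);
      final generation = int.parse(tokens[i + 1]);
      final type = tokens[i + 2];
      if (type != 'n' && type != 'f') {
        throw FormatException('Bad xref entry type: $type');
      }
      entries[start + n] = XrefEntry(offset, generation, type == 'n');
      i += 3;
    }
  }
  return entries;
}
```

Splitting on whitespace sidesteps the three permitted end-of-line variants; a byte-level parser can instead exploit the fixed 20-byte entry width to seek directly to a given object's entry.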