donut library
Donut — OCR-free Document Understanding Transformer.
Pure Dart implementation of the Donut model for end-to-end document understanding without OCR.
```dart
import 'package:dart_mupdf_donut/donut.dart';
```
See the DonutModel class for usage examples.
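A minimal end-to-end sketch of how the pieces below fit together. The constructor and method names here (`load`, `preprocess`, `generate`) are assumptions for orientation only; the actual entry points are documented on DonutModel, DonutTokenizer, and DonutWeightLoader.

```dart
import 'dart:typed_data';
import 'package:dart_mupdf_donut/donut.dart';

// Hypothetical API names, shown for illustration — consult the class
// documentation below for the real constructors and methods.
Future<void> main() async {
  final config = DonutConfig();              // model hyperparameters
  final model = DonutModel(config);          // Swin encoder + BART decoder
  await DonutWeightLoader.load(model, 'assets/donut.weights'); // assumed signature
  final image = Uint8List(0);                // your raw image bytes here
  final input = DonutImageUtils.preprocess(image);  // assumed helper name
  final DonutResult result = model.generate(input); // assumed method name
  print(result);
}
```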
Classes
- BartAttention
- Scaled dot-product attention used in the BART decoder.
- BartDecoder
- Complete BART decoder for Donut.
- BartDecoderLayer
- A single BART decoder layer.
- Conv2d
- 2D Convolution layer.
- DonutConfig
- Configuration class for the Donut model.
- DonutImageUtils
- Image preprocessing pipeline for Donut.
- DonutModel
- The complete Donut model combining Swin Transformer encoder and BART decoder.
- DonutResult
- Result of a Donut inference pass.
- DonutTokenizer
- Tokenizer for the Donut model.
- DonutWeightLoader
- Loads pretrained weights into a Donut model.
- Dropout
- Dropout layer — identity in inference mode.
- Embedding
- Lookup table embedding layer.
- FeedForward
- Position-wise feed-forward network: Linear → GELU → Linear.
- GELU
- GELU activation function (Gaussian Error Linear Unit).
- KVCache
- Key-value cache for auto-regressive decoding.
- LayerNorm
- Layer normalization over the last dimension.
- Linear
- Fully connected (linear) layer: y = x @ W^T + b
- MultiHeadAttention
- Scaled dot-product multi-head attention.
- PatchEmbed
- Splits the input image into non-overlapping patches and projects them to an embedding space using a convolution.
- PatchMerging
- Downsamples the spatial resolution by 2x and doubles the channel dimension.
- ReLU
- ReLU activation function.
- Softmax
- Softmax activation along a dimension.
- SwinEncoder
- Complete Swin Transformer encoder as used in Donut.
- SwinLayer
- A single Swin Transformer stage consisting of multiple blocks and an optional patch merging downsample layer.
- SwinTransformerBlock
- A single Swin Transformer block with window attention.
- Tensor
- An N-dimensional tensor backed by a flat Float32List.
- WeightExportGuide
- Utility for converting PyTorch/HuggingFace model weights to a format suitable for the Dart Donut model.
- WindowAttention
- Window-based multi-head self-attention (W-MSA / SW-MSA).
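Several of the classes above (BartAttention, MultiHeadAttention, WindowAttention) are variants of the same core operation: scaled dot-product attention. A conceptual sketch for a single query and a single head, using plain Dart lists rather than the library's Tensor class:

```dart
import 'dart:math' as math;

/// Illustrative only — not the library's Tensor-based implementation.
/// Computes softmax(q·kᵢ / √d) weights over the keys, then returns the
/// corresponding weighted sum of the value vectors.
List<double> scaledDotProductAttention(
    List<double> query, List<List<double>> keys, List<List<double>> values) {
  final d = query.length;
  // Scores: dot product of the query with each key, scaled by 1/sqrt(d).
  final scores = keys.map((k) {
    var dot = 0.0;
    for (var i = 0; i < d; i++) {
      dot += query[i] * k[i];
    }
    return dot / math.sqrt(d);
  }).toList();
  // Softmax over the scores (subtract the max for numerical stability).
  final maxScore = scores.reduce(math.max);
  final exps = scores.map((s) => math.exp(s - maxScore)).toList();
  final sum = exps.reduce((a, b) => a + b);
  final weights = exps.map((e) => e / sum).toList();
  // Output: attention-weighted sum of the value vectors.
  final dim = values.first.length;
  final out = List<double>.filled(dim, 0.0);
  for (var i = 0; i < values.length; i++) {
    for (var j = 0; j < dim; j++) {
      out[j] += weights[i] * values[i][j];
    }
  }
  return out;
}
```

The multi-head versions run this in parallel over several learned projections of the same input; WindowAttention additionally restricts the keys and values to a local spatial window.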