DonutConfig class

Configuration class for the Donut model.

Stores the architecture hyperparameters for both the Swin Transformer encoder and the BART decoder.

Default values match the donut-base pretrained model:

  • Input size: 2560x1920 (height x width)
  • Encoder: Swin-B with stage depths [2, 2, 14, 2]
  • Decoder: 4-layer BART with 1024-dim embeddings
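
The defaults above determine how many visual tokens the encoder hands to the decoder. The sketch below works through that arithmetic. It assumes standard Swin behaviour (one patch-embedding step followed by a 2x patch-merging step between consecutive stages), which this page does not confirm for this package; encoderGrid is a hypothetical helper, not part of the documented API.

```dart
/// Returns [height, width] of the final encoder feature map.
///
/// Assumption: patch embedding divides each side by [patchSize], and each of
/// the (numStages - 1) patch-merging steps halves it again.
List<int> encoderGrid(List<int> inputSize, int patchSize, int numStages) {
  final divisor = patchSize * (1 << (numStages - 1));
  return [inputSize[0] ~/ divisor, inputSize[1] ~/ divisor];
}

void main() {
  const inputSize = [2560, 1920]; // documented default (height, width)
  const patchSize = 4; // documented default
  const encoderLayer = [2, 2, 14, 2]; // documented default, four stages
  const windowSize = 10; // documented default

  final grid = encoderGrid(inputSize, patchSize, encoderLayer.length);
  print('final grid: ${grid[0]} x ${grid[1]}'); // final grid: 80 x 60
  print('encoder tokens: ${grid[0] * grid[1]}'); // encoder tokens: 4800

  // Window attention needs each side of the grid to be a multiple of the
  // window size; with the defaults both sides divide evenly.
  final fits = grid[0] % windowSize == 0 && grid[1] % windowSize == 0;
  print('grid divisible by window: $fits'); // grid divisible by window: true
}
```

Under these assumptions, halving inputSize on both axes would cut the token count to a quarter, which is the main lever for reducing encoder cost.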

Constructors

DonutConfig({List<int> inputSize = const [2560, 1920], bool alignLongAxis = false, int windowSize = 10, List<int> encoderLayer = const [2, 2, 14, 2], int decoderLayer = 4, int maxPositionEmbeddings = 1536, int maxLength = 1536, int encoderEmbedDim = 128, List<int> encoderNumHeads = const [4, 8, 16, 32], int patchSize = 4, int decoderEmbedDim = 1024, int decoderFfnDim = 4096, int decoderNumHeads = 16, int vocabSize = 57522, String nameOrPath = ''})
const
DonutConfig.base()
Configuration for the donut-base pretrained model.
factory
DonutConfig.fromJson(Map<String, dynamic> json)
Create from JSON map (for loading from config.json).
factory
DonutConfig.proto()
Configuration for the donut-proto (smaller) model.
factory
DonutConfig.small()
Configuration for a small model (for testing/development).
factory
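
To illustrate how the default constructor and fromJson relate, here is a minimal, self-contained sketch. ConfigSketch is a hypothetical stand-in covering three of the documented fields, not the package's class. The snake_case JSON keys and the fall-back-to-defaults behaviour for missing keys are assumptions; check them against the actual config.json and the fromJson source before relying on them.

```dart
import 'dart:convert';

/// Hypothetical stand-in for DonutConfig, limited to three documented fields.
class ConfigSketch {
  final List<int> inputSize;
  final int windowSize;
  final int maxLength;

  // Defaults copied from the documented DonutConfig constructor.
  const ConfigSketch({
    this.inputSize = const [2560, 1920],
    this.windowSize = 10,
    this.maxLength = 1536,
  });

  /// Assumed behaviour: keys absent from [json] use the constructor defaults.
  factory ConfigSketch.fromJson(Map<String, dynamic> json) {
    const defaults = ConfigSketch();
    return ConfigSketch(
      inputSize: (json['input_size'] as List?)?.map((e) => e as int).toList() ??
          defaults.inputSize,
      windowSize: json['window_size'] as int? ?? defaults.windowSize,
      maxLength: json['max_length'] as int? ?? defaults.maxLength,
    );
  }

  /// Mirrors fromJson so that a config survives a round trip.
  Map<String, dynamic> toJson() => {
        'input_size': inputSize,
        'window_size': windowSize,
        'max_length': maxLength,
      };
}

/// Decodes a config.json-style string into a sketch config.
ConfigSketch parseConfig(String source) =>
    ConfigSketch.fromJson(jsonDecode(source) as Map<String, dynamic>);

void main() {
  final config = parseConfig('{"input_size": [1280, 960], "max_length": 768}');
  print(config.inputSize); // [1280, 960]
  print(config.windowSize); // 10 (key absent, default used)
  print(config.maxLength); // 768

  // Round trip: serialize, then parse again.
  final copy = parseConfig(jsonEncode(config.toJson()));
  print(copy.maxLength == config.maxLength); // true
}
```

With the real class, the equivalent calls would be DonutConfig(inputSize: [1280, 960], maxLength: 768) and DonutConfig.fromJson(map), using the signatures listed above.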

Properties

alignLongAxis bool
Whether to rotate the image when height > width.
final
decoderEmbedDim int
Embedding dimension for the decoder.
final
decoderFfnDim int
FFN dimension for the decoder.
final
decoderLayer int
Number of BART decoder layers.
final
decoderNumHeads int
Number of attention heads for the decoder.
final
encoderEmbedDim int
Embedding dimension for the encoder.
final
encoderLayer List<int>
Depth of each Swin Transformer stage.
final
encoderNumHeads List<int>
Number of attention heads per encoder stage.
final
encoderOutputDim int
The encoder's output dimension (computed).
no setter
hashCode int
The hash code for this object.
no setter, inherited
inputSize List<int>
Input image size as [height, width].
final
maxLength int
Maximum sequence length for generation.
final
maxPositionEmbeddings int
Maximum number of position embeddings for the decoder.
final
nameOrPath String
Name or path of the pretrained model.
final
patchSize int
Patch size for the visual encoder.
final
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
vocabSize int
Vocabulary size.
final
windowSize int
Window size for Swin Transformer.
final
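
The encoder properties above are related to one another. The sketch below shows one plausible way encoderOutputDim could follow from encoderEmbedDim and encoderLayer. It assumes the standard Swin rule that channel width doubles at each patch-merging step; the package's actual getter may compute it differently. stageDims is a hypothetical helper.

```dart
/// Channel width of each Swin stage.
///
/// Assumption: width starts at [embedDim] and doubles at every stage
/// boundary (standard Swin; not confirmed for this package).
List<int> stageDims(int embedDim, int numStages) =>
    [for (var i = 0; i < numStages; i++) embedDim << i];

void main() {
  const encoderEmbedDim = 128; // documented default
  const encoderLayer = [2, 2, 14, 2]; // documented default
  const encoderNumHeads = [4, 8, 16, 32]; // documented default

  final dims = stageDims(encoderEmbedDim, encoderLayer.length);
  print(dims); // [128, 256, 512, 1024]
  print('output dim: ${dims.last}'); // output dim: 1024

  // Each stage's width divided by its head count gives the per-head size.
  for (var i = 0; i < dims.length; i++) {
    print('stage $i head dim: ${dims[i] ~/ encoderNumHeads[i]}'); // 32 each
  }
}
```

Under this assumption the default encoder output width (1024) equals the default decoderEmbedDim, so the decoder's cross-attention can consume encoder features without an extra projection.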

Methods

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
toJson() Map<String, dynamic>
Convert to JSON map for serialization.
toString() String
A string representation of this object.
override

Operators

operator ==(Object other) bool
The equality operator.
inherited

Constants

imagenetMean → const List<double>
ImageNet normalization mean.
imagenetStd → const List<double>
ImageNet normalization std.
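
As a sketch of how these constants are typically applied, the example below normalizes a single RGB pixel. The numeric values are the widely used ImageNet statistics and are assumed, not confirmed, to match imagenetMean and imagenetStd; normalizePixel and the assumed* names are hypothetical.

```dart
// Commonly used ImageNet channel statistics (R, G, B). Assumed to match the
// package's imagenetMean and imagenetStd constants.
const assumedMean = [0.485, 0.456, 0.406];
const assumedStd = [0.229, 0.224, 0.225];

/// Scales an 8-bit RGB pixel to [0, 1], then standardizes each channel.
List<double> normalizePixel(List<int> rgb) => [
      for (var c = 0; c < 3; c++)
        (rgb[c] / 255.0 - assumedMean[c]) / assumedStd[c],
    ];

void main() {
  final black = normalizePixel([0, 0, 0]);
  final white = normalizePixel([255, 255, 255]);
  // Black maps to negative values and white to positive ones on every channel.
  print(black.map((v) => v.toStringAsFixed(3)).toList());
  print(white.map((v) => v.toStringAsFixed(3)).toList());
}
```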