DonutConfig class

Configuration class for the Donut model.

It stores the architecture hyperparameters for both the Swin Transformer encoder and the BART decoder.

Default values match the donut-base pretrained model.

Constructors

DonutConfig({List<int> inputSize = const [2560, 1920], bool alignLongAxis = false, int windowSize = 10, List<int> encoderLayer = const [2, 2, 14, 2], int decoderLayer = 4, int maxPositionEmbeddings = 1536, int maxLength = 1536, int encoderEmbedDim = 128, List<int> encoderNumHeads = const [4, 8, 16, 32], int patchSize = 4, int decoderEmbedDim = 1024, int decoderFfnDim = 4096, int decoderNumHeads = 16, int vocabSize = 57522, String nameOrPath = ''})
const
DonutConfig.base(): Configuration for the donut-base pretrained model.
factory
DonutConfig.fromJson(Map<String, dynamic> json): Create from JSON map (for loading from config.json).
factory
DonutConfig.proto(): Configuration for the donut-proto (smaller) model.
factory
DonutConfig.small(): Configuration for a small model (for testing/development).
factory
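
The default constructor takes only named parameters, so a caller can override a single value and keep the rest. The sketch below shows that pattern. `ConfigSketch` and `smallerInput` are hypothetical stand-ins that mirror three fields of DonutConfig so the snippet runs without the real package, and the override values are illustrative only.

```dart
// Hypothetical stand-in mirroring three DonutConfig fields. Defaults are
// copied from the constructor signature above.
class ConfigSketch {
  final List<int> inputSize;
  final int windowSize;
  final int maxLength;

  const ConfigSketch({
    this.inputSize = const [2560, 1920],
    this.windowSize = 10,
    this.maxLength = 1536,
  });
}

// Overrides two named parameters (illustrative values) and keeps the
// remaining defaults.
ConfigSketch smallerInput() =>
    const ConfigSketch(inputSize: [1280, 960], maxLength: 768);

void main() {
  const defaults = ConfigSketch();
  final custom = smallerInput();
  print(defaults.inputSize); // the default size from the signature
  print(custom.inputSize); // the overridden size
  print(custom.windowSize); // untouched default
}
```

With the real class, the same call shape would apply to `DonutConfig(...)`, assuming the signature listed above.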

Properties

alignLongAxis → bool: Whether to rotate image if height > width.
final
decoderEmbedDim → int: Embedding dimension for the decoder.
final
decoderFfnDim → int: FFN dimension for the decoder.
final
decoderLayer → int: Number of BART decoder layers.
final
decoderNumHeads → int: Number of attention heads for the decoder.
final
encoderEmbedDim → int: Embedding dimension for the encoder.
final
encoderLayer → List<int>: Depth of each Swin Transformer stage.
final
encoderNumHeads → List<int>: Number of attention heads per encoder stage.
final
encoderOutputDim → int: Compute the encoder's output dimension.
no setter
hashCode → int: The hash code for this object.
no setter, inherited
inputSize → List<int>: Input image size as [height, width].
final
maxLength → int: Maximum sequence length for generation.
final
maxPositionEmbeddings → int: Maximum position embeddings for decoder.
final
nameOrPath → String: Path or name of pretrained model.
final
patchSize → int: Patch size for the visual encoder.
final
runtimeType → Type: A representation of the runtime type of the object.
no setter, inherited
vocabSize → int: Vocabulary size.
final
windowSize → int: Window size for Swin Transformer.
final
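
Several of these fields constrain one another. The sketch below works through the default values under the usual Swin Transformer conventions: the channel width doubles at each patch-merging step, and the feature grid is divided by patchSize first and then halved between stages. This page does not say how encoderOutputDim is actually computed, so treat the formula as an assumption. The helper names (`swinOutputDim`, `stageGrids`, `windowsTileEvenly`) are hypothetical and not part of the library.

```dart
/// Channel width after the last Swin stage, assuming the width doubles at
/// each of the (numStages - 1) patch-merging steps.
int swinOutputDim(int embedDim, int numStages) => embedDim << (numStages - 1);

/// Feature-map size [height, width] entering each stage, assuming the patch
/// embedding divides by patchSize and each later stage halves the grid.
List<List<int>> stageGrids(List<int> inputSize, int patchSize, int numStages) {
  var h = inputSize[0] ~/ patchSize;
  var w = inputSize[1] ~/ patchSize;
  final grids = <List<int>>[];
  for (var i = 0; i < numStages; i++) {
    grids.add([h, w]);
    h ~/= 2;
    w ~/= 2;
  }
  return grids;
}

/// True when every stage grid splits evenly into windowSize x windowSize windows.
bool windowsTileEvenly(List<List<int>> grids, int windowSize) =>
    grids.every((g) => g[0] % windowSize == 0 && g[1] % windowSize == 0);

void main() {
  // Default values from the constructor signature.
  const inputSize = [2560, 1920];
  const encoderLayer = [2, 2, 14, 2];
  final grids = stageGrids(inputSize, 4, encoderLayer.length);
  print(swinOutputDim(128, encoderLayer.length)); // 1024
  print(grids.last); // [80, 60]
  print(windowsTileEvenly(grids, 10)); // true
}
```

Under these assumptions the encoder output width for the defaults is 1024, which equals the default decoderEmbedDim, and the 2560 x 1920 input splits cleanly into windows of size 10 at every stage.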

Methods

noSuchMethod(Invocation invocation) → dynamic: Invoked when a nonexistent method or property is accessed.
inherited
toJson() → Map<String, dynamic>: Convert to JSON map for serialization.
toString() → String: A string representation of this object.
override
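
A round trip through toJson() and fromJson() might look like the sketch below. `JsonConfigSketch` and `roundTrip` are hypothetical stand-ins with two fields. The snake_case keys are an assumption, since the key names the real class writes are not listed here. The point it illustrates is that `jsonDecode` returns `List<dynamic>`, so list fields need an explicit conversion to `List<int>`.

```dart
import 'dart:convert';

// Hypothetical stand-in for DonutConfig with two fields. The key names are
// assumed; check the real toJson() output for the actual ones.
class JsonConfigSketch {
  final List<int> inputSize;
  final int vocabSize;

  const JsonConfigSketch({
    this.inputSize = const [2560, 1920],
    this.vocabSize = 57522,
  });

  factory JsonConfigSketch.fromJson(Map<String, dynamic> json) =>
      JsonConfigSketch(
        // jsonDecode yields List<dynamic>, so copy into a List<int>.
        inputSize: json['input_size'] == null
            ? const [2560, 1920]
            : List<int>.from(json['input_size'] as List),
        // Fall back to the default when the key is missing.
        vocabSize: json['vocab_size'] as int? ?? 57522,
      );

  Map<String, dynamic> toJson() =>
      {'input_size': inputSize, 'vocab_size': vocabSize};
}

// Serializes to a JSON string and parses it back, as when reading config.json.
JsonConfigSketch roundTrip(JsonConfigSketch config) =>
    JsonConfigSketch.fromJson(
        jsonDecode(jsonEncode(config.toJson())) as Map<String, dynamic>);

void main() {
  final restored = roundTrip(const JsonConfigSketch(vocabSize: 1000));
  print(restored.inputSize);
  print(restored.vocabSize);
}
```

Falling back to the constructor defaults for missing keys is a design choice of this sketch. Whether DonutConfig.fromJson does the same is not stated on this page.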