DonutConfig class
Configuration class for the Donut model.
This stores the architecture hyperparameters for both the Swin Transformer encoder and BART decoder.
Default values match the donut-base pretrained model:
- Input size: 2560x1920 (height x width)
- Encoder: Swin-B with stage depths [2, 2, 14, 2]
- Decoder: 4-layer BART with 1024-dim embeddings
Constructors
- DonutConfig({List<int> inputSize = const [2560, 1920], bool alignLongAxis = false, int windowSize = 10, List<int> encoderLayer = const [2, 2, 14, 2], int decoderLayer = 4, int maxPositionEmbeddings = 1536, int maxLength = 1536, int encoderEmbedDim = 128, List<int> encoderNumHeads = const [4, 8, 16, 32], int patchSize = 4, int decoderEmbedDim = 1024, int decoderFfnDim = 4096, int decoderNumHeads = 16, int vocabSize = 57522, String nameOrPath = ''}) (const)
  Creates a configuration; every parameter defaults to its donut-base value.
- DonutConfig.base() (factory)
  Configuration for the donut-base pretrained model.
- DonutConfig.fromJson(Map<String, dynamic> json) (factory)
  Creates a configuration from a JSON map (for loading from config.json).
- DonutConfig.proto() (factory)
  Configuration for the donut-proto (smaller) model.
- DonutConfig.small() (factory)
  Configuration for a small model (for testing/development).
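The constructor takes only named parameters with donut-base defaults, so a caller overrides just the fields that differ. The sketch below illustrates that pattern on a hypothetical `MiniDonutConfig` class (not part of the library) that mirrors four of the documented fields. Treating `base()` as a factory that simply returns the defaults is an assumption, based on the statement that the defaults match donut-base.

```dart
/// Hypothetical, trimmed-down stand-in for DonutConfig that mirrors only
/// four of its fields; it is not the library's class.
class MiniDonutConfig {
  final List<int> inputSize; // [height, width]
  final int windowSize;
  final List<int> encoderLayer;
  final int decoderLayer;

  // Named parameters with const defaults, mirroring the documented signature.
  const MiniDonutConfig({
    this.inputSize = const [2560, 1920],
    this.windowSize = 10,
    this.encoderLayer = const [2, 2, 14, 2],
    this.decoderLayer = 4,
  });

  // Assumption: base() just returns the defaults. Returning a const instance
  // means repeated calls share one canonical object.
  factory MiniDonutConfig.base() => const MiniDonutConfig();
}

// Override only what differs; every other field keeps its donut-base default.
const halfResolution = MiniDonutConfig(inputSize: [1280, 960]);
```

Because every field is final and the constructor is `const`, a configuration built entirely from literals is a compile-time constant.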
Properties
- alignLongAxis → bool (final)
  Whether to rotate the image when its height is greater than its width.
- decoderEmbedDim → int (final)
  Embedding dimension of the decoder.
- decoderFfnDim → int (final)
  Hidden dimension of the decoder's feed-forward (FFN) layers.
- decoderLayer → int (final)
  Number of BART decoder layers.
- decoderNumHeads → int (final)
  Number of attention heads in the decoder.
- encoderEmbedDim → int (final)
  Embedding dimension of the encoder.
- encoderLayer → List<int> (final)
  Depth (number of blocks) of each Swin Transformer stage.
- encoderNumHeads → List<int> (final)
  Number of attention heads in each encoder stage.
- encoderOutputDim → int (no setter)
  Computes the encoder's output dimension.
- hashCode → int (no setter, inherited)
  The hash code for this object.
- inputSize → List<int> (final)
  Input image size as [height, width].
- maxLength → int (final)
  Maximum sequence length for generation.
- maxPositionEmbeddings → int (final)
  Maximum number of position embeddings in the decoder.
- nameOrPath → String (final)
  Name or path of the pretrained model.
- patchSize → int (final)
  Patch size of the visual encoder.
- runtimeType → Type (no setter, inherited)
  A representation of the runtime type of the object.
- vocabSize → int (final)
  Vocabulary size.
- windowSize → int (final)
  Attention window size of the Swin Transformer encoder.
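Several of these fields combine into the encoder's geometry. Assuming the standard Swin layout (a patch embedding with stride `patchSize`, then a 2x patch-merging step between consecutive stages that halves the token grid and doubles the channel count), the defaults give an output dimension of 128 × 2³ = 1024, which equals `decoderEmbedDim`, and a final grid of 80 × 60 = 4,800 visual tokens. The hypothetical `EncoderGeometry` class below works through that arithmetic; whether the library computes `encoderOutputDim` exactly this way is an assumption.

```dart
/// Hypothetical helper (not part of the library) that derives Swin-style
/// encoder geometry from DonutConfig-like fields, assuming stride-patchSize
/// patch embedding and a 2x patch merge between consecutive stages.
class EncoderGeometry {
  final List<int> inputSize; // [height, width]
  final int patchSize;
  final int windowSize;
  final int embedDim; // encoderEmbedDim
  final List<int> depths; // encoderLayer

  const EncoderGeometry({
    this.inputSize = const [2560, 1920],
    this.patchSize = 4,
    this.windowSize = 10,
    this.embedDim = 128,
    this.depths = const [2, 2, 14, 2],
  });

  /// Channels double at each of the (stages - 1) patch merges.
  int get outputDim => embedDim << (depths.length - 1);

  /// Token grid [height, width] at 0-based stage [i]: stride is patchSize * 2^i.
  List<int> gridAt(int i) {
    final stride = patchSize << i;
    return [inputSize[0] ~/ stride, inputSize[1] ~/ stride];
  }

  /// Number of tokens in the last stage's grid.
  int get outputTokens {
    final grid = gridAt(depths.length - 1);
    return grid[0] * grid[1];
  }

  /// True if every stage's grid is a multiple of windowSize in both
  /// directions, so the attention windows tile each map without padding.
  bool get windowsTileEvenly {
    for (var i = 0; i < depths.length; i++) {
      final grid = gridAt(i);
      if (grid[0] % windowSize != 0 || grid[1] % windowSize != 0) return false;
    }
    return true;
  }
}
```

With the defaults, the stage grids are 640 × 480, 320 × 240, 160 × 120, and 80 × 60, all multiples of the window size 10.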
Methods
- noSuchMethod(Invocation invocation) → dynamic (inherited)
  Invoked when a nonexistent method or property is accessed.
- toJson() → Map<String, dynamic>
  Converts the configuration to a JSON map for serialization.
- toString() → String (override)
  A string representation of this object.
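A round trip through `toJson` and `fromJson` can be sketched as follows. `JsonConfigSketch` is a hypothetical three-field stand-in, and its snake_case keys (`input_size`, `align_long_axis`, `max_length`) are an assumption modelled on a config.json file, not keys confirmed for `DonutConfig`. Missing keys fall back to the donut-base defaults.

```dart
import 'dart:convert';

/// Hypothetical subset of DonutConfig used only to illustrate a JSON round
/// trip; the snake_case keys are assumed, not taken from the library.
class JsonConfigSketch {
  final List<int> inputSize; // [height, width]
  final bool alignLongAxis;
  final int maxLength;

  const JsonConfigSketch({
    this.inputSize = const [2560, 1920],
    this.alignLongAxis = false,
    this.maxLength = 1536,
  });

  /// Missing keys fall back to the donut-base defaults.
  factory JsonConfigSketch.fromJson(Map<String, dynamic> json) {
    return JsonConfigSketch(
      // jsonDecode yields List<dynamic>, so cast the elements to int.
      inputSize:
          (json['input_size'] as List?)?.cast<int>() ?? const [2560, 1920],
      alignLongAxis: json['align_long_axis'] as bool? ?? false,
      maxLength: json['max_length'] as int? ?? 1536,
    );
  }

  Map<String, dynamic> toJson() => {
        'input_size': inputSize,
        'align_long_axis': alignLongAxis,
        'max_length': maxLength,
      };
}

/// Serializes to a JSON string and parses it back, as when a config.json
/// file is written and later reloaded.
JsonConfigSketch roundTrip(JsonConfigSketch config) =>
    JsonConfigSketch.fromJson(
        jsonDecode(jsonEncode(config.toJson())) as Map<String, dynamic>);
```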
Operators
- operator ==(Object other) → bool (inherited)
  The equality operator.
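Because `operator ==` and `hashCode` are listed as inherited, and assuming `DonutConfig` extends `Object` directly, two configurations compare equal only when they are the same object (which includes canonicalized `const` instances). The sketch below demonstrates this on a hypothetical `PlainConfig` class and compares settings field by field instead.

```dart
/// Hypothetical two-field stand-in; like DonutConfig as documented, it does
/// not override operator == or hashCode, so == falls back to identity.
class PlainConfig {
  final List<int> inputSize; // [height, width]
  final int windowSize;

  const PlainConfig({
    this.inputSize = const [2560, 1920],
    this.windowSize = 10,
  });
}

/// Field-by-field comparison, needed because the inherited == only checks
/// whether both references point to the same object.
bool sameSettings(PlainConfig a, PlainConfig b) {
  if (a.windowSize != b.windowSize) return false;
  if (a.inputSize.length != b.inputSize.length) return false;
  for (var i = 0; i < a.inputSize.length; i++) {
    if (a.inputSize[i] != b.inputSize[i]) return false;
  }
  return true;
}
```

Two non-const instances built from identical arguments are therefore unequal under `==`, while `sameSettings` reports them as matching.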
Constants
- imagenetMean → const List<double>
  ImageNet normalization mean.
- imagenetStd → const List<double>
  ImageNet normalization standard deviation.
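The two constants are per-channel statistics used to normalize RGB pixels as (value / 255 - mean) / std. Their actual values are not listed here; the sketch assumes the widely used ImageNet statistics (mean 0.485, 0.456, 0.406; std 0.229, 0.224, 0.225) and defines hypothetical top-level copies rather than reading `DonutConfig.imagenetMean` and `DonutConfig.imagenetStd`.

```dart
/// Hypothetical copies of the constants. The values are the common ImageNet
/// RGB statistics and are assumed, not read from DonutConfig.
const imagenetMean = [0.485, 0.456, 0.406];
const imagenetStd = [0.229, 0.224, 0.225];

/// Normalizes one RGB pixel given as 0-255 channel values:
/// channel c becomes (value / 255 - mean[c]) / std[c].
List<double> normalizePixel(int r, int g, int b) {
  final rgb = [r, g, b];
  return List<double>.generate(
    3,
    (c) => (rgb[c] / 255.0 - imagenetMean[c]) / imagenetStd[c],
  );
}
```

Under these assumed values, a black pixel maps to roughly -2.1 in the red channel and a white pixel to roughly 2.6 in the blue channel.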