SwinEncoder class

Complete Swin Transformer encoder as used in Donut.

Converts an input document image into a sequence of embeddings that can be used as input to the BART decoder.

Architecture:

  1. PatchEmbed: Split image into patches, project to embedDim
  2. Multiple SwinLayers: Hierarchical feature extraction with window attention and patch merging
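The two steps above can be sketched as a shape walk-through. This is a minimal illustration, not the class's actual code. It assumes the usual Swin layout, in which a 2x2 patch merge between consecutive stages halves each side of the token grid and doubles the channel count. `StageShape` and `stageShapes` are hypothetical helper names that do not exist in this library.

```dart
/// Shape of the feature map produced by one encoder stage (hypothetical helper).
class StageShape {
  final int height; // tokens along the vertical axis
  final int width; // tokens along the horizontal axis
  final int channels; // embedding size per token
  const StageShape(this.height, this.width, this.channels);

  /// Length of the flattened token sequence at this stage.
  int get tokens => height * width;

  @override
  String toString() => '${height}x$width tokens, $channels channels';
}

/// Walks an image of [imageH] x [imageW] pixels through patch embedding and
/// [numStages] stages, ASSUMING a 2x2 patch merge between consecutive stages
/// (each side halves, channels double) and no merge after the last stage.
List<StageShape> stageShapes(
  int imageH,
  int imageW, {
  int patchSize = 4,
  int embedDim = 128,
  int numStages = 4,
}) {
  // PatchEmbed: one token per patchSize x patchSize block of pixels.
  var h = imageH ~/ patchSize;
  var w = imageW ~/ patchSize;
  var c = embedDim;
  final shapes = <StageShape>[];
  for (var stage = 0; stage < numStages; stage++) {
    shapes.add(StageShape(h, w, c));
    if (stage < numStages - 1) {
      h ~/= 2;
      w ~/= 2;
      c *= 2;
    }
  }
  return shapes;
}
```

Under these assumptions, `stageShapes(2560, 1920)` starts with a 640x480 grid of 128-channel tokens and ends with an 80x60 grid (4800 tokens) of 1024 channels. The 2560x1920 image size is only an illustrative input.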

Default config (donut-base):

  • embedDim: 128
  • depths (constructor parameter encoderLayer): 2, 2, 14, 2
  • numHeads: 4, 8, 16, 32
  • windowSize: 10
  • patchSize: 4
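One way to sanity-check these defaults is to compute how wide each attention head is at every stage. The sketch below assumes the channel count doubles from one stage to the next, starting at embedDim. That is the standard Swin convention, but this page does not confirm it for this class. `headDims` is a hypothetical helper, not part of the library.

```dart
/// Per-head attention width for each stage, ASSUMING channels double from one
/// stage to the next (embedDim, 2 * embedDim, 4 * embedDim, ...).
/// Throws if a stage's channels do not divide evenly across its heads.
List<int> headDims(int embedDim, List<int> numHeads) {
  final dims = <int>[];
  var channels = embedDim;
  for (final heads in numHeads) {
    if (channels % heads != 0) {
      throw ArgumentError(
          '$channels channels cannot be split across $heads heads');
    }
    dims.add(channels ~/ heads);
    channels *= 2; // assumed doubling at the next stage
  }
  return dims;
}
```

With the defaults listed above, `headDims(128, [4, 8, 16, 32])` yields a width of 32 at all four stages, so under this assumption the head count grows in step with the channels.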

Constructors

SwinEncoder({required List<int> inputSize, bool alignLongAxis = false, int windowSize = 10, List<int> encoderLayer = const [2, 2, 14, 2], int embedDim = 128, List<int> numHeads = const [4, 8, 16, 32], int patchSize = 4})
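When choosing inputSize, it may help to check whether the image divides cleanly into patches and attention windows at every stage. The following is a sketch under stated assumptions: 2x2 patch merging between stages, and windows of windowSize x windowSize tokens. Whether this class pads, rejects, or otherwise handles sizes that do not divide evenly is not stated on this page. `fitsWindows` is a hypothetical helper, not part of the library.

```dart
/// Returns true when every stage's token grid splits into whole
/// windowSize x windowSize windows, ASSUMING 2x2 patch merging between stages.
/// Both entries of [inputSize] are checked the same way, so their order
/// (height/width) does not matter here.
bool fitsWindows(
  List<int> inputSize, {
  int patchSize = 4,
  int windowSize = 10,
  int numStages = 4,
}) {
  for (final side in inputSize) {
    // The side must split into whole patches first.
    if (side % patchSize != 0) return false;
    var tokens = side ~/ patchSize;
    for (var stage = 0; stage < numStages; stage++) {
      // Tokens along this side must fill whole windows.
      if (tokens % windowSize != 0) return false;
      if (stage < numStages - 1) {
        // A 2x2 merge needs an even number of tokens per side.
        if (tokens.isOdd) return false;
        tokens ~/= 2;
      }
    }
  }
  return true;
}
```

With the default patchSize and windowSize, `fitsWindows([2560, 1920])` returns true, while `fitsWindows([1000, 1000])` returns false because the second stage would have 125 tokens per side, which is not a multiple of 10.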

Properties

alignLongAxis bool
final
embedDim int
final
encoderLayer List<int>
final
hashCode int
The hash code for this object.
no setter, inherited
inputSize List<int>
final
layers List<SwinLayer>
getter/setter pair
numHeads List<int>
final
outputDim int
Get the output dimension of the encoder.
no setter
patchEmbed PatchEmbed
getter/setter pair
patchH int
getter/setter pair
patchSize int
final
patchW int
getter/setter pair
posDropout Dropout
getter/setter pair
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
windowSize int
final

Methods

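forward(Tensor x) → Tensor
Forward pass: applies the patch embedding and then the Swin layers to the input image tensor x, returning the resulting sequence of embeddings.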
noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
toString() → String
A string representation of this object.
inherited

Operators

operator ==(Object other) → bool
The equality operator.
inherited