SwinEncoder class - donut library

Complete Swin Transformer encoder as used in Donut.

Converts an input document image into a sequence of embeddings that can be used as input to the BART decoder.

Architecture:

PatchEmbed: splits the image into patches and projects each patch to embedDim channels.
Multiple SwinLayers: hierarchical feature extraction with windowed attention and patch merging.
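
The two architecture steps above can be traced as plain shape arithmetic. The following is a minimal sketch, not the library's implementation. It assumes standard Swin Transformer behaviour, where patch merging between stages halves the patch grid and doubles the channel width. `StageShape` and `traceStages` are hypothetical names introduced for illustration, and the `[2560, 1920]` input size (interpreted as `[height, width]`) is an assumed example value, since `inputSize` has no default.

```dart
// Hypothetical helper, not part of the donut library: traces the patch grid
// and channel width at each stage, assuming standard Swin behaviour
// (patch merging halves the grid and doubles channels between stages).
class StageShape {
  final int height;
  final int width;
  final int channels;
  const StageShape(this.height, this.width, this.channels);

  // Number of tokens (grid cells) at this stage.
  int get tokens => height * width;

  @override
  String toString() => '${height}x$width grid, $channels channels';
}

List<StageShape> traceStages({
  required List<int> inputSize, // assumed order: [height, width] in pixels
  int patchSize = 4,
  int embedDim = 128,
  int numStages = 4,
}) {
  // PatchEmbed: one token per patchSize x patchSize block of pixels.
  var h = inputSize[0] ~/ patchSize;
  var w = inputSize[1] ~/ patchSize;
  var c = embedDim;
  final shapes = <StageShape>[];
  for (var i = 0; i < numStages; i++) {
    shapes.add(StageShape(h, w, c));
    // Assumed: patch merging after every stage except the last.
    if (i < numStages - 1) {
      h ~/= 2;
      w ~/= 2;
      c *= 2;
    }
  }
  return shapes;
}

void main() {
  final shapes = traceStages(inputSize: [2560, 1920]);
  for (final s in shapes) {
    print(s);
  }
  print('final sequence length: ${shapes.last.tokens}');
}
```

Under these assumptions the last stage has 1024 channels, which is what outputDim would be expected to report for the default embedDim and a four-stage encoderLayer. Check the value against the actual getter before relying on it.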

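Default config (donut-base), as reflected in the constructor defaults:

windowSize: 10
encoderLayer: [2, 2, 14, 2]
embedDim: 128
numHeads: [4, 8, 16, 32]
patchSize: 4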

Constructors

SwinEncoder({required List<int> inputSize, bool alignLongAxis = false, int windowSize = 10, List<int> encoderLayer = const [2, 2, 14, 2], int embedDim = 128, List<int> numHeads = const [4, 8, 16, 32], int patchSize = 4})
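
The constructor parameters depend on one another: encoderLayer and numHeads each carry one entry per stage, and the channel width at each stage has to split evenly across that stage's attention heads. The sketch below checks those relationships in plain Dart before a model is built. `checkSwinConfig` is a hypothetical helper, not part of the library. It assumes channels double at each stage, as in the standard Swin design, and that `inputSize` is `[height, width]`. Whether the real encoder pads or rejects a grid that does not tile evenly into windows is not stated in this page, so the sketch only reports it.

```dart
// Hypothetical validator, not part of the donut library. Parameter names and
// defaults mirror the SwinEncoder constructor; the checks themselves are
// assumptions based on the standard Swin Transformer design.
List<String> checkSwinConfig({
  required List<int> inputSize, // assumed order: [height, width]
  int windowSize = 10,
  List<int> encoderLayer = const [2, 2, 14, 2],
  int embedDim = 128,
  List<int> numHeads = const [4, 8, 16, 32],
  int patchSize = 4,
}) {
  final problems = <String>[];
  if (encoderLayer.length != numHeads.length) {
    problems.add('encoderLayer and numHeads need one entry per stage');
    return problems;
  }
  var h = inputSize[0] ~/ patchSize;
  var w = inputSize[1] ~/ patchSize;
  var channels = embedDim;
  for (var i = 0; i < numHeads.length; i++) {
    // Each head receives channels / numHeads[i] features.
    if (channels % numHeads[i] != 0) {
      problems.add(
          'stage $i: $channels channels not divisible by ${numHeads[i]} heads');
    }
    // Window attention tiles the grid into windowSize x windowSize blocks.
    if (h % windowSize != 0 || w % windowSize != 0) {
      problems.add(
          'stage $i: ${h}x$w grid does not tile evenly into windows of $windowSize');
    }
    // Assumed patch merging: halve the grid, double the channels.
    h ~/= 2;
    w ~/= 2;
    channels *= 2;
  }
  return problems;
}

void main() {
  // Constructor defaults with an assumed 2560x1920 input: no problems found.
  print(checkSwinConfig(inputSize: [2560, 1920]));
  // A window size of 7 does not divide the 640x480 patch grid.
  print(checkSwinConfig(inputSize: [2560, 1920], windowSize: 7));
}
```

With the defaults, every stage works out to 32 features per head (128/4, 256/8, 512/16, 1024/32), which suggests why numHeads doubles alongside the channel width.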

Properties

alignLongAxis → bool: final
embedDim → int: final
encoderLayer → List<int>: final
hashCode → int: The hash code for this object. (no setter, inherited)
inputSize → List<int>: final
layers ↔ List<SwinLayer>: getter/setter pair
numHeads → List<int>: final
outputDim → int: The output dimension of the encoder. (no setter)
patchEmbed ↔ PatchEmbed: getter/setter pair
patchH ↔ int: getter/setter pair
patchSize → int: final
patchW ↔ int: getter/setter pair
posDropout ↔ Dropout: getter/setter pair
runtimeType → Type: A representation of the runtime type of the object. (no setter, inherited)
windowSize → int: final

Methods

forward(Tensor x) → Tensor: Forward pass. Encodes the input image tensor x into a sequence of embeddings.
noSuchMethod(Invocation invocation) → dynamic: Invoked when a nonexistent method or property is accessed. (inherited)
toString() → String: A string representation of this object. (inherited)