SwinEncoder class
Complete Swin Transformer encoder as used in Donut.
Converts an input document image into a sequence of embeddings that can be used as input to the BART decoder.
Architecture:
- PatchEmbed: Split image into patches, project to embedDim
- Multiple SwinLayers: Hierarchical feature extraction with window attention and patch merging
Default config (donut-base):
- embedDim: 128
- depths: 2, 2, 14, 2
- numHeads: 4, 8, 16, 32
- windowSize: 10
- patchSize: 4
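The default configuration above fixes the shape of the encoder output. The following is a minimal sketch of that arithmetic, not a call into the SwinEncoder API. It assumes standard Swin behaviour (one patch-merging step between consecutive stages, each halving height and width and doubling the channel count) and an input size of 2560 x 1920 (height x width), which is the usual donut-base setting but is not stated on this page. `StageShape` and `stageShapes` are hypothetical helpers introduced only for illustration.

```dart
// Shape arithmetic only; nothing here touches the real SwinEncoder class.

/// Per-stage shape summary (hypothetical helper type, not part of the library).
class StageShape {
  final int height;
  final int width;
  final int channels;
  final int headDim;
  const StageShape(this.height, this.width, this.channels, this.headDim);

  /// Number of tokens in the flattened feature map.
  int get tokens => height * width;

  @override
  String toString() =>
      '${height}x$width tokens=$tokens channels=$channels headDim=$headDim';
}

/// Computes the feature-map shape at each stage.
///
/// Assumption: a patch-merging step sits between consecutive stages, halving
/// height and width and doubling the channel count (standard Swin layout).
List<StageShape> stageShapes({
  required List<int> inputSize, // [height, width]; assumed, not from this page
  required int patchSize,
  required int embedDim,
  required List<int> numHeads,
}) {
  // PatchEmbed: one token per patchSize x patchSize block of pixels.
  var h = inputSize[0] ~/ patchSize;
  var w = inputSize[1] ~/ patchSize;
  var c = embedDim;
  final shapes = <StageShape>[];
  for (var i = 0; i < numHeads.length; i++) {
    if (i > 0) {
      // Patch merging before every stage except the first.
      h ~/= 2;
      w ~/= 2;
      c *= 2;
    }
    shapes.add(StageShape(h, w, c, c ~/ numHeads[i]));
  }
  return shapes;
}

void main() {
  final shapes = stageShapes(
    inputSize: [2560, 1920],
    patchSize: 4,
    embedDim: 128,
    numHeads: [4, 8, 16, 32],
  );
  for (final s in shapes) {
    print(s);
  }
}
```

Under these assumptions the last stage has an 80 x 60 grid, so the encoder would emit 4800 embeddings of width 1024 (128 doubled three times), and every stage keeps 32 channels per attention head.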
Constructors
Properties
- alignLongAxis → bool
  final
- embedDim → int
  final
- encoderLayer → List<int>
  final
- hashCode → int
  The hash code for this object.
  no setter, inherited
- inputSize → List<int>
  final
- layers ↔ List<SwinLayer>
  getter/setter pair
- numHeads → List<int>
  final
- outputDim → int
  Get the output dimension of the encoder.
  no setter
- patchEmbed ↔ PatchEmbed
  getter/setter pair
- patchH ↔ int
  getter/setter pair
- patchSize → int
  final
- patchW ↔ int
  getter/setter pair
- posDropout ↔ Dropout
  getter/setter pair
- runtimeType → Type
  A representation of the runtime type of the object.
  no setter, inherited
- windowSize → int
  final
Methods
- forward(Tensor x) → Tensor
  Forward pass.
- noSuchMethod(Invocation invocation) → dynamic
  Invoked when a nonexistent method or property is accessed.
  inherited
- toString() → String
  A string representation of this object.
  inherited
Operators
- operator ==(Object other) → bool
  The equality operator.
  inherited