ViTBackbone class
A Vision Transformer (ViT) backbone for extracting image features.
This model divides an image into patches, linearly embeds each patch, adds positional information, and feeds the resulting sequence through a Transformer encoder. It outputs the contextualized embeddings of all patches (and optionally a CLS token) for downstream tasks such as object detection.
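The patch-division step described above can be sketched in plain Dart. This is a minimal illustration, not the library's implementation: the helper name `extractPatches` is hypothetical, and the channel-major flat layout (index = c * imageSize * imageSize + y * imageSize + x) is an assumption, since the source does not state how `imageData` is ordered.

```dart
/// Splits a flat image into flattened square patches.
///
/// Hypothetical helper for illustration only. Assumes a channel-major
/// layout: index = c * imageSize * imageSize + y * imageSize + x.
List<List<double>> extractPatches(
  List<double> imageData, {
  required int imageSize,
  required int patchSize,
  required int numChannels,
}) {
  if (imageSize % patchSize != 0) {
    throw ArgumentError('imageSize must be divisible by patchSize');
  }
  final expected = numChannels * imageSize * imageSize;
  if (imageData.length != expected) {
    throw ArgumentError('expected $expected values, got ${imageData.length}');
  }

  final perSide = imageSize ~/ patchSize;
  final planeSize = imageSize * imageSize;
  final patches = <List<double>>[];

  // Walk the patch grid row by row, left to right.
  for (var py = 0; py < perSide; py++) {
    for (var px = 0; px < perSide; px++) {
      final patch = <double>[];
      // Within a patch, emit channel by channel, then row by row.
      for (var c = 0; c < numChannels; c++) {
        for (var dy = 0; dy < patchSize; dy++) {
          for (var dx = 0; dx < patchSize; dx++) {
            final y = py * patchSize + dy;
            final x = px * patchSize + dx;
            patch.add(imageData[c * planeSize + y * imageSize + x]);
          }
        }
      }
      patches.add(patch);
    }
  }
  return patches;
}
```

Each returned patch has numChannels * patchSize * patchSize values, which is the input width a linear patch projection would need under this layout.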
Constructors
Properties
- clsToken → ValueVector
  final
- embedSize → int
  final
- hashCode → int
  The hash code for this object.
  no setter, inherited
- imageSize → int
  final
- numChannels → int
  final
- numHeads → int
  final
- numLayers → int
  final
- patchProjection → Layer
  final
- patchSize → int
  final
- positionEmbeddings → List<ValueVector>
  final
- runtimeType → Type
  A representation of the runtime type of the object.
  no setter, inherited
- transformerEncoder → TransformerEncoder
  final
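The integer properties above determine the shapes the backbone works with. The sketch below shows the usual ViT bookkeeping under stated assumptions: the class `ViTShapes` and its `useClsToken` flag are hypothetical and not part of this library, and the divisibility checks reflect standard ViT and multi-head attention conventions rather than anything the source confirms about `ViTBackbone`.

```dart
/// Shape bookkeeping derived from ViT hyperparameters.
///
/// Hypothetical helper for illustration; not part of the documented API.
class ViTShapes {
  ViTShapes({
    required this.imageSize,
    required this.patchSize,
    required this.numChannels,
    required this.embedSize,
    required this.numHeads,
    this.useClsToken = true,
  }) {
    if (imageSize % patchSize != 0) {
      throw ArgumentError('imageSize must be divisible by patchSize');
    }
    // Standard multi-head attention splits the embedding evenly across heads.
    if (embedSize % numHeads != 0) {
      throw ArgumentError('embedSize must be divisible by numHeads');
    }
  }

  final int imageSize;
  final int patchSize;
  final int numChannels;
  final int embedSize;
  final int numHeads;
  final bool useClsToken;

  /// Number of non-overlapping patches in the square image.
  int get numPatches => (imageSize ~/ patchSize) * (imageSize ~/ patchSize);

  /// Length of one flattened patch, i.e. the patch projection's input width.
  int get patchDim => numChannels * patchSize * patchSize;

  /// Tokens entering the encoder: one per patch, plus an optional CLS token.
  int get sequenceLength => numPatches + (useClsToken ? 1 : 0);

  /// Per-head width inside multi-head attention.
  int get headDim => embedSize ~/ numHeads;
}
```

For example, imageSize 224 with patchSize 16 gives 14 × 14 = 196 patches, so 197 tokens when a CLS token is prepended. Under the same convention, `positionEmbeddings` would hold one vector per token.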
Methods
- forward(List<double> imageData) → List<ValueVector>
  The forward pass for the ViT Backbone.
- noSuchMethod(Invocation invocation) → dynamic
  Invoked when a nonexistent method or property is accessed.
  inherited
- parameters() → List<Value>
  override
- toString() → String
  A string representation of this object.
  inherited
- zeroGrad() → void
  inherited
Operators
- operator ==(Object other) → bool
  The equality operator.
  inherited