ViTBackbone class - vit_backbone library

ViTBackbone class

A Vision Transformer (ViT) backbone for extracting image features.

This model processes an image by dividing it into patches, linearly embedding them, adding positional information, and feeding them through a Transformer Encoder. It outputs the contextualized embeddings of all patches (and optionally a CLS token) for downstream tasks like object detection.
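The patch-splitting step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `extractPatches` is a hypothetical helper, and it assumes the flat image buffer is stored row-major with channels last, which this page does not specify.

```dart
/// Hypothetical helper, NOT part of vit_backbone. Splits a flat image buffer
/// into flattened, non-overlapping square patches.
/// ASSUMPTION: the buffer is row-major with channels last (H x W x C).
List<List<double>> extractPatches(
    List<double> imageData, int imageSize, int patchSize, int numChannels) {
  if (patchSize <= 0 || imageSize % patchSize != 0) {
    throw ArgumentError('imageSize must be a positive multiple of patchSize');
  }
  if (imageData.length != imageSize * imageSize * numChannels) {
    throw ArgumentError('imageData length does not match the image shape');
  }
  final perSide = imageSize ~/ patchSize;
  final patches = <List<double>>[];
  // Walk the patch grid top-to-bottom, left-to-right.
  for (var py = 0; py < perSide; py++) {
    for (var px = 0; px < perSide; px++) {
      final patch = <double>[];
      // Copy every pixel (all channels) that falls inside this patch.
      for (var y = py * patchSize; y < (py + 1) * patchSize; y++) {
        for (var x = px * patchSize; x < (px + 1) * patchSize; x++) {
          final base = (y * imageSize + x) * numChannels;
          patch.addAll(imageData.sublist(base, base + numChannels));
        }
      }
      patches.add(patch);
    }
  }
  return patches;
}

void main() {
  // A 4x4 single-channel image whose pixel values are 0..15,
  // split into 2x2 patches: four patches of four values each.
  final image = List<double>.generate(16, (i) => i.toDouble());
  final patches = extractPatches(image, 4, 2, 1);
  print('patch count: ${patches.length}');
  print('first patch holds pixels 0, 1, 4, 5: ${patches.first}');
}
```

Each flattened patch would then be the input to a linear projection such as `patchProjection`; how the class actually orders pixels and channels should be checked against its source.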

Inheritance

Object
Module
ViTBackbone

Constructors

ViTBackbone({required int imageSize, required int patchSize, int numChannels = 3, required int embedSize, int numLayers = 2, int numHeads = 4})
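The constructor parameters imply some shape arithmetic that is worth checking before building a model. The sketch below is a hypothetical helper (`PatchGeometry` is not part of the library) and assumes the usual ViT convention of square images, non-overlapping square patches, and one extra sequence position when a CLS token is used.

```dart
/// Hypothetical helper, NOT part of vit_backbone: shape bookkeeping for a
/// square image split into non-overlapping square patches.
class PatchGeometry {
  final int imageSize;
  final int patchSize;
  final int numChannels;

  PatchGeometry(this.imageSize, this.patchSize, {this.numChannels = 3}) {
    if (patchSize <= 0 || imageSize % patchSize != 0) {
      throw ArgumentError('imageSize must be a positive multiple of patchSize');
    }
  }

  /// Patches along one side of the image.
  int get patchesPerSide => imageSize ~/ patchSize;

  /// Total number of patches (length of the patch sequence).
  int get numPatches => patchesPerSide * patchesPerSide;

  /// Length of one flattened patch, i.e. the input width of the projection.
  int get patchDim => patchSize * patchSize * numChannels;

  /// Expected length of a flat image buffer (assumed input to forward).
  int get inputLength => imageSize * imageSize * numChannels;

  /// Sequence length seen by the encoder, with or without a CLS token.
  int sequenceLength({bool withCls = true}) => numPatches + (withCls ? 1 : 0);
}

void main() {
  final g = PatchGeometry(32, 8); // numChannels defaults to 3
  print('patches: ${g.numPatches}'); // 16
  print('patch dim: ${g.patchDim}'); // 192
  print('input length: ${g.inputLength}'); // 3072
  print('sequence length with CLS: ${g.sequenceLength()}'); // 17
}
```

Under these assumptions, `positionEmbeddings` would need one entry per sequence position, and `embedSize` should be divisible by `numHeads` for standard multi-head attention.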

Properties

clsToken → ValueVector: final
embedSize → int: final
hashCode → int: The hash code for this object. (no setter, inherited)
imageSize → int: final
numChannels → int: final
numHeads → int: final
numLayers → int: final
patchProjection → Layer: final
patchSize → int: final
positionEmbeddings → List<ValueVector>: final
runtimeType → Type: A representation of the runtime type of the object. (no setter, inherited)
transformerEncoder → TransformerEncoder: final

Methods

forward(List<double> imageData) → List<ValueVector>: The forward pass for the ViT Backbone.
noSuchMethod(Invocation invocation) → dynamic: Invoked when a nonexistent method or property is accessed. (inherited)
parameters() → List<Value>: override
toString() → String: A string representation of this object. (inherited)
zeroGrad() → void: inherited

Operators

operator ==(Object other) → bool: The equality operator. (inherited)