VideoTransformer class

A Transformer model adapted for video classification (e.g., action recognition).

This model expects pre-extracted video frame or clip embeddings as a sequence of ValueVectors. These embeddings could come from a pre-trained CNN (like ResNet) or a Vision Transformer applied per frame.
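As a rough illustration of the input described above, the sketch below checks that every per-frame feature vector has the expected width and uniformly subsamples long clips to a fixed maximum length. It uses plain `List<double>` in place of `ValueVector`, and the helper name `prepareFrameEmbeddings` is hypothetical; it is not part of this library, and how the library itself handles over-long sequences is not documented here.

```dart
/// Hypothetical helper (not part of the library): validates per-frame
/// feature vectors and uniformly subsamples them to at most [maxLength].
/// Plain `List<double>` stands in for `ValueVector` here.
List<List<double>> prepareFrameEmbeddings(
  List<List<double>> frames, {
  required int frameEmbedDim,
  required int maxLength,
}) {
  if (maxLength <= 0) {
    throw ArgumentError.value(maxLength, 'maxLength', 'must be positive');
  }
  for (var i = 0; i < frames.length; i++) {
    if (frames[i].length != frameEmbedDim) {
      throw ArgumentError(
          'frame $i has ${frames[i].length} features, expected $frameEmbedDim');
    }
  }
  // Short clips are passed through unchanged (as a shallow copy).
  if (frames.length <= maxLength) return List.of(frames);

  // Uniform temporal subsampling: pick maxLength evenly spaced frame indices.
  return [
    for (var k = 0; k < maxLength; k++)
      frames[(k * frames.length) ~/ maxLength],
  ];
}

void main() {
  // Ten fake frames with 3 features each; every feature equals the frame index.
  final frames = [
    for (var t = 0; t < 10; t++) List<double>.filled(3, t.toDouble()),
  ];
  final clip = prepareFrameEmbeddings(frames, frameEmbedDim: 3, maxLength: 4);
  // Prints the first feature of each kept frame, i.e. the kept frame indices.
  print(clip.map((f) => f.first).toList());
}
```

In real use, `frameEmbedDim` and `maxLength` would presumably match the `frameEmbedDim` and `maxVideoSequenceLength` passed to the constructor.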

Constructors

VideoTransformer({required int frameEmbedDim, required int embedSize, required int maxVideoSequenceLength, required int numClasses, int numLayers = 2, int numHeads = 4})
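The following is a minimal sketch of a sanity check one might run on these arguments before constructing the model. The function `checkVideoTransformerConfig` is hypothetical and not part of the library. The divisibility rule reflects standard multi-head attention, where the embedding is split evenly across heads; whether this class enforces it is an assumption.

```dart
/// Hypothetical pre-flight check (not part of the library) for the
/// constructor arguments. Returns the per-head dimension.
int checkVideoTransformerConfig({
  required int frameEmbedDim,
  required int embedSize,
  required int maxVideoSequenceLength,
  required int numClasses,
  int numLayers = 2,
  int numHeads = 4,
}) {
  final sizes = <String, int>{
    'frameEmbedDim': frameEmbedDim,
    'embedSize': embedSize,
    'maxVideoSequenceLength': maxVideoSequenceLength,
    'numClasses': numClasses,
    'numLayers': numLayers,
    'numHeads': numHeads,
  };
  // Every size must be a positive integer.
  sizes.forEach((name, value) {
    if (value <= 0) {
      throw ArgumentError.value(value, name, 'must be positive');
    }
  });
  // Assumption: heads split the embedding evenly, as in standard
  // multi-head attention.
  if (embedSize % numHeads != 0) {
    throw ArgumentError(
        'embedSize ($embedSize) should be divisible by numHeads ($numHeads)');
  }
  return embedSize ~/ numHeads;
}

void main() {
  final headDim = checkVideoTransformerConfig(
    frameEmbedDim: 512,
    embedSize: 64,
    maxVideoSequenceLength: 16,
    numClasses: 10,
  );
  print('per-head dimension: $headDim');
}
```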

Properties

embedSize int
final
frameEmbedDim int
final
frameProjection Layer?
final
hashCode int
The hash code for this object.
no setter, inherited
maxVideoSequenceLength int
final
mlpHead Layer
final
numClasses int
final
numHeads int
final
numLayers int
final
positionEmbeddings List<ValueVector>
final
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
transformerEncoder TransformerEncoder
final

Methods

forward(List<ValueVector> videoEmbeddings) → List<Value>
The forward pass for the Video Transformer.
noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
parameters() → List<Value>
override
toString() → String
A string representation of this object.
inherited
zeroGrad() → void
inherited
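The `forward` method returns a `List<Value>`. Assuming that list holds one raw score (logit) per class, which the signature suggests but does not confirm, a prediction can be read off with a softmax and an argmax. The sketch below works on plain `double`s; extracting a scalar from each `Value` (for example through a field such as `data`) is an assumption about the library, and `softmax` and `argmax` are illustrative helpers, not library functions.

```dart
import 'dart:math' as math;

/// Numerically stable softmax over raw class scores.
List<double> softmax(List<double> logits) {
  if (logits.isEmpty) throw ArgumentError('logits must not be empty');
  // Subtracting the maximum keeps exp() from overflowing.
  final maxLogit = logits.reduce(math.max);
  final exps = [for (final z in logits) math.exp(z - maxLogit)];
  final sum = exps.reduce((a, b) => a + b);
  return [for (final e in exps) e / sum];
}

/// Index of the largest score (the first one wins on ties).
int argmax(List<double> scores) {
  if (scores.isEmpty) throw ArgumentError('scores must not be empty');
  var best = 0;
  for (var i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

void main() {
  // Stand-in for the scalars read from the values returned by forward().
  final logits = [0.2, 2.5, -1.0];
  final probs = softmax(logits);
  print('predicted class: ${argmax(probs)}');
  print('probabilities: $probs');
}
```

Because softmax preserves ordering, `argmax` can be applied to the raw scores directly when only the predicted class is needed.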

Operators

operator ==(Object other) → bool
The equality operator.
inherited