VideoTransformer class - video_transformer library

VideoTransformer class

A Transformer model adapted for video classification (e.g., action recognition).

This model does not consume raw pixels. It expects pre-extracted frame or clip embeddings, passed as a sequence of ValueVectors with one vector per frame or clip. The embeddings can come from a pre-trained CNN (such as ResNet) or from a Vision Transformer applied to each frame.
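The input contract described above can be sketched in plain Dart. This is only an illustration under assumptions: `prepareFrames` is a hypothetical helper, not part of this library, and it works on `List<List<double>>` because the way a ValueVector is constructed is not shown on this page. It checks that every frame embedding has `frameEmbedDim` values and, as one possible policy, uniformly subsamples clips longer than `maxVideoSequenceLength`. Whether the model itself truncates, subsamples, or rejects longer sequences is not documented here.

```dart
/// Hypothetical helper (not part of the library): validates per-frame
/// embeddings and uniformly subsamples them to at most [maxLen] frames.
List<List<double>> prepareFrames(
  List<List<double>> frames, {
  required int frameEmbedDim,
  required int maxLen,
}) {
  if (maxLen < 1) {
    throw ArgumentError('maxLen must be at least 1.');
  }
  if (frames.isEmpty) {
    throw ArgumentError('Expected at least one frame embedding.');
  }
  for (var i = 0; i < frames.length; i++) {
    if (frames[i].length != frameEmbedDim) {
      throw ArgumentError(
          'Frame $i has ${frames[i].length} values, expected $frameEmbedDim.');
    }
  }
  // Short clips are passed through unchanged.
  if (frames.length <= maxLen) return frames;
  // Pick maxLen indices spread evenly over the clip, keeping temporal order.
  return [
    for (var i = 0; i < maxLen; i++) frames[(i * frames.length) ~/ maxLen],
  ];
}

void main() {
  // Ten fake frames, each a 4-dimensional embedding filled with its index.
  final frames = [
    for (var t = 0; t < 10; t++) List<double>.filled(4, t.toDouble()),
  ];
  // Keeps frames 0, 2, 4, 6 and 8.
  final prepared = prepareFrames(frames, frameEmbedDim: 4, maxLen: 5);
  print(prepared.map((f) => f.first).toList());
}
```

The resulting rows would still need to be converted to ValueVectors before being handed to the model.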

Inheritance

Object
Module
VideoTransformer

Constructors

VideoTransformer({required int frameEmbedDim, required int embedSize, required int maxVideoSequenceLength, required int numClasses, int numLayers = 2, int numHeads = 4})
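When choosing constructor arguments, one constraint worth checking up front is that multi-head attention normally splits the embedding evenly across heads, so `embedSize` is usually a multiple of `numHeads`. The sketch below is a hypothetical pre-flight check, not library code: `headDim` is an illustrative name, and whether the VideoTransformer constructor enforces this itself is an assumption this page does not confirm.

```dart
/// Hypothetical helper (not part of the library): returns the per-head width
/// implied by [embedSize] and [numHeads], or throws if they do not divide
/// evenly. The default of 4 mirrors the documented constructor default.
int headDim({required int embedSize, int numHeads = 4}) {
  if (embedSize <= 0 || numHeads <= 0) {
    throw ArgumentError('embedSize and numHeads must be positive.');
  }
  if (embedSize % numHeads != 0) {
    throw ArgumentError(
        'embedSize ($embedSize) is not divisible by numHeads ($numHeads).');
  }
  return embedSize ~/ numHeads;
}

void main() {
  // With embedSize = 64 and the default numHeads = 4, each head gets 16 dims.
  print(headDim(embedSize: 64));
}
```

Running a check like this before building the model gives a clearer error than a shape mismatch deep inside the encoder would.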

Properties

embedSize → int: final
frameEmbedDim → int: final
frameProjection → Layer?: final
hashCode → int: The hash code for this object.
no setter, inherited
maxVideoSequenceLength → int: final
mlpHead → Layer: final
numClasses → int: final
numHeads → int: final
numLayers → int: final
positionEmbeddings → List<ValueVector>: final
runtimeType → Type: A representation of the runtime type of the object.
no setter, inherited
transformerEncoder → TransformerEncoder: final

Methods

forward(List<ValueVector> videoEmbeddings) → List<Value>: The forward pass for the Video Transformer.
noSuchMethod(Invocation invocation) → dynamic: Invoked when a nonexistent method or property is accessed.
inherited
parameters() → List<Value>: override
toString() → String: A string representation of this object.
inherited
zeroGrad() → void: inherited
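To turn the output of `forward` into a prediction, a common approach is softmax followed by argmax. The sketch below assumes that `forward` returns one raw score (logit) per class, i.e. `numClasses` entries, which this page does not state explicitly. It starts from a `List<double>` because the accessor for reading a number out of a Value is not shown here. `softmax` and `argmax` are illustrative helpers, not library functions.

```dart
import 'dart:math' as math;

/// Numerically stable softmax over raw class scores.
List<double> softmax(List<double> logits) {
  // Subtracting the maximum avoids overflow in exp() for large scores.
  final maxLogit = logits.reduce(math.max);
  final exps = [for (final z in logits) math.exp(z - maxLogit)];
  final sum = exps.reduce((a, b) => a + b);
  return [for (final e in exps) e / sum];
}

/// Index of the largest entry (the first one wins on ties).
int argmax(List<double> xs) {
  var best = 0;
  for (var i = 1; i < xs.length; i++) {
    if (xs[i] > xs[best]) best = i;
  }
  return best;
}

void main() {
  // Stand-in for the numeric values read from the model's output.
  final logits = [0.5, 2.0, -1.0];
  final probs = softmax(logits);
  print('predicted class: ${argmax(probs)}');
}
```

Since softmax preserves ordering, `argmax` can be applied to the logits directly when only the predicted class is needed; the probabilities matter when a confidence score is also required.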

Operators

operator ==(Object other) → bool: The equality operator.
inherited