forward method
Runs the forward pass of the ViT backbone.
Takes the image's pixel data as a single flattened list of doubles.
Returns a list of contextualized ValueVectors: the encoded [CLS] token followed by the encoded patch embeddings.
Implementation
List<ValueVector> forward(List<double> imageData) {
  // 1. Create patch embeddings
  final patchEmbeddings = _createPatchesAndEmbeddings(imageData);

  // 2. Prepend the learnable [CLS] token
  final currentClsToken = ValueVector(List.generate(
      embedSize,
      (i) =>
          Value(clsToken.values[i].data, {clsToken.values[i]}, 'cls_copy')));
  final sequence = [currentClsToken, ...patchEmbeddings];

  // 3. Add positional embeddings
  final sequenceWithPositionalEmbeddings =
      List.generate(sequence.length, (i) {
    return sequence[i] + positionEmbeddings[i];
  });

  // 4. Pass the sequence through the Transformer Encoder
  final encodedFeatures =
      transformerEncoder.forwardEmbeddings(sequenceWithPositionalEmbeddings);

  // Return all encoded features (CLS token + patch embeddings)
  return encodedFeatures;
}
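
Example
Steps 2 and 3 above (prepending the [CLS] token and adding positional embeddings) can be sketched in isolation. This is a minimal illustration only: it uses plain List<double> in place of ValueVector, so it carries no autograd graph, and assembleSequence is a hypothetical helper that is not part of this library. The length checks are also an addition; the method above does not perform them.

```dart
// Hypothetical stand-in for steps 2 and 3 of forward(), using plain
// List<double> instead of ValueVector (no gradients are tracked).

/// Prepends [clsToken] to [patches], then adds [positions] element-wise.
List<List<double>> assembleSequence(
  List<double> clsToken,
  List<List<double>> patches,
  List<List<double>> positions,
) {
  // The CLS token occupies index 0; the patches follow in order.
  final sequence = [clsToken, ...patches];
  if (positions.length < sequence.length) {
    throw ArgumentError(
        'Need ${sequence.length} positional embeddings, got ${positions.length}');
  }
  return List.generate(sequence.length, (i) {
    final token = sequence[i];
    final pos = positions[i];
    if (pos.length != token.length) {
      throw ArgumentError('Embedding size mismatch at position $i');
    }
    // Element-wise sum of the token embedding and its positional embedding.
    return List.generate(token.length, (j) => token[j] + pos[j]);
  });
}

void main() {
  final cls = [0.5, 0.5];
  final patches = [
    [1.0, 2.0],
    [3.0, 4.0],
  ];
  final positions = [
    [0.0, 0.0],
    [0.25, 0.25],
    [0.5, 0.5],
  ];
  final out = assembleSequence(cls, patches, positions);
  print(out.length); // 3: one CLS slot plus two patches
  print(out[0]); // [0.5, 0.5]
  print(out[2]); // [3.5, 4.5]
}
```

Because the [CLS] token is prepended, the sequence is one element longer than the number of patches, so positionEmbeddings needs at least that many entries. Assuming the encoder preserves sequence order, as is typical, index 0 of the returned list corresponds to the [CLS] token.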