forward method - ViTBackbone class - vit_backbone library

forward method

List<ValueVector> forward(List<double> imageData)

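Runs the forward pass of the ViT backbone.

Takes the image's pixel data as a flattened list. Returns the list of contextualized ValueVectors that the Transformer encoder produces for the input sequence, which is built from the learnable [CLS] token followed by the patch embeddings.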

Implementation

List<ValueVector> forward(List<double> imageData) {
  // 1. Create patch embeddings
  final patchEmbeddings = _createPatchesAndEmbeddings(imageData);

  // 2. Prepend the learnable [CLS] token
  final currentClsToken = ValueVector(List.generate(
      embedSize,
      (i) =>
          Value(clsToken.values[i].data, {clsToken.values[i]}, 'cls_copy')));
  final sequence = [currentClsToken, ...patchEmbeddings];

  // 3. Add positional embeddings
  final sequenceWithPositionalEmbeddings =
      List.generate(sequence.length, (i) {
    return sequence[i] + positionEmbeddings[i];
  });

  // 4. Pass the sequence through the Transformer Encoder
  final encodedFeatures =
      transformerEncoder.forwardEmbeddings(sequenceWithPositionalEmbeddings);

  // Return all encoded features (CLS token + patch embeddings)
  return encodedFeatures;
}
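
Steps 2 and 3 of the implementation above (prepending the [CLS] token and adding positional embeddings) can be illustrated without the library's autograd types. The following is only a minimal sketch: it assumes plain `List<double>` vectors as stand-ins for `ValueVector`, and `buildSequence` is a hypothetical helper that is not part of the vit_backbone library.

```dart
/// Hypothetical helper, not part of the library: prepends [clsToken] to
/// [patchEmbeddings] and adds the matching positional embedding to every
/// vector of the resulting sequence. Plain doubles stand in for ValueVector.
List<List<double>> buildSequence(
  List<double> clsToken,
  List<List<double>> patchEmbeddings,
  List<List<double>> positionEmbeddings,
) {
  // The [CLS] token always occupies index 0 of the sequence.
  final sequence = [clsToken, ...patchEmbeddings];
  if (positionEmbeddings.length < sequence.length) {
    throw ArgumentError('Need one positional embedding per sequence entry');
  }
  // Element-wise addition of token i and positional embedding i.
  return List.generate(sequence.length, (i) {
    final token = sequence[i];
    final position = positionEmbeddings[i];
    return List.generate(token.length, (j) => token[j] + position[j]);
  });
}

void main() {
  final cls = [0.0, 0.0];
  final patches = [
    [1.0, 2.0],
    [3.0, 4.0],
  ];
  final positions = [
    [0.5, 0.5],
    [1.0, 1.0],
    [2.0, 2.0],
  ];
  final sequence = buildSequence(cls, patches, positions);
  print(sequence.length); // 3: one [CLS] entry plus two patches
  print(sequence);
}
```

The sketch makes one requirement of the real method visible: the positional embedding table needs at least one entry per sequence position, that is, the number of patches plus one for the [CLS] token.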