forward method
Forward pass for the object detector.
Takes a flattened list of image pixel data. Returns a Map, keyed by String, whose values are lists of object predictions.
The encodedFeatures returned by the backbone are ordered as:
CLS_token_embedding, patch_embedding_1, patch_embedding_2, ...
This simple head uses only the CLS token's output (index 0) as the global image representation; the patch embeddings are not passed to the head.
Implementation
Map<String, List<ValueVector>> forward(List<double> imageData) {
  // Get contextualized features from the ViT backbone
  final List<ValueVector> encodedFeatures = backbone.forward(imageData);

  // For this simple detection head, we'll use the CLS token's output
  // as the global image representation for prediction.
  final ValueVector clsFeature = encodedFeatures[0];

  // Pass the CLS feature to the detection head, which will produce multiple predictions
  final Map<String, List<ValueVector>> predictions =
      detectionHead.forward(clsFeature);

  return predictions;
}
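The data flow above can be tried in isolation with stand-in components. The following is a minimal sketch, not the library's implementation: `Vec` replaces `ValueVector` with a plain list of doubles, and `StubBackbone`, `StubHead`, `StubDetector`, and the map keys `'boxes'` and `'logits'` are all hypothetical names introduced for illustration. It only demonstrates that the first element of the backbone output is treated as the CLS feature and handed to the head.

```dart
// Assumption: a plain list of doubles stands in for the library's ValueVector.
typedef Vec = List<double>;

/// Hypothetical backbone that returns [CLS, patch_1, ..., patch_n].
class StubBackbone {
  final int patchSize;
  StubBackbone(this.patchSize);

  List<Vec> forward(List<double> imageData) {
    // Cut the flattened image into consecutive fixed-size patches.
    final patches = <Vec>[];
    for (var i = 0; i + patchSize <= imageData.length; i += patchSize) {
      patches.add(imageData.sublist(i, i + patchSize));
    }
    // Fake CLS embedding: the element-wise mean of all patches.
    final cls = List<double>.filled(patchSize, 0.0);
    for (final p in patches) {
      for (var j = 0; j < patchSize; j++) {
        cls[j] += p[j] / patches.length;
      }
    }
    return [cls, ...patches];
  }
}

/// Hypothetical head that emits a fixed number of predictions from one vector.
class StubHead {
  final int numQueries;
  StubHead(this.numQueries);

  Map<String, List<Vec>> forward(Vec feature) {
    // The keys 'boxes' and 'logits' are illustrative, not the real head's keys.
    return {
      'boxes': List.generate(
          numQueries, (q) => [for (final v in feature) v * (q + 1)]),
      'logits': List.generate(numQueries, (q) => [feature.first + q]),
    };
  }
}

/// Mirrors the forward method above using the stand-in components.
class StubDetector {
  final StubBackbone backbone;
  final StubHead detectionHead;
  StubDetector(this.backbone, this.detectionHead);

  Map<String, List<Vec>> forward(List<double> imageData) {
    final encodedFeatures = backbone.forward(imageData);
    final clsFeature = encodedFeatures[0]; // CLS token comes first
    return detectionHead.forward(clsFeature);
  }
}

void main() {
  final detector = StubDetector(StubBackbone(2), StubHead(3));
  final predictions = detector.forward([1.0, 3.0, 5.0, 7.0]);
  print(predictions['boxes']!.length); // 3, one entry per query
}
```

Because the head receives a single vector, every prediction in this sketch is derived from the same global feature; per-patch information reaches the head only through whatever the backbone has mixed into the CLS embedding.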