vision_flow 0.0.1
A Flutter plugin for real-time vision tasks
VisionFlow #
A modular and production-ready Flutter plugin for real-time sign language and gesture recognition using MediaPipe and custom ML models (PyTorch and TFLite).
Features #
- Dynamic Detection: Configure exactly what to detect (Hands, Face, Pose).
- Dual Backends: Seamlessly switch between PyTorch and TFLite models for inference.
- Optimized Feature Extraction: Automatically processes and normalizes 330 features per frame (110 landmarks x 3 coordinates: 2 x 21 hand points plus 68 face points) across a configurable sequence length to match your model's expected input.
- Real-Time Streaming: Push live camera frames into the plugin and listen for prediction events.
Installation #
Add vision_flow to your pubspec.yaml:
dependencies:
  vision_flow: ^0.0.1
Android Configuration #
Ensure that android/build.gradle declares the repositories the plugin requires and that compileSdkVersion is at least 34 (36 is recommended).
Usage #
1. Load the Model #
Before processing any frames, load your pre-trained model. Place the model file (.pt or .tflite) inside the assets/ folder and declare it under the flutter: assets: section of your pubspec.yaml so that it is bundled with the app.
import 'package:vision_flow/vision_flow.dart';

// Load PyTorch Model
await VisionFlow.loadModel(
  path: "model.pt",
  backend: VisionFlowModelType.pytorch,
);

// OR Load TFLite Model
await VisionFlow.loadModel(
  path: "model.tflite",
  backend: VisionFlowModelType.tflite,
);
2. Configure the Pipeline #
Define what the pipeline should detect and the sequence length expected by your model:
await VisionFlow.configure(
  hands: true,
  face: true,
  pose: false,
  sequenceLength: 30, // 30 frames per prediction sequence
);
3. Listen to Predictions #
Subscribe to the prediction stream:
VisionFlow.predictions.listen((result) {
  print("Predicted Label: ${result.label} (Index: ${result.index})");
});
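If the stream emits the same label many times in a row while a gesture is held, it can help to react only when the prediction changes. The sketch below is one way to do that with Dart's built-in Stream.distinct. It runs against a simulated stream: the Prediction class and the onLabelChange helper are hypothetical stand-ins, and only the label and index fields are taken from the example above.

```dart
import 'dart:async';

// Hypothetical stand-in for the plugin's result type; only `label` and
// `index` come from the usage example, everything else is assumed.
class Prediction {
  final String label;
  final int index;
  const Prediction(this.label, this.index);
}

/// Forwards a prediction only when its index differs from the previous one,
/// so consecutive repeats are dropped.
Stream<Prediction> onLabelChange(Stream<Prediction> predictions) =>
    predictions.distinct((prev, next) => prev.index == next.index);

Future<void> main() async {
  // Simulated predictions used in place of VisionFlow.predictions.
  final simulated = Stream.fromIterable(const [
    Prediction('hello', 0),
    Prediction('hello', 0),
    Prediction('thanks', 1),
    Prediction('thanks', 1),
    Prediction('hello', 0),
  ]);

  final labels =
      await onLabelChange(simulated).map((p) => p.label).toList();
  print(labels); // [hello, thanks, hello]
}
```

In an app, the same helper could wrap VisionFlow.predictions before calling listen. Keeping the StreamSubscription returned by listen and cancelling it before disposing of the plugin avoids callbacks firing after cleanup.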
4. Process Camera Frames #
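Push raw YUV frames from the camera plugin directly into VisionFlow. Here, image is a CameraImage delivered by the camera plugin's image stream, with separate Y, U, and V planes: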
await VisionFlow.processFrame(
  y: image.planes[0].bytes,
  u: image.planes[1].bytes,
  v: image.planes[2].bytes,
  width: image.width,
  height: image.height,
  yRowStride: image.planes[0].bytesPerRow,
  uvRowStride: image.planes[1].bytesPerRow,
  uvPixelStride: image.planes[1].bytesPerPixel!,
);
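Camera streams often deliver frames faster than a detection pipeline can consume them. Whether VisionFlow queues or drops frames internally is not documented here, so the following is only a sketch of one common guard: skip any frame that arrives while the previous one is still in flight. FrameGate is a hypothetical helper, and the delayed future stands in for the processFrame call.

```dart
import 'dart:async';

/// Hypothetical helper: drops frames that arrive while a previous frame
/// is still being processed.
class FrameGate {
  bool _busy = false;
  int dropped = 0;

  /// Runs [process] unless another call is in flight.
  /// Returns true if the frame was processed, false if it was dropped.
  Future<bool> submit(Future<void> Function() process) async {
    if (_busy) {
      dropped++;
      return false;
    }
    _busy = true;
    try {
      await process();
      return true;
    } finally {
      _busy = false;
    }
  }
}

Future<void> main() async {
  final gate = FrameGate();

  // Simulated slow handler used in place of VisionFlow.processFrame(...).
  Future<void> slowFrame() =>
      Future<void>.delayed(const Duration(milliseconds: 50));

  // Three frames arrive back to back; only the first one is processed.
  final results = await Future.wait([
    gate.submit(slowFrame),
    gate.submit(slowFrame),
    gate.submit(slowFrame),
  ]);
  print(results); // [true, false, false]
  print(gate.dropped); // 2
}
```

In an app, the closure passed to submit would wrap the processFrame call above, invoked from the callback given to the camera plugin's CameraController.startImageStream.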
5. Cleanup #
Always dispose of the plugin resources when done:
await VisionFlow.dispose();
Advanced: Custom Models #
VisionFlow extracts raw landmark coordinates and normalizes them frame by frame by centering them on the nose-tip landmark of the face. The FrameBuffer assembles these frames into a tensor of shape (1, SequenceLength, 330) before sending it to your model. Each frame's 330 features break down as follows:
- Right Hand: 63 features (21 points * 3 coords)
- Left Hand: 63 features (21 points * 3 coords)
- Face: 204 features (68 points * 3 coords)
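The breakdown above sums to 63 + 63 + 204 = 330 features per frame. The sketch below spells out that arithmetic and illustrates nose-centered normalization on a toy input. It is not the plugin's implementation: the block order (right hand, left hand, face) is assumed from the order of the list, the constant names and the centerOn helper are hypothetical, and only the centering step is shown.

```dart
// Illustrative constants; the names are not part of the VisionFlow API.
const int coordsPerPoint = 3; // x, y, z
const int handPoints = 21;
const int facePoints = 68;

// Assumed layout: right hand, then left hand, then face.
const int rightHandOffset = 0;
const int leftHandOffset = rightHandOffset + handPoints * coordsPerPoint;
const int faceOffset = leftHandOffset + handPoints * coordsPerPoint;
const int featuresPerFrame = faceOffset + facePoints * coordsPerPoint;

/// Subtracts [origin] (x, y, z) from every (x, y, z) triple in [points],
/// which must be a flat list whose length is a multiple of 3.
List<double> centerOn(List<double> points, List<double> origin) {
  return [
    for (var i = 0; i < points.length; i++)
      points[i] - origin[i % coordsPerPoint],
  ];
}

void main() {
  print(leftHandOffset); // 63
  print(faceOffset); // 126
  print(featuresPerFrame); // 330

  // Toy example: the nose tip itself plus one other landmark.
  final nose = [0.5, 0.25, 0.0];
  final twoPoints = [0.5, 0.25, 0.0, 0.75, 0.75, 0.25];
  print(centerOn(twoPoints, nose)); // [0.0, 0.0, 0.0, 0.25, 0.5, 0.25]
}
```

A custom model trained for use with the plugin would therefore need to accept input of shape (1, SequenceLength, 330), with training data normalized the same way as the plugin's output.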