VisionFlow #

A modular, production-ready Flutter plugin for real-time sign language and gesture recognition. VisionFlow connects directly to the device camera stream to extract MediaPipe landmarks, normalizes them, and runs inference on custom PyTorch or TFLite models.

🌟 Key Features #

Multi-Backend Inference: Load both PyTorch (.pt) and TensorFlow Lite (.tflite) models natively without rewriting plugin logic.
Configurable Detection Modules: Individually toggle Hand, Face, and Pose detection to optimize performance based on your model's requirements.
Dynamic Sequence Buffering: Define exactly how many frames (e.g., 30 frames) make up a gesture sequence. VisionFlow automatically buffers and processes these frames.
Intelligent Normalization: Automatically centers coordinates on the nose landmark and applies min-max scaling, so that inputs at inference time are preprocessed the same way as in typical ML training pipelines.
Real-Time Streaming API: Directly pass YUV420 buffers from the camera plugin into VisionFlow for highly optimized, zero-copy native processing.

📦 Installation #

Add vision_flow to your pubspec.yaml:

dependencies:
  vision_flow: ^X.X.X # replace X with the latest version

Android Configuration #

Ensure your android/build.gradle declares the required repositories, and that compileSdk in android/app/build.gradle is set to 34 or higher (36 recommended).

android {
    compileSdk 36
    // ...
}

🚀 Quick Start Guide #

1. Load Your Model #

Before processing any frames, load your pre-trained model. Place your model (.pt or .tflite) inside the assets/ folder of your Flutter project and declare it in your pubspec.yaml.

flutter:
  assets:
    - assets/models/my_sign_language_model.pt

Initialize the model in your Dart code:

import 'package:vision_flow/vision_flow.dart';

// Initialize a PyTorch model
await VisionFlow.loadModel(
  path: "assets/models/my_sign_language_model.pt",
  backend: VisionFlowModelType.pytorch,
);

// OR initialize a TFLite model
await VisionFlow.loadModel(
  path: "assets/models/my_sign_language_model.tflite",
  backend: VisionFlowModelType.tflite,
);

2. Configure the Pipeline #

Define what the pipeline should detect, and the sequence length expected by your model. The pipeline will automatically pad missing features with zeros.

await VisionFlow.configure(
  hands: true,        // Enable MediaPipe Hand Tracking (extracts 2 hands)
  face: true,         // Enable MediaPipe Face Mesh (extracts 68 key points)
  pose: false,        // Body Pose tracking, disabled here (set to true if required by your model)
  sequenceLength: 30, // The number of frames your model requires per prediction
);

3. Listen for Predictions #

VisionFlow provides a unified stream for predictions. Whenever the frame buffer reaches the configured sequenceLength, it runs inference and pushes a PredictionResult.

VisionFlow.predictions.listen((PredictionResult result) {
  print("Predicted Label: ${result.label}");
  print("Class Index: ${result.index}");
});

4. Process Camera Frames #

Push raw YUV frames from the Flutter camera plugin directly into VisionFlow. This is highly optimized for Android.

import 'package:camera/camera.dart';

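// Look up the available cameras before creating the controller
final cameras = await availableCameras();

final controller = CameraController(
  cameras[0],
  ResolutionPreset.medium,
  enableAudio: false, // audio is not needed for landmark extraction
  imageFormatGroup: ImageFormatGroup.yuv420, // request the three-plane YUV layout read below
);
await controller.initialize();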

controller.startImageStream((CameraImage image) async {
  // Pass the raw image planes directly to native code
  await VisionFlow.processFrame(
    y: image.planes[0].bytes,
    u: image.planes[1].bytes,
    v: image.planes[2].bytes,
    width: image.width,
    height: image.height,
    yRowStride: image.planes[0].bytesPerRow,
    uvRowStride: image.planes[1].bytesPerRow,
    uvPixelStride: image.planes[1].bytesPerPixel!,
  );
});
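Camera frames can arrive faster than a single processFrame call completes. The README does not say whether VisionFlow drops or queues frames internally, so the following is only a minimal sketch of one possible guard on the Dart side: skip a frame while the previous one is still in flight. FrameGate and its members are hypothetical names for illustration, not part of the plugin's API.

```dart
/// Hypothetical helper: lets one asynchronous task run at a time and
/// rejects new work while the previous task is still in flight.
class FrameGate {
  bool _busy = false;

  /// Whether a task is currently running.
  bool get isBusy => _busy;

  /// Starts [task] and returns true if the gate is free. Otherwise
  /// returns false immediately without running [task].
  bool submit(Future<void> Function() task) {
    if (_busy) return false;
    _busy = true;
    _run(task);
    return true;
  }

  Future<void> _run(Future<void> Function() task) async {
    try {
      await task();
    } catch (error) {
      // Assumed policy: log and continue, so one failed frame
      // does not block later frames.
      print('Frame processing failed: $error');
    } finally {
      // Reopen the gate whether the task succeeded or threw.
      _busy = false;
    }
  }
}
```

Inside the startImageStream callback, the call would then become gate.submit(() => VisionFlow.processFrame(...)), with frames that arrive while the gate is busy simply skipped.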

5. Cleanup #

Always dispose of the plugin resources when navigating away from the camera view to free up the native ML engines and the camera feed.

await VisionFlow.dispose();

🧠 Advanced: Understanding the Feature Vector #

VisionFlow extracts raw coordinates from each frame, normalizes them, and constructs the 3D tensor expected by sequence models such as GRU and LSTM networks.

Output Tensor Shape #

The exact sequence shape constructed in the FrameBuffer before being sent to your model is (1, SequenceLength, 330).
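To make the shape concrete, here is a minimal sketch of a buffer that collects per-frame vectors and, once full, flattens them row-major into sequenceLength × 330 floats, the memory layout of a (1, SequenceLength, 330) tensor. SequenceBuffer and featuresPerFrame are illustrative names, not the plugin's internal FrameBuffer. The README does not say whether consecutive windows overlap, so clearing the buffer after each emission is an assumption.

```dart
import 'dart:typed_data';

/// Features per frame, as stated in the README.
const int featuresPerFrame = 330;

/// Hypothetical stand-in for a frame buffer: collects frames until
/// [sequenceLength] of them are held, then emits one flat sequence.
class SequenceBuffer {
  SequenceBuffer(this.sequenceLength);

  final int sequenceLength;
  final List<List<double>> _frames = [];

  /// Adds one frame. Returns the flattened sequence (row-major, length
  /// sequenceLength * featuresPerFrame) when the buffer fills, else null.
  Float32List? add(List<double> frame) {
    if (frame.length != featuresPerFrame) {
      throw ArgumentError(
          'Expected $featuresPerFrame features, got ${frame.length}');
    }
    _frames.add(frame);
    if (_frames.length < sequenceLength) return null;

    final flat = Float32List(sequenceLength * featuresPerFrame);
    for (var t = 0; t < sequenceLength; t++) {
      // Frame t occupies the slice [t * 330, (t + 1) * 330).
      flat.setRange(
          t * featuresPerFrame, (t + 1) * featuresPerFrame, _frames[t]);
    }
    // Assumption: start a fresh, non-overlapping window after each emission.
    _frames.clear();
    return flat;
  }
}
```

With sequenceLength set to 30, the emitted list holds 30 × 330 = 9,900 values.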

Feature Allocation (The "330" Vector) #

Every single frame outputs exactly 330 normalized float coordinates in the following order:

Right Hand (63 features): 21 landmarks × 3 coordinates (X, Y, Z). If the right hand is not detected, it is padded with zeros.
Left Hand (63 features): 21 landmarks × 3 coordinates (X, Y, Z). If the left hand is not detected, it is padded with zeros.
Face (204 features): 68 specific key landmarks extracted from the 468-point MediaPipe face mesh × 3 coordinates (X, Y, Z).
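The layout above can be sketched as a small function that writes each block at its offset and leaves zeros where a block is missing. This is an illustration of the ordering described here, not the plugin's implementation; buildFrameFeatures and the constant names are hypothetical.

```dart
const int handFeatures = 21 * 3; // 63 values per hand
const int faceFeatures = 68 * 3; // 204 values for the face
const int frameFeatures = 2 * handFeatures + faceFeatures; // 330 in total

/// Builds one per-frame vector in the order right hand, left hand, face.
/// Any block passed as null stays zero-padded.
List<double> buildFrameFeatures({
  List<double>? rightHand,
  List<double>? leftHand,
  List<double>? face,
}) {
  final out = List<double>.filled(frameFeatures, 0.0);

  void copy(List<double>? src, int offset, int expected, String name) {
    if (src == null) return; // not detected: keep the zeros
    if (src.length != expected) {
      throw ArgumentError('$name must have $expected values, got ${src.length}');
    }
    out.setRange(offset, offset + expected, src);
  }

  copy(rightHand, 0, handFeatures, 'rightHand'); // indices 0..62
  copy(leftHand, handFeatures, handFeatures, 'leftHand'); // indices 63..125
  copy(face, 2 * handFeatures, faceFeatures, 'face'); // indices 126..329
  return out;
}
```

For example, a frame in which only the left hand is detected has non-zero values at indices 63 to 125 and zeros everywhere else.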

Built-in Normalization #

To ensure the model is robust to different camera distances and angles, VisionFlow natively applies spatial normalization before inference:

Nose Centering: In each frame, the coordinate of the nose tip (Face Landmark Index 7) is subtracted from all X, Y, and Z coordinates across hands and face.
Min-Max Scaling: The bounding box of the entire buffered sequence (sequenceLength frames, 30 in the examples above) is calculated, and all coordinates are scaled to a [0, 1] range.

This mirrors the min-max scaling commonly applied in Python-based training environments (for example, scikit-learn's MinMaxScaler), so that your exported model sees inputs preprocessed the same way at inference time as during training.
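The two steps can be sketched in plain Dart as follows. This is one reading of the description, not the plugin's native code. It interprets "bounding box" as a separate minimum and maximum per axis (X, Y, Z) over the whole sequence, maps an axis with zero extent to 0, and centers every coordinate, including zero-padded slots, because the README does not say how padded values are treated. normalizeSequence and noseOffset are hypothetical names.

```dart
import 'dart:math' as math;

/// Normalizes a sequence of frames, each a flat list of x, y, z triples.
/// [noseOffset] is the index of the nose's x value within a frame; it must
/// be a multiple of 3 (hypothetical parameter).
List<List<double>> normalizeSequence(
    List<List<double>> frames, int noseOffset) {
  // Step 1: nose centering, applied per frame and per axis.
  final centered = [
    for (final f in frames)
      [for (var i = 0; i < f.length; i++) f[i] - f[noseOffset + i % 3]],
  ];

  // Step 2: per-axis minimum and maximum over the whole sequence.
  final lo = List<double>.filled(3, double.infinity);
  final hi = List<double>.filled(3, double.negativeInfinity);
  for (final f in centered) {
    for (var i = 0; i < f.length; i++) {
      lo[i % 3] = math.min(lo[i % 3], f[i]);
      hi[i % 3] = math.max(hi[i % 3], f[i]);
    }
  }

  // Step 3: scale each axis to [0, 1]; an axis with zero extent maps to 0.
  return [
    for (final f in centered)
      [
        for (var i = 0; i < f.length; i++)
          hi[i % 3] > lo[i % 3]
              ? (f[i] - lo[i % 3]) / (hi[i % 3] - lo[i % 3])
              : 0.0,
      ],
  ];
}
```

If index 7 refers to a position within the 68-landmark face block (an assumption), the nose's x value would sit at offset 126 + 7 × 3 = 147 in the 330-value frame.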

vision_flow 0.0.2+1