vision_flow 0.0.3

A Flutter plugin for real-time vision tasks: hand, face, and pose estimation, and video classification. Built on top of MediaPipe, with inference powered by PyTorch and TensorFlow Lite.

VisionFlow #

A modular, production-ready Flutter plugin for real-time sign language and gesture recognition. VisionFlow connects directly to the device camera stream to extract MediaPipe landmarks, normalizes them, and runs inference on custom PyTorch or TFLite models.


🌟 Key Features #

  • Multi-Backend Inference: Load both PyTorch (.pt) and TensorFlow Lite (.tflite) models natively without rewriting plugin logic.
  • Configurable Detection Modules: Individually toggle Hand, Face, and Pose detection to optimize performance based on your model's requirements.
  • Dynamic Sequence Buffering: Define exactly how many frames (e.g., 30 frames) make up a gesture sequence. VisionFlow automatically buffers and processes these frames.
  • Intelligent Normalization: Automatically centers coordinates on the nose landmark and applies min-max scaling, so on-device inputs match the preprocessing commonly used in ML training pipelines.
  • Real-Time Streaming API: Directly pass YUV420 buffers from the camera plugin into VisionFlow for highly optimized, zero-copy native processing.

📦 Installation #

Add vision_flow to your pubspec.yaml:

dependencies:
  vision_flow: ^X.X.X # replace X with the latest version

Android Configuration #

Ensure your android/build.gradle declares the required repositories, and that compileSdk in your app-level Gradle file (typically android/app/build.gradle) is set to 34 or higher (36 recommended).

android {
    compileSdk 36
    // ...
}

🚀 Quick Start Guide #

1. Load Your Model #

VisionFlow supports two model loading strategies. Choose the one that fits your use case:


Option A — Load from Flutter Assets (bundled with the app)

Pack the model into your app at build time. Best for a fixed, pre-trained model shipped with the release.

  1. Place your model inside your Flutter app's assets/ folder.

  2. Declare it in pubspec.yaml:

    flutter:
      assets:
        - assets/models/my_model.pt
    
  3. Load it at runtime:

import 'package:vision_flow/vision_flow.dart';

await VisionFlow.loadModel(
  path: 'assets/models/my_model.pt',
  backend: VisionFlowModelType.pytorch,
  isAsset: true, // default — can be omitted
);

Option B — Load from Device Storage (file picker)

Allow the user to supply their own model file at runtime. Useful for research tools or apps that let users swap models without a new release.

Add file_picker to your pubspec.yaml:

dependencies:
  file_picker: ^8.0.0

Then use it to pick any .pt or .tflite file:

import 'package:file_picker/file_picker.dart';
import 'package:vision_flow/vision_flow.dart';

final result = await FilePicker.platform.pickFiles(
  type: FileType.custom,
  allowedExtensions: ['pt', 'tflite'],
);

if (result != null) {
  final filePath = result.files.single.path!;
  final backend = filePath.endsWith('.pt')
      ? VisionFlowModelType.pytorch
      : VisionFlowModelType.tflite;

  await VisionFlow.loadModel(
    path: filePath,
    backend: backend,
    isAsset: false, // absolute device path
  );
}
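
The `endsWith('.pt')` check above is case-sensitive and treats every other extension as TFLite. A stricter lookup could be sketched as follows. The `ModelBackend` enum and `backendForPath` are hypothetical names declared only so the sketch runs without the plugin; in an app you would return `VisionFlowModelType` values instead.

```dart
// Stand-in for the plugin's VisionFlowModelType (assumption: declared locally
// so this sketch is self-contained).
enum ModelBackend { pytorch, tflite }

/// Returns the backend implied by [filePath]'s extension, or null when the
/// extension is neither .pt nor .tflite. The comparison ignores case.
ModelBackend? backendForPath(String filePath) {
  final lower = filePath.toLowerCase();
  if (lower.endsWith('.pt')) return ModelBackend.pytorch;
  if (lower.endsWith('.tflite')) return ModelBackend.tflite;
  return null; // unknown extension: let the caller show an error
}
```

Returning null for unknown extensions lets the caller reject the file before it reaches `loadModel`.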

2. Configure the Pipeline #

Define what the pipeline should detect, and the sequence length expected by your model. The pipeline will automatically pad missing features with zeros.

await VisionFlow.configure(
  hands: true,        // Enable MediaPipe Hand Tracking (extracts 2 hands)
  face: true,         // Enable MediaPipe Face Mesh (extracts 68 key points)
  pose: false,        // Enable Body Pose tracking (if required by your model)
  sequenceLength: 30, // The number of frames your model requires per prediction
);

3. Listen for Predictions #

VisionFlow provides a unified stream for predictions. Whenever the frame buffer reaches the configured sequenceLength, it runs inference and pushes a PredictionResult.

VisionFlow.predictions.listen((PredictionResult result) {
  print("Predicted Label: ${result.label}");
  print("Class Index: ${result.index}");
});
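
The buffering behaviour described above can be pictured with a minimal sketch. `SequenceBuffer` is a hypothetical helper, not part of the plugin, and it assumes non-overlapping windows (the buffer is cleared after each full sequence); the plugin may instead use a sliding window.

```dart
/// Collects per-frame feature vectors and hands back a full sequence once
/// [sequenceLength] frames have arrived. Hypothetical helper, not plugin API.
class SequenceBuffer {
  SequenceBuffer(this.sequenceLength) {
    if (sequenceLength <= 0) {
      throw ArgumentError.value(
          sequenceLength, 'sequenceLength', 'must be positive');
    }
  }

  final int sequenceLength;
  final List<List<double>> _frames = [];

  /// Adds one frame. Returns the completed sequence when the buffer is full
  /// (and starts a fresh window); otherwise returns null.
  List<List<double>>? add(List<double> frame) {
    _frames.add(List<double>.of(frame)); // copy so callers can reuse `frame`
    if (_frames.length < sequenceLength) return null;
    final sequence = List<List<double>>.of(_frames);
    _frames.clear(); // assumption: windows do not overlap
    return sequence;
  }
}
```

With `sequenceLength: 30`, every 30th call to `add` would return a sequence ready for inference, and all other calls return null.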

4. Process Camera Frames #

Push raw YUV frames from the Flutter camera plugin directly into VisionFlow. This is highly optimized for Android.

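import 'package:camera/camera.dart';
import 'package:vision_flow/vision_flow.dart';

// Look up the device cameras before creating the controller.
final cameras = await availableCameras();
final controller = CameraController(
  cameras[0],
  ResolutionPreset.medium,
  imageFormatGroup: ImageFormatGroup.yuv420, // three-plane YUV, as read below
);
await controller.initialize();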

controller.startImageStream((CameraImage image) async {
  // Pass the raw image planes directly to native code
  await VisionFlow.processFrame(
    y: image.planes[0].bytes,
    u: image.planes[1].bytes,
    v: image.planes[2].bytes,
    width: image.width,
    height: image.height,
    yRowStride: image.planes[0].bytesPerRow,
    uvRowStride: image.planes[1].bytesPerRow,
    uvPixelStride: image.planes[1].bytesPerPixel!,
  );
});
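
The stride parameters matter because camera planes may contain row padding and interleaved chroma bytes. The sketch below shows the conventional YUV420 indexing arithmetic; `yuvAt` is a hypothetical helper for illustration and says nothing about how the plugin's native code reads the buffers.

```dart
import 'dart:typed_data';

/// Reads the (Y, U, V) sample for pixel ([x], [y]) from three YUV420 planes.
/// Chroma is subsampled 2x2, so four neighbouring pixels share one U/V pair.
List<int> yuvAt({
  required Uint8List yPlane,
  required Uint8List uPlane,
  required Uint8List vPlane,
  required int x,
  required int y,
  required int yRowStride,
  required int uvRowStride,
  required int uvPixelStride,
}) {
  // Luma: one byte per pixel, rows may be padded to yRowStride bytes.
  final yIndex = y * yRowStride + x;
  // Chroma: half resolution in both directions; uvPixelStride is 2 when the
  // U and V bytes are interleaved, 1 when the planes are fully separate.
  final uvIndex = (y ~/ 2) * uvRowStride + (x ~/ 2) * uvPixelStride;
  return [yPlane[yIndex], uPlane[uvIndex], vPlane[uvIndex]];
}
```

Using `image.width` in place of the row stride gives wrong samples whenever rows are padded, which is why the strides are passed alongside the plane bytes.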

5. Cleanup #

Always dispose of the plugin resources when navigating away from the camera view to free up the native ML engines and the camera feed.

await VisionFlow.dispose();

🧠 Advanced: Understanding the Feature Vector #

VisionFlow extracts raw coordinates, normalizes them frame-by-frame, and constructs a 3D tensor expected by modern DNN/GRU/LSTM architectures.

Output Tensor Shape #

The exact sequence shape constructed in the FrameBuffer before being sent to your model is (1, SequenceLength, 330).
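
To make the shape concrete, here is a small sketch of how a buffered sequence could be flattened into a contiguous float buffer. It assumes a row-major layout, where feature `f` of frame `t` sits at offset `t * 330 + f`; `flattenSequence` is a hypothetical helper and the plugin's internal layout is not documented here.

```dart
import 'dart:typed_data';

/// Flattens [sequence] (T frames x F features) into a row-major Float32List
/// corresponding to a tensor of shape (1, T, F).
Float32List flattenSequence(
  List<List<double>> sequence, {
  int featuresPerFrame = 330,
}) {
  final out = Float32List(sequence.length * featuresPerFrame);
  for (var t = 0; t < sequence.length; t++) {
    final frame = sequence[t];
    if (frame.length != featuresPerFrame) {
      throw ArgumentError(
          'frame $t has ${frame.length} features, expected $featuresPerFrame');
    }
    // Frame t occupies the half-open range [t * F, (t + 1) * F).
    out.setRange(t * featuresPerFrame, (t + 1) * featuresPerFrame, frame);
  }
  return out;
}
```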

Feature Allocation (The "330" Vector) #

Every single frame outputs exactly 330 normalized float coordinates in the following order:

  1. Right Hand (63 features): 21 landmarks × 3 coordinates (X, Y, Z). If the right hand is not detected, it is padded with zeros.
  2. Left Hand (63 features): 21 landmarks × 3 coordinates (X, Y, Z). If the left hand is not detected, it is padded with zeros.
  3. Face (204 features): 68 specific key landmarks extracted from the 468-point MediaPipe face mesh × 3 coordinates (X, Y, Z).
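
The allocation above can be sketched as a simple concatenation with zero padding. `buildFrameVector` and the constants are hypothetical names for illustration; padding a missing face with zeros is an assumption based on the statement that missing features are zero-padded.

```dart
const int handFeatures = 21 * 3; // 63 values per hand
const int faceFeatures = 68 * 3; // 204 values for the face subset
const int frameFeatures = 2 * handFeatures + faceFeatures; // 330 in total

/// Concatenates right hand, left hand, and face into one 330-value frame.
/// A part that is null (not detected) is left as zeros.
List<double> buildFrameVector({
  List<double>? rightHand,
  List<double>? leftHand,
  List<double>? face,
}) {
  final out = List<double>.filled(frameFeatures, 0.0);

  void put(List<double>? part, int offset, int expected, String name) {
    if (part == null) return; // not detected: keep the zero padding
    if (part.length != expected) {
      throw ArgumentError('$name has ${part.length} values, expected $expected');
    }
    out.setRange(offset, offset + expected, part);
  }

  put(rightHand, 0, handFeatures, 'rightHand'); // indices 0..62
  put(leftHand, handFeatures, handFeatures, 'leftHand'); // indices 63..125
  put(face, 2 * handFeatures, faceFeatures, 'face'); // indices 126..329
  return out;
}
```

A model trained on this layout expects the same ordering at inference time, so the offsets (0, 63, 126) are the part worth checking against your training code.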

Built-in Normalization #

To ensure the model is robust to different camera distances and angles, VisionFlow natively applies spatial normalization before inference:

  1. Nose Centering: The coordinate of the nose tip (Face Landmark Index 7) is subtracted from every X, Y, and Z coordinate across hands and face.

  2. Min-Max Scaling: The bounding box of the entire buffered sequence (30 frames in the example configuration) is calculated, and all coordinates are scaled to the [0, 1] range.

This mirrors the min-max scaling commonly used in Python-based training environments (for example scikit-learn's MinMaxScaler), so exported models should see inputs in the same range as during training, provided the training pipeline applied the same normalization.
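
The two steps can be sketched in plain Dart as below. This is an illustration under stated assumptions, not the plugin's native implementation: the bounding box is taken per axis (X, Y, Z) over all frames and landmarks, an axis with zero extent maps to 0, and zero-padded landmarks are not treated specially. `normalizeSequence` and `noseLandmark` are hypothetical names.

```dart
import 'dart:math' as math;

/// Nose-centers each frame, then min-max scales the whole sequence per axis.
/// Each frame is a flat list [x0, y0, z0, x1, y1, z1, ...].
List<List<double>> normalizeSequence(
  List<List<double>> frames, {
  required int noseLandmark,
}) {
  // Step 1: subtract the nose coordinate from every landmark, frame by frame.
  final centered = [
    for (final frame in frames)
      [
        for (var i = 0; i < frame.length; i++)
          frame[i] - frame[noseLandmark * 3 + i % 3],
      ],
  ];

  // Step 2: per-axis minimum and maximum over all frames and landmarks.
  final lo = List<double>.filled(3, double.infinity);
  final hi = List<double>.filled(3, double.negativeInfinity);
  for (final frame in centered) {
    for (var i = 0; i < frame.length; i++) {
      lo[i % 3] = math.min(lo[i % 3], frame[i]);
      hi[i % 3] = math.max(hi[i % 3], frame[i]);
    }
  }

  // Step 3: scale to [0, 1]; an axis with zero extent maps to 0.
  return [
    for (final frame in centered)
      [
        for (var i = 0; i < frame.length; i++)
          hi[i % 3] > lo[i % 3]
              ? (frame[i] - lo[i % 3]) / (hi[i % 3] - lo[i % 3])
              : 0.0,
      ],
  ];
}
```

If Face Landmark Index 7 counts within the 68-point face block, the nose would be landmark 42 + 7 = 49 of the 110-landmark frame (two hands of 21 landmarks come first); that offset is an inference from the layout above and worth verifying against your training code.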
