vision_ai #

On-device hand gesture recognition and facial emotion detection for Flutter. Runs at 25-30 FPS with zero cloud dependencies.


What You Can Build #

Sign language interpreter — map 13+ gestures to words with custom finger patterns
Driver drowsiness alert — blink detection + attention scoring triggers warnings
Touchless kiosk — hand motion direction controls UI without touching screen
Online exam proctoring — attention score + face tracking + head nod/shake
Fitness rep counter / form checker — 33-point body pose with joint-angle, rep, and posture analytics (via vision_ai_pose)
Interactive children's game — emotion-driven characters + clap/pinch detection
Accessibility controller — custom gestures → app actions, blink-to-click
Live stream reactions — real-time emotion overlay on broadcaster's face
AR filter trigger — face contours + landmarks drive filter positioning
Social distance monitor — face distance estimation in cm

Platform Support #

Platform	Status	Min Version	Notes
Android	Stable	API 24 (Android 7.0)	Tested on Samsung Galaxy A15 and other devices
iOS	Stable	iOS 12.0	Verified on a physical device (iPhone 16 Pro Max); broader device coverage welcome (report issues)

Installation #

dependencies:
  vision_ai: ^0.4.0
  vision_ai_models: ^0.1.0   # bundled hand + face models (core ships none — see below)
  vision_ai_flutter: ^0.4.0  # optional: pre-built camera overlay widgets
  # vision_ai_animals: ^0.1.0  # add for animal detection + breed classification
  # vision_ai_pose: ^0.1.0     # add for body pose detection + fitness/posture analytics

Models (0.3.0+): the core plugin no longer bundles any ML models — it loads each from a file path you supply via ModelPaths. Add vision_ai_models and call VisionAiModels.ensureLoaded() to get the hand/face paths, or build a ModelPaths yourself. This keeps an app's footprint to only the models it actually uses. For on-device animal detection see the vision_ai_animals package, and for body pose see vision_ai_pose.

Android #

Add camera permission to android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.CAMERA" />

Release builds (important)

MediaPipe walks the call stack internally to load its native libraries. R8 code shrinking obfuscates the caller class names, so the lookup fails and the app crashes at runtime with a "no caller found on the stack" error. To fix this, disable minification in your app's android/app/build.gradle.kts:

android {
    buildTypes {
        release {
            isMinifyEnabled = false
            isShrinkResources = false
        }
    }
}

Without this, the app works in debug mode but crashes in release mode when initializing the hand gesture recognizer.

iOS #

Add camera usage description to ios/Runner/Info.plist:

<key>NSCameraUsageDescription</key>
<string>Camera access is needed for hand gesture and face detection.</string>

Core API #

VisionAi #

The main controller. Create it, start the camera, listen to results, dispose when done.

import 'package:vision_ai/vision_ai.dart';
import 'package:vision_ai_models/vision_ai_models.dart';

// Core bundles no models — load the bundled hand/face paths once.
final models = await VisionAiModels.ensureLoaded();

// Hand + face combined
final vision = VisionAi(
  hand: HandConfig(maxHands: 2),
  face: FaceConfig(detectEmotion: true),
  models: models, // required
  camera: CameraConfig(facing: CameraFacing.front),
);

// Or use factory constructors for single-mode:
final handOnly = VisionAi.hand(models: models);
final faceOnly = VisionAi.face(models: models);

Method	Returns	Description
`start()`	`Future<int>`	Starts camera + ML processing. Returns texture ID for Flutter's `Texture` widget.
`stop()`	`Future<void>`	Stops processing, releases camera. Can `start()` again after.
`dispose()`	`Future<void>`	Releases everything. Instance is unusable after this.
`results`	`Stream<VisionResult>`	Per-frame detection results. Active between `start()` and `stop()`.
`updateHandConfig(config)`	`Future<void>`	Hot-swap hand settings while running. Requires restart for some changes.
`updateFaceConfig(config)`	`Future<void>`	Hot-swap face settings while running.
`updateAnimalConfig(config)`	`Future<void>`	Hot-swap animal detection settings (threshold, categories, breed toggle) while running.
`updatePoseConfig(config)`	`Future<void>`	Hot-swap pose detection settings (numPoses, confidence thresholds) while running.
`switchCamera(facing)`	`Future<void>`	Switch front/back live while running (in-place; the preview keeps rendering).
`isRunning`	`bool`	Whether the camera is actively processing frames.

Animal detection is also driven by this controller (VisionAi(animal: AnimalConfig(...))), with AnimalConfig/AnimalResult defined here in core. For the ergonomic wrapper, bundled animal models, analytics, and overlays, use the vision_ai_animals package — see its README.

Body pose works the same way (VisionAi(pose: PoseConfig(...))), with PoseConfig/PoseResult defined here in core (a 33-point skeleton: normalized landmarks + worldLandmarks). For the ergonomic wrapper, bundled pose model, fitness/posture analytics, and skeleton overlay, use the vision_ai_pose package — see its README.

VisionResult #

Every frame produces one of these. It contains everything detected in that frame: hands and faces, plus animals and poses when those configs are active.

vision.results.listen((result) {
  print('Hands: ${result.hands.length}, Faces: ${result.faces.length}');
  print('Frame size: ${result.imageSize}');
  print('ML took: ${result.inferenceTimeMs}ms');
});

Property	Type	Description
`hands`	`List<HandResult>`	All detected hands (0, 1, or 2 depending on `maxHands`)
`faces`	`List<FaceResult>`	All detected faces
`animals`	`List<AnimalResult>`	All detected animals (empty unless an `AnimalConfig` is active)
`poses`	`List<PoseResult>`	All detected body poses (empty unless a `PoseConfig` is active)
`timestampMs`	`int`	Milliseconds since device boot
`imageSize`	`Size`	Camera frame dimensions (for scaling overlays)
`inferenceTimeMs`	`int`	Combined ML processing time
`primaryHand`	`HandResult?`	Hand with highest gesture confidence, or null
`primaryFace`	`FaceResult?`	Face with highest emotion confidence, or null
`primaryAnimal`	`AnimalResult?`	Animal with highest detection confidence, or null
`primaryPose`	`PoseResult?`	First detected pose, or null
`hasHands`	`bool`	`hands.isNotEmpty`
`hasFaces`	`bool`	`faces.isNotEmpty`
`hasAnimals`	`bool`	`animals.isNotEmpty`
`hasPoses`	`bool`	`poses.isNotEmpty`

Hand Detection #

HandConfig #

HandConfig(
  maxHands: 2,                    // 1 or 2 hands to detect
  minDetectionConfidence: 0.5,    // [0.0, 1.0] — lower = more detections, more false positives
  minPresenceConfidence: 0.5,     // [0.0, 1.0] — confidence hand is still present between frames
  minTrackingConfidence: 0.5,     // [0.0, 1.0] — landmark tracking quality threshold
  customGestures: [...],          // your own finger patterns (see below)
  allowedGestures: {Gesture.peace, Gesture.thumbsUp},  // only report these (null = all)
  deniedGestures: {Gesture.fist},                       // block these (null = none)
  gestureThresholds: {Gesture.thumbsUp: 0.8},           // per-gesture min confidence
)

HandResult #

Each detected hand has landmarks, gesture, finger states, and a bounding box.

final hand = result.primaryHand;
if (hand != null) {
  print(hand.gesture);            // Gesture.peace
  print(hand.gestureConfidence);  // 0.95
  print(hand.isLeftHand);         // true/false (from camera's perspective)
  print(hand.customGestureName);  // "rock" (only for user-defined gestures)
  print(hand.boundingBox);        // Rect in normalized [0,1] coords
}

Property	Type	Description
`gesture`	`Gesture`	Detected gesture enum (fist, peace, thumbsUp, etc.)
`gestureConfidence`	`double`	[0.0, 1.0] confidence for the gesture
`customGestureName`	`String?`	Non-null only for user-defined custom gestures
`landmarks`	`List<NormalizedLandmark>`	21 points in [0.0, 1.0] image coordinates
`worldLandmarks`	`List<WorldLandmark>`	21 points in meters (real-world scale)
`isLeftHand`	`bool`	Handedness from camera's perspective
`handednessConfidence`	`double`	How confident the L/R classification is
`fingerStates`	`Map<Finger, FingerState>`	Extended/closed for each finger
`boundingBox`	`Rect?`	Normalized bounding box from landmark min/max. Null if no landmarks.
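
As a rough illustration of "bounding box from landmark min/max", the sketch below computes a normalized box from a list of points and scales it to pixels for an overlay. It is self-contained and does not use the package: Point2, Box, boundingBoxOf, and scaleBox are hypothetical names invented for this example, and the package's own computation may differ.

```dart
import 'dart:math' as math;

// Hypothetical stand-ins for a normalized landmark and a rectangle.
typedef Point2 = ({double x, double y});
typedef Box = ({double left, double top, double right, double bottom});

/// Smallest axis-aligned box containing all [landmarks], or null when the
/// list is empty (mirroring the nullable boundingBox described above).
Box? boundingBoxOf(List<Point2> landmarks) {
  if (landmarks.isEmpty) return null;
  var left = landmarks.first.x;
  var right = left;
  var top = landmarks.first.y;
  var bottom = top;
  for (final p in landmarks.skip(1)) {
    left = math.min(left, p.x);
    right = math.max(right, p.x);
    top = math.min(top, p.y);
    bottom = math.max(bottom, p.y);
  }
  return (left: left, top: top, right: right, bottom: bottom);
}

/// Scales a normalized [0, 1] box to pixel coordinates for drawing.
Box scaleBox(Box box, double width, double height) => (
      left: box.left * width,
      top: box.top * height,
      right: box.right * width,
      bottom: box.bottom * height,
    );
```

For drawing, the width and height passed to scaleBox would be the size of the widget the preview is rendered into.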

Finger States #

Check which fingers are extended:

final fingers = hand.fingerStates;
if (fingers[Finger.indexFinger] == FingerState.extended &&
    fingers[Finger.middle] == FingerState.extended) {
  print('Peace sign!');
}

// Count extended fingers
final count = fingers.values.where((s) => s == FingerState.extended).length;
print('$count fingers up');

21 Hand Landmarks #

Each hand has 21 3D landmarks. Use HandLandmarkIndex constants to access specific joints:

final wrist = hand.landmarks[HandLandmarkIndex.wrist];         // index 0
final thumbTip = hand.landmarks[HandLandmarkIndex.thumbTip];   // index 4
final indexTip = hand.landmarks[HandLandmarkIndex.indexTip];   // index 8
final middleTip = hand.landmarks[HandLandmarkIndex.middleTip]; // index 12
final pinkyTip = hand.landmarks[HandLandmarkIndex.pinkyTip];   // index 20

// Convert to pixel coordinates for drawing
final pixelPos = wrist.toOffset(screenWidth, screenHeight);

// All 23 bone connections for skeleton rendering:
for (final bone in HandLandmarkIndex.connections) {
  final from = hand.landmarks[bone[0]];
  final to = hand.landmarks[bone[1]];
  // draw line from → to
}

Landmark indices: 0=wrist, 1-4=thumb (CMC→tip), 5-8=index (MCP→tip), 9-12=middle, 13-16=ring, 17-20=pinky.

World Coordinates (Meters) #

worldLandmarks give real-world 3D positions relative to the hand's center. Use them to measure actual distances:

// Pinch distance in centimeters
final thumbTip = hand.worldLandmarks[HandLandmarkIndex.thumbTip];
final indexTip = hand.worldLandmarks[HandLandmarkIndex.indexTip];
final pinchCm = thumbTip.distanceTo(indexTip) * 100;
print('Pinch gap: ${pinchCm.toStringAsFixed(1)}cm');

// Hand span (thumb to pinky)
final pinkyTip = hand.worldLandmarks[HandLandmarkIndex.pinkyTip];
final spanCm = thumbTip.distanceTo(pinkyTip) * 100;
print('Hand span: ${spanCm.toStringAsFixed(1)}cm');

Custom Gestures #

Define finger patterns. Fingers not in the map act as wildcards (any state matches):

HandConfig(
  customGestures: [
    // Rock sign: index + pinky up, others down
    CustomGesture(
      name: 'rock',
      fingerStates: {
        Finger.thumb: FingerState.closed,
        Finger.indexFinger: FingerState.extended,
        Finger.middle: FingerState.closed,
        Finger.ring: FingerState.closed,
        Finger.pinky: FingerState.extended,
      },
    ),
    // Gun: thumb + index up (other fingers are wildcards)
    CustomGesture(
      name: 'gun',
      fingerStates: {
        Finger.thumb: FingerState.extended,
        Finger.indexFinger: FingerState.extended,
      },
    ),
  ],
)

Custom gestures are checked after built-in MediaPipe gestures fail. Priority: OK → counting 1-5 → your patterns (first match wins).

When a custom gesture matches, hand.gesture == Gesture.custom and hand.customGestureName == "rock".
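
The wildcard and first-match-wins rules can be sketched in plain Dart. This is an illustration of the matching idea only, not the package's implementation: the enums are local stand-ins for the package's Finger and FingerState, and matchesPattern and firstMatch are hypothetical helper names.

```dart
// Local stand-ins for the package's Finger / FingerState enums.
enum Finger { thumb, indexFinger, middle, ring, pinky }

enum FingerState { extended, closed }

/// True when every finger listed in [pattern] has the same state in
/// [observed]. Fingers missing from [pattern] are wildcards.
bool matchesPattern(
  Map<Finger, FingerState> pattern,
  Map<Finger, FingerState> observed,
) {
  return pattern.entries.every((e) => observed[e.key] == e.value);
}

/// Name of the first pattern (in insertion order) that matches, or null.
String? firstMatch(
  Map<String, Map<Finger, FingerState>> patterns,
  Map<Finger, FingerState> observed,
) {
  for (final entry in patterns.entries) {
    if (matchesPattern(entry.value, observed)) return entry.key;
  }
  return null;
}
```

Because the first match wins, a pattern with many wildcards (such as 'gun') should be listed after more specific patterns that it would otherwise shadow.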

Gesture Filtering #

Control which gestures are reported:

HandConfig(
  // Only report these (everything else becomes Gesture.none)
  allowedGestures: {Gesture.thumbsUp, Gesture.peace, Gesture.fist},
  
  // OR block specific ones (everything else passes through)
  deniedGestures: {Gesture.fist, Gesture.openHand},
  
  // Raise the bar for specific gestures
  gestureThresholds: {
    Gesture.thumbsUp: 0.8,  // must be 80%+ confident
    Gesture.peace: 0.7,
  },
)

Filtering happens after MediaPipe classification but before custom gesture fallback. So if fist is denied and the user makes a fist, the custom gesture classifier still gets a chance.
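
A minimal sketch of that filtering step, assuming the allow list, deny list, and thresholds are checked in that order (the actual order inside the package is not documented here). The Gesture enum is a local stand-in covering a subset of the package's values, and applyFilters is a hypothetical name.

```dart
// Local stand-in for a subset of the package's Gesture enum.
enum Gesture { none, fist, openHand, peace, thumbsUp }

/// Returns [gesture] if it passes all filters, otherwise Gesture.none.
/// A Gesture.none result is what would then be handed to the custom
/// gesture fallback.
Gesture applyFilters(
  Gesture gesture,
  double confidence, {
  Set<Gesture>? allowed,
  Set<Gesture>? denied,
  Map<Gesture, double> thresholds = const {},
}) {
  if (allowed != null && !allowed.contains(gesture)) return Gesture.none;
  if (denied != null && denied.contains(gesture)) return Gesture.none;
  // Gestures without an explicit threshold pass at any confidence.
  final minConfidence = thresholds[gesture] ?? 0.0;
  if (confidence < minConfidence) return Gesture.none;
  return gesture;
}
```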

Supported Gestures #

Gesture	Enum	Source	When detected
Fist	`Gesture.fist`	MediaPipe	All fingers closed
Open Hand	`Gesture.openHand`	MediaPipe	All fingers spread
Peace	`Gesture.peace`	MediaPipe	Index + middle up
Thumbs Up	`Gesture.thumbsUp`	MediaPipe	Thumb up, others closed
Thumbs Down	`Gesture.thumbsDown`	MediaPipe	Thumb down, others closed
Pointing Up	`Gesture.pointingUp`	MediaPipe	Index up, others closed
I Love You	`Gesture.iLoveYou`	MediaPipe	Thumb + index + pinky
OK	`Gesture.ok`	Custom	Thumb-index pinch, others extended
One–Five	`Gesture.one`–`Gesture.five`	Custom	Counting patterns
User-defined	`Gesture.custom`	Your config	Check `customGestureName`

Face Detection #

FaceConfig #

FaceConfig(
  detectEmotion: true,       // run TFLite emotion classifier (~5-15ms extra)
  detectLandmarks: false,    // 10 face landmark points (eyes, nose, mouth, ears, cheeks)
  detectContours: false,     // 15 face contour types (detailed mesh)
  minFaceSize: 0.1,          // [0.0, 1.0] — fraction of image width; smaller = slower
  enableTracking: true,      // stable face IDs across frames (can't use with contours)
  minEmotionConfidence: 0.4, // reserved: stored, but not yet used to filter results
  accurateMode: false,       // ML Kit ACCURATE mode — better for distant faces, ~2-3x slower
)

Note: Contour mode and face tracking are mutually exclusive (ML Kit limitation on both platforms). Enabling contours automatically disables tracking.

FaceResult #

final face = result.primaryFace;
if (face != null) {
  print(face.emotion);              // Emotion.happy
  print(face.emotionConfidence);    // 0.98
  print(face.smilingProbability);   // 0.95 (null if not available)
  print(face.leftEyeOpenProbability);  // 0.92
  print(face.rightEyeOpenProbability); // 0.88
  print(face.trackingId);           // 42 (-1 when tracking disabled)
  print(face.boundingBox);          // Rect in pixel coordinates
  
  // Euler angles (degrees)
  print(face.headEulerAngleX);  // pitch: positive = looking up
  print(face.headEulerAngleY);  // yaw: positive = turned right  
  print(face.headEulerAngleZ);  // roll: positive = head tilted right
  
  // Emotion scores for all 7 classes
  face.emotionScores.forEach((emotion, score) {
    print('$emotion: ${(score * 100).toStringAsFixed(0)}%');
  });
}

Property	Type	Description
`emotion`	`Emotion`	Highest-scoring emotion
`emotionConfidence`	`double`	[0.0, 1.0] score for the top emotion
`emotionScores`	`Map<Emotion, double>`	All 7 class probabilities
`boundingBox`	`Rect`	Face position in pixel coordinates
`headEulerAngleX`	`double`	Pitch in degrees (+ = looking up)
`headEulerAngleY`	`double`	Yaw in degrees (+ = turned right)
`headEulerAngleZ`	`double`	Roll in degrees (+ = tilted right)
`smilingProbability`	`double?`	[0.0, 1.0] or null
`leftEyeOpenProbability`	`double?`	[0.0, 1.0] or null
`rightEyeOpenProbability`	`double?`	[0.0, 1.0] or null
`trackingId`	`int`	Stable ID across frames (-1 when tracking off)
`landmarks`	`List<Offset>?`	10 points in pixel coords (null when `detectLandmarks: false`)
`contours`	`List<List<Offset>>?`	15 contour polylines (null when `detectContours: false`)

Supported Emotions #

Emotion	Enum	Reliability	Notes
Happy	`Emotion.happy`	High	Smiles detected very reliably
Neutral	`Emotion.neutral`	High	Default resting face
Surprised	`Emotion.surprised`	High	Wide eyes + open mouth
Sad	`Emotion.sad`	Medium	Works with exaggerated expressions
Angry	`Emotion.angry`	Medium	Furrowed brows help
Disgusted	`Emotion.disgusted`	Low	Often confused with angry
Fearful	`Emotion.fearful`	Low	Often confused with surprised

Face Landmarks (10 points) #

When detectLandmarks: true, pixel-coordinate positions for:

Index	Point	Use case
0	Left eye center	Gaze direction, blink
1	Right eye center	Gaze direction, blink
2	Nose base	Face center reference
3	Mouth left corner	Smile detection
4	Mouth right corner	Smile width
5	Mouth bottom	Mouth open detection
6	Left ear	Face width
7	Right ear	Face width
8	Left cheek	Face shape
9	Right cheek	Face shape

Missing points (face turned away) return Offset(-1, -1).

Face Contours (15 types) #

When detectContours: true, detailed polylines for face mesh rendering:

Face outline, left/right eyebrow (top + bottom), left/right eye, upper/lower lip (top + bottom), nose bridge, nose bottom, left/right cheek center.

Each contour is a List<Offset> of connected points in pixel coordinates.

Dart-Only Detectors #

These run entirely in Dart — no native code, no extra ML models. They consume FaceResult or HandResult from the stream and compute higher-level events. All are stateful: create once, feed every frame, call reset() when switching subjects.

BlinkDetector #

Detects eye blinks from open/close probability transitions.

final blinkDetector = BlinkDetector(
  openThreshold: 0.7,       // above this = "eyes open"
  closedThreshold: 0.3,     // below this = "eyes closed"
  maxBlinkDurationMs: 500,  // longer closures are ignored (not a blink)
);

vision.results.listen((result) {
  final face = result.primaryFace;
  if (face != null) {
    final blink = blinkDetector.update(face, result.timestampMs);
    if (blink != null) {
      print('${blink.eye} blink, ${blink.durationMs}ms'); // BlinkEye.left, .right, or .both
    }
  }
});

Use cases: Blink-to-click for accessibility, drowsiness detection (slow/frequent blinks), liveness check for authentication.
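
To show how two thresholds and a maximum duration turn a probability stream into blink events, here is a single-eye sketch in plain Dart. It is an assumption about the general technique (hysteresis between a closed and an open threshold), not the package's BlinkDetector; BlinkSketch and its members are hypothetical names.

```dart
/// Single-eye blink detection with hysteresis.
class BlinkSketch {
  BlinkSketch({
    this.openThreshold = 0.7,
    this.closedThreshold = 0.3,
    this.maxBlinkDurationMs = 500,
  });

  final double openThreshold;
  final double closedThreshold;
  final int maxBlinkDurationMs;

  // Timestamp at which the eye was first seen closed, or null if open.
  int? _closedSinceMs;

  /// Feed one eye-open probability per frame. Returns the blink duration
  /// in ms when the eye reopens quickly enough, otherwise null.
  int? update(double eyeOpenProbability, int timestampMs) {
    if (eyeOpenProbability < closedThreshold) {
      _closedSinceMs ??= timestampMs;
      return null;
    }
    final closedSince = _closedSinceMs;
    if (eyeOpenProbability > openThreshold && closedSince != null) {
      _closedSinceMs = null;
      final duration = timestampMs - closedSince;
      // Long closures (e.g. eyes resting shut) are not counted as blinks.
      return duration <= maxBlinkDurationMs ? duration : null;
    }
    // Values between the two thresholds leave the state unchanged.
    return null;
  }

  void reset() => _closedSinceMs = null;
}
```

The gap between the two thresholds keeps noisy probabilities near 0.5 from producing spurious open/closed flips.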

HeadGestureDetector #

Detects head nod (yes) and shake (no) from Euler angle oscillations.

final headDetector = HeadGestureDetector(
  nodAngleThreshold: 8.0,      // degrees of pitch change to count as a nod movement
  shakeAngleThreshold: 10.0,   // degrees of yaw change to count as a shake movement
  minOscillations: 3,          // direction changes needed (3 = 1.5 back-and-forth cycles)
  windowMs: 1000,              // oscillations must happen within this time window
  cooldownMs: 1500,            // wait after detection before allowing another
);

vision.results.listen((result) {
  final face = result.primaryFace;
  if (face != null) {
    final gesture = headDetector.update(face, result.timestampMs);
    if (gesture != null) {
      print(gesture.gesture == HeadGesture.nod ? 'YES' : 'NO');
    }
  }
});

Use cases: Hands-free yes/no input, survey responses, accessibility confirmation.

FaceDistanceEstimator #

Estimates camera-to-face distance using the pinhole camera model.

final distanceEstimator = FaceDistanceEstimator(
  assumedFaceWidthCm: 15.0,  // average adult face ~14-16cm
  cameraFovDegrees: 75.0,    // most phone front cameras are 70-80 degrees
);

vision.results.listen((result) {
  final face = result.primaryFace;
  if (face != null) {
    final estimate = distanceEstimator.estimate(face, result.imageSize);
    if (estimate != null) {
      print('${estimate.distanceCm.toStringAsFixed(0)}cm — ${estimate.zone.name}');
      // Zones: veryClose (<30cm), close (30-60cm), medium (60-120cm), far (>120cm)
    }
  }
});

Use cases: Screen distance warnings, social distancing, zoom-based UI scaling. Expect an error of roughly 20-30%: good enough for zone detection, not for precise measurement.
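
The pinhole model behind this kind of estimate can be written out in a few lines. The sketch below assumes the field of view is horizontal and the face width is measured in pixels across the bounding box. Both are assumptions for illustration, and estimateDistanceCm is a hypothetical function, not part of the package.

```dart
import 'dart:math' as math;

/// Pinhole-camera distance estimate:
///   focalLengthPx = (imageWidthPx / 2) / tan(fov / 2)
///   distanceCm    = realWidthCm * focalLengthPx / faceWidthPx
double estimateDistanceCm({
  required double faceWidthPx,
  required double imageWidthPx,
  double assumedFaceWidthCm = 15.0,
  double cameraFovDegrees = 75.0,
}) {
  // Half the field of view, converted from degrees to radians.
  final halfFovRad = cameraFovDegrees * math.pi / 360.0;
  final focalLengthPx = (imageWidthPx / 2) / math.tan(halfFovRad);
  return assumedFaceWidthCm * focalLengthPx / faceWidthPx;
}
```

Since the result scales linearly with assumedFaceWidthCm and depends on the true field of view, errors in either assumption carry straight through to the estimate.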

AttentionScorer #

Combines three signals into a single 0-100% attention/engagement score:

Eye openness (40% weight) — average of both eyes
Face orientation (40% weight) — pitch + yaw distance from center
Head stability (20% weight) — inverse of angular velocity over 500ms

final scorer = AttentionScorer(
  eyeWeight: 0.4,
  orientationWeight: 0.4,
  stabilityWeight: 0.2,
  maxPitchDegrees: 45.0,       // beyond this angle, orientation score = 0
  maxYawDegrees: 45.0,
  stabilityWindowMs: 500,
  maxAngularVelocity: 60.0,    // degrees/sec above which stability = 0
);

vision.results.listen((result) {
  final face = result.primaryFace;
  if (face != null) {
    final attention = scorer.update(face, result.timestampMs);
    if (attention != null) {
      print('Attention: ${(attention.score * 100).toStringAsFixed(0)}% (${attention.level.name})');
      print('  Eye: ${(attention.eyeScore * 100).toStringAsFixed(0)}%');
      print('  Orientation: ${(attention.orientationScore * 100).toStringAsFixed(0)}%');
      print('  Stability: ${(attention.stabilityScore * 100).toStringAsFixed(0)}%');
      // AttentionLevel: high (>=75%), medium (45-75%), low (15-45%), none (<15%)
    }
  }
});

Use cases: E-learning engagement tracking, proctoring, driver monitoring, meeting participation.
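
The weighted combination and the level bands can be sketched as follows. This only illustrates the arithmetic: how the package derives the three component scores is not shown, the handling of exact band boundaries (for example whether 45% counts as medium) is an assumption, and attentionScore and attentionLevel are hypothetical names.

```dart
/// Weighted average of three component scores, each in [0.0, 1.0].
double attentionScore({
  required double eyeScore,
  required double orientationScore,
  required double stabilityScore,
  double eyeWeight = 0.4,
  double orientationWeight = 0.4,
  double stabilityWeight = 0.2,
}) {
  final totalWeight = eyeWeight + orientationWeight + stabilityWeight;
  final weighted = eyeWeight * eyeScore +
      orientationWeight * orientationScore +
      stabilityWeight * stabilityScore;
  // Dividing by the total keeps the result in range even if the
  // weights do not sum to exactly 1.
  return (weighted / totalWeight).clamp(0.0, 1.0).toDouble();
}

/// Maps a score to the documented bands (boundary handling assumed).
String attentionLevel(double score) {
  if (score >= 0.75) return 'high';
  if (score >= 0.45) return 'medium';
  if (score >= 0.15) return 'low';
  return 'none';
}
```

For example, fully open eyes and a steady head with a half-turned face (scores 1.0, 0.5, 1.0) give 0.4 + 0.2 + 0.2 = 0.8, which falls in the high band.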

HandMotionTracker #

Tracks hand velocity and movement direction across frames.

final tracker = HandMotionTracker(
  windowMs: 200,                // velocity averaged over this window
  stillThreshold: 0.02,         // below this speed = still
  trackingLandmarkIndex: 0,     // 0 = wrist (default), or any landmark index
);

vision.results.listen((result) {
  final hand = result.primaryHand;
  if (hand != null) {
    final motion = tracker.update(hand, result.timestampMs);
    if (motion != null) {
      print('Speed: ${motion.speed.toStringAsFixed(2)}/s');  // normalized units/sec
      print('Direction: ${motion.direction.name}');          // up, upRight, right, etc.
      print('State: ${motion.state.name}');                  // still, slow, moderate, fast
      print('Velocity: (${motion.velocityX}, ${motion.velocityY})');
    }
  }
});

Directions: up, upRight, right, downRight, down, downLeft, left, upLeft (8 compass points).

States: still (<0.02), slow (0.02-0.15), moderate (0.15-0.5), fast (>0.5 normalized units/sec).

Use cases: Swipe gesture recognition, wave detection, touchless scrolling direction.
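
One way to turn a velocity vector into the eight directions and four states listed above is to bucket its angle into 45-degree sectors and compare its speed against the thresholds. The sketch assumes image coordinates where y grows downward, so a negative vertical velocity means "up". MotionDirection, directionOf, and motionState are local, hypothetical names, and the package's own logic may differ.

```dart
import 'dart:math' as math;

// Ordered clockwise on screen, starting at "right" (angle 0).
enum MotionDirection {
  right,
  downRight,
  down,
  downLeft,
  left,
  upLeft,
  up,
  upRight,
}

/// Buckets the velocity angle into one of eight 45-degree sectors.
MotionDirection directionOf(double velocityX, double velocityY) {
  // atan2 returns -pi..pi with 0 = right and +pi/2 = down (y grows down).
  final angle = math.atan2(velocityY, velocityX);
  final sector = ((angle / (math.pi / 4)).round() + 8) % 8;
  return MotionDirection.values[sector];
}

/// Classifies speed (normalized units/sec) using the documented bands.
String motionState(double speed) {
  if (speed < 0.02) return 'still';
  if (speed < 0.15) return 'slow';
  if (speed <= 0.5) return 'moderate';
  return 'fast';
}
```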

TwoHandInteractionDetector #

Detects interactions between two hands.

final twoHand = TwoHandInteractionDetector(
  pinchThreshold: 0.06,          // index tips within 6% of image width
  touchThreshold: 0.08,          // any fingertips within 8%
  clapVelocityThreshold: 0.3,    // wrist approach speed for clap
  cooldownMs: 500,               // ms between detections
);

vision.results.listen((result) {
  final event = twoHand.update(result);  // takes full VisionResult, not single hand
  if (event != null) {
    print('${event.gesture.name} at distance ${event.distance.toStringAsFixed(3)}');
    // TwoHandGesture: pinch, clap, touching
  }
});

Requires HandConfig(maxHands: 2). Detection priority: pinch (most specific) → clap (velocity-based) → touching (fallback).

Use cases: Zoom gestures, clap-to-action, collaborative interactions.

Camera Configuration #

CameraConfig(
  facing: CameraFacing.front,           // .front or .back
  resolution: AnalysisResolution.medium, // .low (320x240), .medium (640x480), .high (1280x720)
  maxResultsPerSecond: 0,               // 0 = no throttle (every frame)
)

Emission Throttling #

Control how many results per second reach Dart. The ML pipeline still runs at full speed — throttling only skips the emission so the next result is always fresh.

Value	Effect	Best for
`0`	Every frame (~20-30 FPS)	Smooth hand skeleton drawing
`10-15`	Balanced	Gesture/emotion labels with acceptable landmark lag
`5`	Labels only	Minimal CPU, choppy skeletons

CameraConfig(maxResultsPerSecond: 10)
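
The idea behind emission throttling is a minimum interval between emitted results. The plugin does this on the native side; the plain-Dart sketch below only illustrates the principle, and EmissionThrottle and shouldEmit are hypothetical names rather than package API.

```dart
/// Decides, per result, whether it should be emitted so that at most
/// [maxResultsPerSecond] results pass each second (0 = no throttle).
class EmissionThrottle {
  EmissionThrottle(this.maxResultsPerSecond);

  final int maxResultsPerSecond;

  // Timestamp of the last emitted result, or null before the first one.
  int? _lastEmitMs;

  bool shouldEmit(int timestampMs) {
    if (maxResultsPerSecond <= 0) return true;
    final minIntervalMs = 1000 / maxResultsPerSecond;
    final last = _lastEmitMs;
    if (last != null && timestampMs - last < minIntervalMs) {
      // Too soon: skip this result. Inference has already run, so the
      // next result that does pass is still computed from a fresh frame.
      return false;
    }
    _lastEmitMs = timestampMs;
    return true;
  }
}
```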

Camera Preview #

VisionAi.start() returns a texture ID. Render with Flutter's Texture widget:

final textureId = await vision.start();
// In your build:
Texture(textureId: textureId)

Or use VisionAiCameraView from vision_ai_flutter for a complete solution with overlays.

Architecture #

All ML inference runs on-device:

Hand gestures: MediaPipe Gesture Recognizer (~8MB model, GPU delegate with CPU fallback)
Face detection: Google ML Kit Face Detection (bundled per-platform)
Emotion: TFLite CNN trained on FER2013 (~2MB model, 2 inference threads)

Camera frames are processed natively (CameraX on Android, AVFoundation on iOS). Only lightweight results cross the platform channel — raw frame data never leaves the native side.

Example App #

The package ships with a full-featured demo app that lets you test every feature before writing any code. It includes a settings panel with per-feature toggles organized into cards, so you can enable/disable individual capabilities and see the results in real-time.

cd example
flutter run

What you can test:

Toggle hand detection, face detection, or both simultaneously
Switch between front/back camera and low/medium/high resolution
Enable hand motion tracking, two-hand interaction, gesture filtering
Enable blink detection, head nod/shake, face distance, attention scoring
Toggle individual overlays: hand skeleton, hand bounding box, face box, face contours, gesture label, emotion label, world coordinates
Adjust detection confidence, min face size, max results/sec with sliders
Try accurate mode for face detection
Define a custom "rock" gesture out of the box

Overlay toggles apply instantly, and switching the front/back camera applies live while running (no restart). Other detection and camera changes require a restart (tap Stop, then Start). When you disable hand or face detection, all related sub-settings and overlay options disappear automatically and reset to defaults.

The example also serves as a reference implementation showing how to use ValueNotifier + ValueListenableBuilder instead of setState for reactive state management with this package.

iOS #

The iOS implementation mirrors the Android architecture (AVFoundation + MediaPipe + ML Kit + TFLite) and is verified working on a physical device (iPhone 16 Pro Max). Coverage across more iOS models is still welcome:

Run the example app: cd example && flutter run
Exercise hand gestures, face/emotion detection, and the Animals screen
Report anything off at GitHub Issues with the ios label and your device model

License #

Apache 2.0 — see LICENSE and NOTICE. Forks must retain attribution and state changes.
