Google Vision Images REST API Client


Native Dart package that integrates Google Vision features, including image labeling, face, logo, and landmark detection, optical character recognition (OCR), and detection of explicit content, into your applications.

If you are integrating the Google Vision API into a Flutter application, take a look at my related package google_vision_flutter, which provides a widget that wraps the functionality of this Dart-focused package.


Project Status

Please feel free to submit PRs for any additional helper methods, or report an issue for a missing helper method and I'll add it if I have time available.

Recent Changes

New for v2.2.0

Security enhancements: Credential leakage prevention with secure logging interceptor that automatically redacts sensitive headers
Input validation: All API methods now validate parameters (maxResults bounds, required fields)
Credential management: New clearCredentials() method for secure logout and credential cleanup
Retry support: RetryUtility class with exponential backoff for resilient API calls
Configurable OAuth: JWT generator now supports custom OAuth endpoints for Google Cloud Enterprise
Bug fix: Token expiry calculation corrected - tokens now refresh properly
Documentation: Added Security & Best Practices section with code examples

New for v2.0.0

Although this package worked on the web platform, the pub.dev analyzer did not list it as web compatible because it used the universal_io package, which depends on dart:io. This version removes the universal_io dependency from the core package, so some related method signatures have been removed.
The methods deprecated in v1.3.x have been removed in this version.
Logging functionality has been added to the package:

final googleVision = await GoogleVision(LogLevel.all).withJwt(
  File('service_credentials.json').readAsStringSync(),
);

New for v1.4.0

A breaking change from the previous version is that the GoogleVision class now follows the Singleton design pattern. Now the object is instantiated as follows:


// Old method from v1.3.x and earlier
// final googleVision = await GoogleVision.withJwtFile('service_credentials.json');

// New
final googleVision = await GoogleVision().withJwtFile('service_credentials.json');

New for v1.3.0

This version of the package supports both the image and file annotation APIs for Google Vision. The previous versions of the package supported only the image API.
A number of methods and classes have been deprecated in this version. All the provided examples still work without any changes, so this release should not cause any issues for existing code.
The file functionality added in this release allows annotation of file formats that have pages or frames, specifically PDF, TIFF and GIF. Google Vision allows annotation of up to 5 pages/frames.
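As a rough illustration of working within the 5 pages/frames limit, the sketch below splits a longer list of page numbers into batches of at most five, so that each batch could be passed to a separate file annotation call. The chunkPages helper is hypothetical and not part of the package. It also assumes that the limit applies per request and that longer documents are handled by sending several requests.

```dart
import 'dart:math';

/// Hypothetical helper (not part of google_vision): splits [pages] into
/// consecutive batches of at most [batchSize] page numbers.
List<List<int>> chunkPages(List<int> pages, {int batchSize = 5}) {
  return [
    for (var i = 0; i < pages.length; i += batchSize)
      // sublist's end index is exclusive, so clamp it to the list length.
      pages.sublist(i, min(i + batchSize, pages.length)),
  ];
}

void main() {
  // Page numbers 1..12 of an imaginary PDF.
  final pages = List<int>.generate(12, (i) => i + 1);
  print(chunkPages(pages)); // [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]]
}
```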

Getting Started

pubspec.yaml

To use this package, add the dependency to your pubspec.yaml file:

dependencies:
  ...
  google_vision: ^2.2.0

Obtaining Authentication/Authorization Credentials

Authenticating to the Cloud Vision API can be done with one of two methods:

The first method requires a service account JSON key file (used to generate the JWT), which you can obtain by creating a service account in the API console.
The second method requires an API key to be created.

Both of the authorization/authentication methods listed above assume that you already have a Google account, you have created a Google Cloud project and you have enabled the Cloud Vision API in the Google API library.

Usage of the Cloud Vision API

final googleVision = await GoogleVision().withApiKey(
  Platform.environment['GOOGLE_VISION_API_KEY'] ?? '[YOUR API KEY]',
  // additionalHeaders: {'X-Ios-Bundle-Identifier': 'com.xxx.xxx'},
);

print('checking...');

final faceAnnotationResponses = await googleVision.image.faceDetection(
    JsonImage.fromGsUri(
        'gs://gvision-demo/young-man-smiling-and-thumbs-up.jpg'));

for (var faceAnnotation in faceAnnotationResponses) {
  print('Face - ${faceAnnotation.detectionConfidence}');

  print('Joy - ${faceAnnotation.enumJoyLikelihood}');
}

// Output:
// Face - 0.9609375
// Joy - Likelihood.UNLIKELY

print('done.');

New Helper Methods

Method Signature	Description
Future<AnnotateImageResponse> detection( JsonImage jsonImage, AnnotationType annotationType, {ImageContext? imageContext, int maxResults = 10,} )	Lower level method for a single detection type as specified by annotationType
Future<CropHintsAnnotation?> cropHints( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	Crop Hints suggests vertices for a crop region on an image.
Future<FullTextAnnotation?> documentTextDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	Extracts text from an image (or file); the response is optimized for dense text and documents. The response includes page, block, paragraph, word, and break information. A specific use of documentTextDetection is to detect handwriting in an image.
Future<List<FaceAnnotation>> faceDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	Face Detection detects multiple faces within an image along with the associated key facial attributes such as emotional state or wearing headwear.
Future<ImagePropertiesAnnotation?> imageProperties( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	The Image Properties feature detects general attributes of the image, such as dominant color.
Future<List<EntityAnnotation>> labelDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	Labels can identify general objects, locations, activities, animal species, products, and more. Labels are returned in English only.
Future<List<EntityAnnotation>> landmarkDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	Landmark Detection detects popular natural and human-made structures within an image.
Future<List<EntityAnnotation>> logoDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	Logo Detection detects popular product logos within an image.
Future<List<LocalizedObjectAnnotation>> objectLocalization( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	The Vision API can detect and extract multiple objects in an image with Object Localization. Object localization identifies multiple objects in an image and provides a LocalizedObjectAnnotation for each object in the image. Each LocalizedObjectAnnotation identifies information about the object, the position of the object, and rectangular bounds for the region of the image that contains the object. Object localization identifies both significant and less-prominent objects in an image.
Future<SafeSearchAnnotation?> safeSearchDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	SafeSearch Detection detects explicit content such as adult content or violent content within an image. This feature uses five categories (adult, spoof, medical, violence, and racy) and returns the likelihood that each is present in a given image. See the SafeSearchAnnotation page for details on these fields.
Future<List<EntityAnnotation>> textDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	Detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words, and their bounding boxes.
Future<WebDetection?> webDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} )	Web Detection detects Web references to an image.

Security & Best Practices

Credential Management

Always clear credentials when they're no longer needed:

// When logging out or shutting down
googleVision.clearCredentials();

Retry with Exponential Backoff

For production apps, wrap API calls with retry logic:

import 'package:google_vision/google_vision.dart';

final result = await RetryUtility.withRetry(
  () => googleVision.image.faceDetection(jsonImage),
  maxRetries: 3,
  baseDelay: Duration(seconds: 1),
);
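To show what exponential backoff means in practice, here is a minimal standalone sketch. The retryWithBackoff function is a hypothetical helper written for this example. It is not the package's RetryUtility, whose internals may differ. The delay doubles after every failed attempt, and the last error is rethrown once the retry budget is spent.

```dart
import 'dart:async';

/// Hypothetical helper (not the package's RetryUtility): runs [action] and
/// retries it up to [maxRetries] times, doubling the delay after each failure.
Future<T> retryWithBackoff<T>(
  Future<T> Function() action, {
  int maxRetries = 3,
  Duration baseDelay = const Duration(seconds: 1),
}) async {
  var attempt = 0;
  while (true) {
    try {
      return await action();
    } catch (_) {
      // Out of retries: surface the most recent error to the caller.
      if (attempt >= maxRetries) rethrow;
      // Wait baseDelay * 2^attempt, i.e. 1x, 2x, 4x, ... the base delay.
      await Future<void>.delayed(baseDelay * (1 << attempt));
      attempt++;
    }
  }
}

Future<void> main() async {
  var calls = 0;
  final result = await retryWithBackoff(
    () async {
      calls++;
      // Simulate two transient failures before a successful call.
      if (calls < 3) throw StateError('transient failure');
      return 'ok';
    },
    baseDelay: const Duration(milliseconds: 10),
  );
  print('$result after $calls calls'); // ok after 3 calls
}
```

In a real application it is usually worth retrying only on errors that are likely to be transient, such as timeouts or rate limiting, rather than on every exception as this sketch does.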

Input Validation

All detection methods automatically validate:

maxResults must be between 1 and 100
JsonImage must have content, source, or GCS URI
InputConfig must have gcsSource or content

Invalid parameters throw ArgumentError immediately, preventing unnecessary API calls.
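The fail-fast behaviour described above can be pictured with a small standalone sketch. The validateMaxResults function is a hypothetical illustration of the maxResults bounds check and is not the package's internal code.

```dart
/// Hypothetical validator (not the package's implementation): rejects
/// values outside the documented 1..100 range before any request is built.
void validateMaxResults(int maxResults) {
  if (maxResults < 1 || maxResults > 100) {
    throw ArgumentError.value(
      maxResults,
      'maxResults',
      'must be between 1 and 100',
    );
  }
}

void main() {
  validateMaxResults(10); // Within bounds, so nothing is thrown.

  try {
    validateMaxResults(0);
  } on ArgumentError catch (e) {
    // The error carries the offending parameter name and value.
    print('rejected ${e.name}: ${e.invalidValue}'); // rejected maxResults: 0
  }
}
```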

Usage with Flutter

For a quick intro to using Google Vision in a Flutter app, take a look at the google_vision_flutter package and the example folder of the project's GitHub repository.

If the Flutter-specific Google Vision widget doesn't meet your requirements, then to work with Flutter it's usually necessary to convert an object that is presented as an Asset or a Stream into a File for use by this google_vision package. This StackOverflow post gives an idea of how this can be accomplished. A similar process can be used for any Stream of data that represents an image supported by google_vision. Essentially, the image data has to be converted to its Base64 representation before it is submitted to the Google Vision REST API, and having the byte data available in the code makes this easier.
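The Base64 step mentioned above can be sketched in plain Dart with dart:convert and dart:typed_data. The toBase64 helper is hypothetical. The ByteData value stands in for whatever byte data your app obtains from an asset or a stream, which is an assumption about how the bytes arrive.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical helper: converts image bytes held in a [ByteData] into the
/// Base64 string form that a JSON request body can carry.
String toBase64(ByteData data) {
  // View only the bytes that belong to this ByteData, without copying.
  final bytes = data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes);
  return base64Encode(bytes);
}

void main() {
  // The eight-byte PNG file signature, standing in for real image data.
  final signature = Uint8List.fromList(
    [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A],
  );
  final data = ByteData.sublistView(signature);
  print(toBase64(data)); // iVBORw0KGgo=
}
```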

Vision CLI (google_vision at the command prompt)

This package includes a companion CLI package, google_vision_cli, that can be used to return data for any API call currently supported by the package. If you want to get started quickly with the CLI utility:

Install using dart pub:

dart pub global activate google_vision_cli

Or install via Homebrew:

brew tap cdavis-code/google-vision
brew install vision

Run the following command to see help:

vision --help

Please see the google_vision_cli documentation for more detailed usage information.

Document → Markdown

The result of DOCUMENT_TEXT_DETECTION (a FullTextAnnotation) can be converted directly to a well-formatted markdown document. The converter walks the Page → Block → Paragraph → Word → Symbol hierarchy and uses Vision's native BlockType, the per-word DetectedBreak, and a relative symbol-height heuristic to emit headers, paragraphs, lists, tables, checkboxes, and image placeholders.

final fullTextAnnotation = await googleVision.image.documentTextDetection(
  JsonImage.fromBuffer(inputBytes.buffer),
);

final markdown = fullTextAnnotation!.toMarkdown();
print(markdown);

Pass a MarkdownOptions instance to disable individual detectors or tune the header height ratios:

final markdown = fullTextAnnotation.toMarkdown(
  options: const MarkdownOptions(
    detectCheckboxes: false,
    headerH1Ratio: 1.5,
  ),
);
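To give a feel for what a relative height ratio such as headerH1Ratio could control, the sketch below picks a markdown header prefix by comparing a line's symbol height with the typical body text height. It is only an illustration under assumptions: the headerPrefix function and the h2Ratio parameter are hypothetical, and the package's actual heuristic may work differently.

```dart
/// Hypothetical sketch (not the package's converter): chooses a markdown
/// header prefix from the ratio of [lineHeight] to [bodyHeight].
String headerPrefix(
  double lineHeight,
  double bodyHeight, {
  double h1Ratio = 1.5,
  double h2Ratio = 1.25, // Assumed threshold, for illustration only.
}) {
  final ratio = lineHeight / bodyHeight;
  if (ratio >= h1Ratio) return '# ';
  if (ratio >= h2Ratio) return '## ';
  return ''; // Close to body size: treat the line as ordinary text.
}

void main() {
  const bodyHeight = 12.0; // Assumed typical symbol height of body text.
  print('${headerPrefix(24, bodyHeight)}Annual Report'); // # Annual Report
  print('${headerPrefix(16, bodyHeight)}Summary'); // ## Summary
  print('${headerPrefix(12, bodyHeight)}Body text'); // Body text
}
```

Under this reading, raising the H1 ratio makes the converter more conservative about promoting large text to a top-level header.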

For multi-page PDFs use MarkdownConverter.convertPages() after flattening the file responses:

final fileResponses = await googleVision.file.documentTextDetection(
  InputConfig.fromBuffer(pdfBytes.buffer),
  pages: [1, 2, 3],
);

final pages = [
  for (final fr in fileResponses)
    for (final ir in fr.responses ?? const [])
      ...?ir.fullTextAnnotation?.pages,
];

final markdown = const MarkdownConverter().convertPages(pages);

Reference

Cloud Vision API - Documentation Reference

Contributors

Contributing

Any help from the open-source community is always welcome and needed:

Found an issue?
- Please file a bug report with details.
Need a feature?
- Open a feature request with use cases.
Are you using and liking the project?
- Promote the project: create an article or post about it
- Make a donation
Do you have a project that uses this package?
- Let's cross-promote: let me know and I'll add a link to your project.
Are you a developer?
- Fix a bug and send a pull request.
- Implement a new feature.
- Improve the Unit Tests.
Have you already helped in any way?
- Many thanks from me, the contributors and everybody that uses this project!

If you donate one hour of your time, you can contribute a lot, because others will do the same. Just take part and start with your one hour.