Google Vision Images REST API Client
Native Dart package that integrates Google Vision features, including image labeling, face, logo, and landmark detection, optical character recognition (OCR), and detection of explicit content, into your applications.
If you want to integrate the Google Vision API into a Flutter application, take a look at my related package google_vision_flutter, which provides a widget that wraps the functionality of this Dart-focused package.
Project Status
Please feel free to submit PRs for any additional helper methods, or report an issue for a missing helper method and I'll add it if I have time available.
Recent Changes
New for v2.2.0
- Security enhancements: Credential leakage prevention with secure logging interceptor that automatically redacts sensitive headers
- Input validation: All API methods now validate parameters (maxResults bounds, required fields)
- Credential management: New clearCredentials() method for secure logout and credential cleanup
- Retry support: RetryUtility class with exponential backoff for resilient API calls
- Configurable OAuth: JWT generator now supports custom OAuth endpoints for Google Cloud Enterprise
- Bug fix: Token expiry calculation corrected - tokens now refresh properly
- Documentation: Added Security & Best Practices section with code examples
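As a rough illustration of what header redaction in a logging interceptor can look like, here is a minimal sketch. The `redactHeaders` function and the set of header names are assumptions made for this example and are not taken from the package's source; the real interceptor may work differently.

```dart
// Hypothetical helper, not the package's actual logging interceptor.
// Header names are compared in lower case because HTTP header names
// are case-insensitive.
const sensitiveHeaders = {'authorization', 'x-goog-api-key', 'cookie'};

/// Returns a copy of [headers] that is safe to write to a log.
Map<String, String> redactHeaders(Map<String, String> headers) {
  return {
    for (final entry in headers.entries)
      entry.key: sensitiveHeaders.contains(entry.key.toLowerCase())
          ? '[REDACTED]'
          : entry.value,
  };
}

void main() {
  final safe = redactHeaders({
    'Authorization': 'Bearer abc123',
    'Content-Type': 'application/json',
  });
  print(safe); // {Authorization: [REDACTED], Content-Type: application/json}
}
```

Redacting a copy rather than mutating the original map keeps the real request headers intact while only the logged view is sanitized.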
New for v2.0.0
- Even though this package worked when used with the web platform, the pub.dev analyzer would not show it as web platform compatible due to the use of the universal_io package, which has a dependency on dart:io. This version has removed the universal_io dependency from the core package, so some related method signatures have been removed.
- The deprecated methods from v1.3.x have been removed in this version.
- Logging functionality has been added to the package:
final googleVision = await GoogleVision(LogLevel.all).withJwt(
File('service_credentials.json').readAsStringSync(),
);
New for v1.4.0
- A breaking change from the previous version is that the GoogleVision class now follows the Singleton design pattern. The object is now instantiated as follows:
// Old method from v1.3.x and earlier
// final googleVision = await GoogleVision.withJwtFile('service_credentials.json');
// New
final googleVision = await GoogleVision().withJwtFile('service_credentials.json');
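For readers unfamiliar with the Singleton pattern in Dart, the following sketch shows one common way to write it with a factory constructor. `VisionClient` is a hypothetical class invented for this example; it only mirrors the `GoogleVision().withApiKey(...)` call shape and is not the package's implementation.

```dart
/// Hypothetical class illustrating the pattern; not the package's source.
class VisionClient {
  // The single shared instance. Dart initializes static fields lazily,
  // so it is created on first use.
  static final VisionClient _instance = VisionClient._internal();

  // A private named constructor prevents direct instantiation from
  // outside this library.
  VisionClient._internal();

  // The factory constructor always hands back the same instance.
  factory VisionClient() => _instance;

  String? apiKey;

  /// Stores the key and returns the shared instance.
  Future<VisionClient> withApiKey(String key) async {
    apiKey = key;
    return this;
  }
}

Future<void> main() async {
  final a = await VisionClient().withApiKey('demo-key');
  final b = VisionClient();
  print(identical(a, b)); // true
  print(b.apiKey); // demo-key
}
```

One consequence of this design is that credentials set through one reference are visible through every other reference, which is why clearing them explicitly matters.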
New for v1.3.0
- This version of the package supports both the image and file annotation APIs for Google Vision. The previous versions of the package supported only the image API.
- A number of methods and classes have been deprecated in this version. All the provided examples still work without any changes, so the changes in this package should not cause any issues for existing code.
- The file functionality added in this release allows for the annotation of file formats that have pages or frames, specifically pdf, tiff and gif. Google Vision allows annotation of up to 5 pages/frames.
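Because of the 5 page/frame limit, longer documents have to be annotated in batches. The sketch below splits a list of page numbers into groups of at most five; `chunkPages` is a hypothetical helper written for this example and is not part of the package. Each resulting group could then be sent as its own file annotation request.

```dart
/// Hypothetical helper: splits [pages] into consecutive groups of at
/// most [size] entries (5 by default, matching the limit noted above).
List<List<int>> chunkPages(List<int> pages, {int size = 5}) {
  if (size < 1) {
    throw ArgumentError.value(size, 'size', 'must be at least 1');
  }
  return [
    for (var i = 0; i < pages.length; i += size)
      pages.sublist(i, i + size > pages.length ? pages.length : i + size),
  ];
}

void main() {
  final allPages = [for (var p = 1; p <= 12; p++) p];
  print(chunkPages(allPages));
  // [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]]
}
```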
Getting Started
pubspec.yaml
To use this package, add the dependency to your pubspec.yaml file:
dependencies:
...
google_vision: ^2.2.0
Obtaining Authentication/Authorization Credentials
Authenticating to the Cloud Vision API can be done with one of two methods:
- The first method requires a JSON file with the JWT token information, which you can obtain by creating a service account in the API console.
- The second method requires an API key to be created.
Both of the authorization/authentication methods listed above assume that you already have a Google account, you have created a Google Cloud project and you have enabled the Cloud Vision API in the Google API library.
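Before handing a service account file to the client, it can help to confirm that the JSON contains the fields a Google service account key normally carries. The following is a minimal sketch using only dart:convert; `missingServiceAccountFields` is a hypothetical helper for this example, and the package may perform its own, different checks.

```dart
import 'dart:convert';

/// Hypothetical helper: returns the names of expected fields that are
/// missing or empty in a service account key file.
List<String> missingServiceAccountFields(String jsonText) {
  final decoded = jsonDecode(jsonText);
  if (decoded is! Map<String, dynamic>) {
    throw const FormatException('Expected a JSON object');
  }
  // Fields found in a standard Google service account key.
  const expected = ['type', 'client_email', 'private_key', 'token_uri'];
  return [
    for (final field in expected)
      if (decoded[field] is! String || (decoded[field] as String).isEmpty)
        field,
  ];
}

void main() {
  const sample = '{"type": "service_account", '
      '"client_email": "demo@example.iam.gserviceaccount.com"}';
  print(missingServiceAccountFields(sample)); // [private_key, token_uri]
}
```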
Usage of the Cloud Vision API
final googleVision = await GoogleVision().withApiKey(
Platform.environment['GOOGLE_VISION_API_KEY'] ?? '[YOUR API KEY]',
// additionalHeaders: {'X-Ios-Bundle-Identifier': 'com.xxx.xxx'},
);
print('checking...');
final faceAnnotationResponses = await googleVision.image.faceDetection(
JsonImage.fromGsUri(
'gs://gvision-demo/young-man-smiling-and-thumbs-up.jpg'));
for (var faceAnnotation in faceAnnotationResponses) {
print('Face - ${faceAnnotation.detectionConfidence}');
print('Joy - ${faceAnnotation.enumJoyLikelihood}');
}
// Output:
// Face - 0.9609375
// Joy - Likelihood.UNLIKELY
print('done.');
New Helper Methods
| Method Signature | Description |
|---|---|
| Future<AnnotateImageResponse> detection( JsonImage jsonImage, AnnotationType annotationType, {ImageContext? imageContext, int maxResults = 10,} ) | Lower level method for a single detection type as specified by annotationType. |
| Future<CropHintsAnnotation?> cropHints( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | Crop Hints suggests vertices for a crop region on an image. |
| Future<FullTextAnnotation?> documentTextDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | Extracts text from an image (or file); the response is optimized for dense text and documents. The response includes page, block, paragraph, word, and break information. A specific use of documentTextDetection is to detect handwriting in an image. |
| Future<List<FaceAnnotation>> faceDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | Face Detection detects multiple faces within an image along with the associated key facial attributes such as emotional state or wearing headwear. |
| Future<ImagePropertiesAnnotation?> imageProperties( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | The Image Properties feature detects general attributes of the image, such as dominant color. |
| Future<List<EntityAnnotation>> labelDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | Labels can identify general objects, locations, activities, animal species, products, and more. Labels are returned in English only. |
| Future<List<EntityAnnotation>> landmarkDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | Landmark Detection detects popular natural and human-made structures within an image. |
| Future<List<EntityAnnotation>> logoDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | Logo Detection detects popular product logos within an image. |
| Future<List<LocalizedObjectAnnotation>> objectLocalization( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | The Vision API can detect and extract multiple objects in an image with Object Localization. Object localization identifies multiple objects in an image and provides a LocalizedObjectAnnotation for each object in the image. Each LocalizedObjectAnnotation identifies information about the object, the position of the object, and rectangular bounds for the region of the image that contains the object. Object localization identifies both significant and less-prominent objects in an image. |
| Future<SafeSearchAnnotation?> safeSearchDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | SafeSearch Detection detects explicit content such as adult content or violent content within an image. This feature uses five categories (adult, spoof, medical, violence, and racy) and returns the likelihood that each is present in a given image. See the SafeSearchAnnotation page for details on these fields. |
| Future<List<EntityAnnotation>> textDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | Detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words, and their bounding boxes. |
| Future<WebDetection?> webDetection( JsonImage jsonImage, {ImageContext? imageContext, int maxResults = 10,} ) | Web Detection detects Web references to an image. |
Security & Best Practices
Credential Management
Always clear credentials when they're no longer needed:
// When logging out or shutting down
googleVision.clearCredentials();
Retry with Exponential Backoff
For production apps, wrap API calls with retry logic:
import 'package:google_vision/google_vision.dart';
final result = await RetryUtility.withRetry(
() => googleVision.image.faceDetection(jsonImage),
maxRetries: 3,
baseDelay: Duration(seconds: 1),
);
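To make the idea of exponential backoff concrete, here is a self-contained sketch of the technique. `retryWithBackoff` is a hypothetical stand-in written for this example; the package's RetryUtility may differ in its parameters, in which errors it retries, and in whether it adds jitter.

```dart
import 'dart:async';

/// Hypothetical stand-in for a retry helper; not the package's code.
/// Runs [action], retrying up to [maxRetries] times after failures.
Future<T> retryWithBackoff<T>(
  Future<T> Function() action, {
  int maxRetries = 3,
  Duration baseDelay = const Duration(seconds: 1),
}) async {
  var attempt = 0;
  while (true) {
    try {
      return await action();
    } catch (_) {
      // Give up once the retry budget is spent.
      if (attempt >= maxRetries) rethrow;
      // The delay doubles on each retry: baseDelay, 2x, 4x, ...
      await Future<void>.delayed(baseDelay * (1 << attempt));
      attempt++;
    }
  }
}

Future<void> main() async {
  var calls = 0;
  final result = await retryWithBackoff(
    () async {
      calls++;
      if (calls < 3) throw StateError('transient failure');
      return 'ok';
    },
    baseDelay: const Duration(milliseconds: 10),
  );
  print('$result after $calls calls'); // ok after 3 calls
}
```

This sketch retries on every error for brevity. In practice it is usually better to retry only transient failures, such as timeouts or HTTP 429 and 5xx responses, and to fail fast on errors like invalid credentials.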
Input Validation
All detection methods automatically validate:
- maxResults must be between 1 and 100
- JsonImage must have content, source, or GCS URI
- InputConfig must have gcsSource or content
Invalid parameters throw ArgumentError immediately, preventing unnecessary API calls.
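The fail-fast behaviour described above can be sketched as follows. `checkMaxResults` is a hypothetical function for this example and not the package's validator; it uses `RangeError`, which is a subclass of `ArgumentError`, so callers can catch either type.

```dart
/// Hypothetical validator mirroring the documented 1 to 100 bound.
/// Throws a RangeError (an ArgumentError subclass) when out of range.
int checkMaxResults(int maxResults) {
  RangeError.checkValueInInterval(maxResults, 1, 100, 'maxResults');
  return maxResults;
}

void main() {
  print(checkMaxResults(10)); // 10
  try {
    checkMaxResults(0);
  } on ArgumentError catch (e) {
    // No network request was made before the error surfaced.
    print('rejected: ${e.name}'); // rejected: maxResults
  }
}
```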
Usage with Flutter
For a quick introduction to using Google Vision in a Flutter app, take a look at the google_vision_flutter package and the example folder of the project's GitHub repository.
If the Flutter-specific Google Vision widget doesn't meet your requirements, you will usually need to convert an object that is presented as an Asset or a Stream into a File for use by this google_vision package. This StackOverflow post gives an idea of how this can be accomplished. A similar process can be used for any Stream of data that represents an image supported by google_vision. Essentially, the Google Vision REST API client needs to convert the image data into its Base64 representation before submitting it to the Google server, and having the byte data available in the code makes this easier.
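The Base64 step mentioned above can be illustrated with the standard dart:convert library. This is only a sketch of the encoding itself; `toBase64` is a hypothetical helper, and the package performs its own encoding internally when given image bytes.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical helper: encodes raw image bytes as a Base64 string.
String toBase64(Uint8List imageBytes) => base64Encode(imageBytes);

void main() {
  // In Flutter these bytes might come from an asset, for example via
  // rootBundle.load(...), which yields a ByteData. A three-byte stand-in
  // (the start of a JPEG header) keeps this sketch runnable anywhere.
  final bytes = Uint8List.fromList([0xFF, 0xD8, 0xFF]);
  print(toBase64(bytes)); // /9j/
}
```

Once the bytes are in memory, they can be passed to the package without going through a File, as the `JsonImage.fromBuffer(inputBytes.buffer)` call elsewhere in this README does.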
Vision CLI (google_vision at the command prompt)
This package includes a companion CLI package, google_vision_cli, that can be used to return data for any API call currently supported by the package. If you want to get started quickly with the CLI utility:
Install using dart pub:
dart pub global activate google_vision_cli
Or install via Homebrew:
brew tap cdavis-code/google-vision
brew install vision
Run the following command to see help:
vision --help
Please see the google_vision_cli documentation for more detailed usage information.
Document → Markdown
The result of DOCUMENT_TEXT_DETECTION (a FullTextAnnotation) can be converted directly to a well-formatted markdown document. The converter walks the Page → Block → Paragraph → Word → Symbol hierarchy and uses Vision's native BlockType, the per-word DetectedBreak, and a relative symbol-height heuristic to emit headers, paragraphs, lists, tables, checkboxes, and image placeholders.
final fullTextAnnotation = await googleVision.image.documentTextDetection(
JsonImage.fromBuffer(inputBytes.buffer),
);
final markdown = fullTextAnnotation!.toMarkdown();
print(markdown);
Pass a MarkdownOptions instance to disable individual detectors or tune the header height ratios:
final markdown = fullTextAnnotation.toMarkdown(
options: const MarkdownOptions(
detectCheckboxes: false,
headerH1Ratio: 1.5,
),
);
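To give a feel for how a relative symbol-height heuristic can pick header levels, here is a small sketch. The `headerLevel` function, the `h2Ratio` parameter, and the threshold values are assumptions made for this example; the converter's actual rules are not reproduced here.

```dart
/// Hypothetical mapping from relative line height to a header level.
/// Returns 0 for body text. Thresholds are illustrative only.
int headerLevel(
  double lineHeight,
  double bodyHeight, {
  double h1Ratio = 1.5,
  double h2Ratio = 1.25,
}) {
  // Compare this line's symbol height with the typical body text height.
  final ratio = lineHeight / bodyHeight;
  if (ratio >= h1Ratio) return 1;
  if (ratio >= h2Ratio) return 2;
  return 0;
}

void main() {
  print(headerLevel(30, 20)); // 1
  print(headerLevel(26, 20)); // 2
  print(headerLevel(20, 20)); // 0
}
```

Under this model, raising a ratio makes the corresponding header level harder to trigger, which is the kind of tuning an option such as headerH1Ratio allows.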
For multi-page PDFs use MarkdownConverter.convertPages() after flattening the file responses:
final fileResponses = await googleVision.file.documentTextDetection(
InputConfig.fromBuffer(pdfBytes.buffer),
pages: [1, 2, 3],
);
final pages = [
for (final fr in fileResponses)
for (final ir in fr.responses ?? const [])
...?ir.fullTextAnnotation?.pages,
];
final markdown = const MarkdownConverter().convertPages(pages);
Reference
Cloud Vision API - Documentation Reference
Contributing
Any help from the open-source community is always welcome and needed:
- Found an issue?
  - Please file a bug report with details.
- Need a feature?
  - Open a feature request with use cases.
- Are you using and liking the project?
  - Promote the project: create an article or post about it.
  - Make a donation.
- Do you have a project that uses this package?
  - Let's cross-promote: let me know and I'll add a link to your project.
- Are you a developer?
  - Fix a bug and send a pull request.
  - Implement a new feature.
  - Improve the unit tests.
- Have you already helped in any way?
  - Many thanks from me, the contributors and everybody that uses this project!
If you donate 1 hour of your time, you can contribute a lot, because others will do the same. Just be part of it and start with your 1 hour.