Google Vision CLI
A command-line interface for the Google Vision API. Wraps the google_vision Dart package to provide image detection and annotation from the terminal.
Table of Contents
- Quick Start
- Installation
- Prerequisites
- Authentication
- Usage
- Global Options
- Commands
- Feature Types
- License
Quick Start
# Install via Homebrew
brew tap cdavis-code/google-vision
brew install vision
# Run landmark detection
vision detect --image-file photo.jpg --features LANDMARK_DETECTION
Installation
Homebrew
brew tap cdavis-code/google-vision
brew install vision
# Run landmark detection
vision detect --image-file photo.jpg --features LANDMARK_DETECTION
Dart pub
# Activate globally
dart pub global activate google_vision_cli
# Or run directly from the workspace
vision <command> [arguments]
Prerequisites
- A Google Cloud service account with the Vision API enabled
- A JSON credentials file for the service account
Authentication
The CLI authenticates via JWT only using a service account key file (API keys are not supported).
Getting a credentials file
- Go to the Google Cloud Console, create or select a project
- Enable the Vision API
- Go to IAM & Admin → Service Accounts, create a new service account
- Select the service account, go to Keys → Add Key → Create New Key, choose JSON
- Download the key file and place it at the default path:
mkdir -p ~/.vision
mv ~/Downloads/project-key.json ~/.vision/credentials.json
# Place credentials at the default path
~/.vision/credentials.json
# Or specify a custom path
vision --credential-file /path/to/credentials.json <command>
Usage
vision [global-options] <command> [command-options]
Global Options
| Option | Default | Description |
|---|---|---|
--credential-file |
~/.vision/credentials.json |
Path to JWT credentials JSON file |
--log-level |
off |
Log level: all, debug, info, warning, error, off |
Commands
version
Display the package name and version.
vision version
detect
Run image detection and annotation for a batch of images or files.
| Option | Required | Description |
|---|---|---|
--image-file |
yes | Path to the image file to process |
--features |
yes | Comma-separated list of detection types |
--pages |
no | Comma-separated list of page numbers for PDF/TIFF/GIF (max 5) |
--max-results |
no | Maximum results per feature (default: 10, max: 50) |
# Label detection on an image
vision detect --image-file photo.jpg --features LABEL_DETECTION
# Multiple features on a single image
vision detect --image-file photo.jpg \
--features LABEL_DETECTION,FACE_DETECTION,TEXT_DETECTION
# Process specific pages of a PDF
vision detect --image-file document.pdf \
--features DOCUMENT_TEXT_DETECTION --pages 1,2,3
crop_hints
Get crop suggestion vertices for an image.
| Option | Required | Description |
|---|---|---|
--image-file |
yes | Path to the image file to process |
--aspect-ratios |
no | Comma-separated list of aspect ratios as floats (e.g. 1.33333 for 4:3) |
--pages |
no | Comma-separated list of page numbers for PDF/TIFF/GIF (max 5) |
# Get default crop hints
vision crop_hints --image-file photo.jpg
# Specify desired aspect ratios
vision crop_hints --image-file photo.jpg \
--aspect-ratios 1.33333,1.77778
safe_search
Detect explicit content (adult, violence, medical, racy) in an image.
| Option | Required | Description |
|---|---|---|
--image-file |
yes | Path to the image file to process |
vision safe_search --image-file photo.jpg
highlight
Draw bounding boxes around detected objects and save the result.
| Option | Required | Description |
|---|---|---|
--image-file |
yes | Path to the image file to process |
--output-file |
yes | Path to save the annotated image |
--features |
no | Comma-separated list of detection types |
--line-color |
no | Box color: red, green, blue, white, black (default: red) |
--max-results |
no | Maximum results per feature (default: 10, max: 50) |
# Highlight detected faces
vision highlight --image-file photo.jpg \
--output-file annotated.jpg --features FACE_DETECTION
# Highlight objects and landmarks in blue
vision highlight --image-file photo.jpg \
--output-file annotated.jpg \
--features OBJECT_LOCALIZATION,LANDMARK_DETECTION \
--line-color blue
markdown
Convert the result of DOCUMENT_TEXT_DETECTION to a well-formatted markdown document. Walks the Page → Block → Paragraph → Word → Symbol hierarchy and emits headers, paragraphs, lists, tables, checkboxes, and image placeholders.
| Option | Required | Description |
|---|---|---|
--image-file |
yes | Path to the image or PDF file to process |
--output-file, -o |
no | Path to write the markdown output (defaults to stdout) |
--pages, -p |
no | Comma-separated list of page numbers for PDF/TIFF/GIF (max 5) |
--[no-]page-headers |
no | Emit # Page N at the top of each page (default: on) |
--[no-]image-placeholders |
no | Emit placeholder links for BlockType.PICTURE blocks (default: on) |
--[no-]detect-headers |
no | Detect headers via relative symbol height (default: on) |
--[no-]detect-lists |
no | Detect bullet and ordered lists (default: on) |
--[no-]detect-checkboxes |
no | Detect checkbox glyphs and [ ] / [x] patterns (default: on) |
# Convert an image to markdown on stdout
vision markdown --image-file form.jpg
# Convert pages 1 and 2 of a PDF to a file
vision markdown --image-file scan.pdf --pages 1,2 -o out.md
# Disable header detection and image placeholders
vision markdown --image-file scan.pdf --pages 1 \
--no-detect-headers --no-image-placeholders -o out.md
score
Get confidence scores for detected objects as a JSON array.
| Option | Required | Description |
|---|---|---|
--image-file |
yes | Path to the image file to process |
--features |
no | Comma-separated list of detection types |
--max-results |
no | Maximum results per feature (default: 10, max: 50) |
# Get face detection confidence scores
vision score --image-file photo.jpg \
--features FACE_DETECTION
# Get label detection scores
vision score --image-file photo.jpg \
--features LABEL_DETECTION
# Output: [0.98, 0.95, 0.87, ...]
Feature Types
| Feature | Description |
|---|---|
FACE_DETECTION |
Detect faces with facial attributes |
LANDMARK_DETECTION |
Detect popular landmarks |
LOGO_DETECTION |
Detect product logos |
LABEL_DETECTION |
Label image content (objects, activities, etc.) |
TEXT_DETECTION |
OCR — detect and extract text |
DOCUMENT_TEXT_DETECTION |
Dense text/document OCR with page/block/paragraph structure |
SAFE_SEARCH_DETECTION |
Adult/violent/explicit content likelihood |
IMAGE_PROPERTIES |
Dominant colors and image attributes |
CROP_HINTS |
Suggested crop region vertices |
WEB_DETECTION |
Web references and similar images |
PRODUCT_SEARCH |
Product search against a product set |
OBJECT_LOCALIZATION |
Detect and locate multiple objects |
See the Google Vision API documentation for details.
License
This project is licensed under the terms included in the repository.
Libraries
- google_vision_cli
- meta
- app metadata