Google Vision CLI

A command-line interface for the Google Vision API. Wraps the google_vision Dart package to provide image detection and annotation from the terminal.

Quick Start
Installation
- Homebrew
- Dart pub
Prerequisites
Authentication
Usage
Global Options
Commands
Feature Types
License

Quick Start

# Install via Homebrew
brew tap cdavis-code/google-vision
brew install vision

# Run landmark detection
vision detect --image-file photo.jpg --features LANDMARK_DETECTION

Installation

Homebrew

brew tap cdavis-code/google-vision
brew install vision


Dart pub

# Activate globally
dart pub global activate google_vision_cli

# Or run directly from the workspace
vision <command> [arguments]
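If the shell cannot find `vision` after global activation, the pub cache's bin directory may be missing from `PATH`. A minimal sketch for macOS/Linux shells, assuming the default pub cache location (`$HOME/.pub-cache/bin`); adjust the path if your pub cache lives elsewhere:

```shell
# Add Dart's global executables directory to PATH, but only once.
# PUB_BIN is an illustrative variable name; the path is the default pub cache location.
PUB_BIN="$HOME/.pub-cache/bin"
case ":$PATH:" in
  *":$PUB_BIN:"*) ;;                    # already present, nothing to do
  *) export PATH="$PATH:$PUB_BIN" ;;
esac
```

Putting these lines in your shell profile makes the change persistent.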

Prerequisites

- A Google Cloud service account with the Vision API enabled
- A JSON credentials file for the service account

Authentication

The CLI authenticates with a JWT from a service account key file only; API keys are not supported.

Getting a credentials file

1. Go to the Google Cloud Console and create or select a project.
2. Enable the Vision API.
3. Go to IAM & Admin → Service Accounts and create a new service account.
4. Select the service account, go to Keys → Add Key → Create New Key, and choose JSON.
5. Download the key file and place it at the default path:

mkdir -p ~/.vision
mv ~/Downloads/project-key.json ~/.vision/credentials.json

# Credentials at the default path (~/.vision/credentials.json) are used automatically
vision <command>

# Or specify a custom path
vision --credential-file /path/to/credentials.json <command>
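Before the first call it can help to confirm that the downloaded file is readable and looks like a service account key. The helper below is a hypothetical sketch, not part of the CLI. It assumes the key contains the usual `type`, `client_email`, and `private_key` fields, and it restricts the file permissions because the file holds a private key.

```shell
# check_credentials [FILE]: sanity-check a service account key file.
# Hypothetical helper; defaults to the CLI's default credentials path.
check_credentials() {
  cred_file="${1:-$HOME/.vision/credentials.json}"
  if [ ! -r "$cred_file" ]; then
    echo "missing or unreadable: $cred_file" >&2
    return 1
  fi
  # Assumed field names, as found in keys downloaded from the Cloud Console.
  for field in '"type"' '"client_email"' '"private_key"'; do
    if ! grep -q "$field" "$cred_file"; then
      echo "no $field field, probably not a service account key: $cred_file" >&2
      return 1
    fi
  done
  # Limit access to the current user.
  chmod 600 "$cred_file"
  echo "ok: $cred_file"
}
```

Calling `check_credentials` with no argument checks the default path; pass a path to check a key stored elsewhere.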

Usage

vision [global-options] <command> [command-options]

Global Options

Option	Default	Description
`--credential-file`	`~/.vision/credentials.json`	Path to JWT credentials JSON file
`--log-level`	`off`	Log level: `all`, `debug`, `info`, `warning`, `error`, `off`
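
Global options go before the command name. If you switch between key files or log levels often, a small wrapper can fill them in. The sketch below is one possible approach: `vision_env`, `VISION_CREDENTIALS`, and `VISION_LOG_LEVEL` are hypothetical names, and the CLI itself is not documented to read any environment variables.

```shell
# vision_env: run vision with global options taken from two hypothetical
# environment variables, falling back to the documented defaults.
vision_env() {
  vision \
    --credential-file "${VISION_CREDENTIALS:-$HOME/.vision/credentials.json}" \
    --log-level "${VISION_LOG_LEVEL:-off}" \
    "$@"
}

# Example:
#   VISION_LOG_LEVEL=debug vision_env detect --image-file photo.jpg --features LABEL_DETECTION
```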

Commands

`version`

Display the package name and version.

vision version

`detect`

Run image detection and annotation for a batch of images or files.

Option	Required	Description
`--image-file`	yes	Path to the image file to process
`--features`	yes	Comma-separated list of detection types
`--pages`	no	Comma-separated list of page numbers for PDF/TIFF/GIF (max 5)
`--max-results`	no	Maximum results per feature (default: 10, max: 50)

# Label detection on an image
vision detect --image-file photo.jpg --features LABEL_DETECTION

# Multiple features on a single image
vision detect --image-file photo.jpg \
  --features LABEL_DETECTION,FACE_DETECTION,TEXT_DETECTION

# Process specific pages of a PDF
vision detect --image-file document.pdf \
  --features DOCUMENT_TEXT_DETECTION --pages 1,2,3
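
Because `--pages` accepts at most five page numbers, a longer document needs several calls. The helper below is a hypothetical sketch that splits a page range into lists of five; it assumes pages are numbered from 1, as in the example above.

```shell
# page_batches TOTAL: print comma-separated page lists of at most five pages,
# one list per line, covering pages 1..TOTAL. Hypothetical helper.
page_batches() {
  total="$1"
  page=1
  batch=""
  while [ "$page" -le "$total" ]; do
    batch="${batch:+$batch,}$page"
    # Flush after every fifth page and after the last page.
    if [ $((page % 5)) -eq 0 ] || [ "$page" -eq "$total" ]; then
      echo "$batch"
      batch=""
    fi
    page=$((page + 1))
  done
}

# Example: OCR a 12-page PDF in three calls.
#   page_batches 12 | while read -r pages; do
#     vision detect --image-file document.pdf \
#       --features DOCUMENT_TEXT_DETECTION --pages "$pages"
#   done
```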

`crop_hints`

Get crop suggestion vertices for an image.

Option	Required	Description
`--image-file`	yes	Path to the image file to process
`--aspect-ratios`	no	Comma-separated list of aspect ratios as floats (e.g. `1.33333` for 4:3)
`--pages`	no	Comma-separated list of page numbers for PDF/TIFF/GIF (max 5)

# Get default crop hints
vision crop_hints --image-file photo.jpg

# Specify desired aspect ratios
vision crop_hints --image-file photo.jpg \
  --aspect-ratios 1.33333,1.77778
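
Aspect ratios are passed as floats, so a ratio written as `W:H` has to be divided out first. A small sketch using `awk`; `ratio_to_float` is a hypothetical helper, and the five-decimal precision simply mirrors the examples above.

```shell
# ratio_to_float W:H: print W/H with five decimals, e.g. 4:3 -> 1.33333.
# Prints nothing if H is missing or zero.
ratio_to_float() {
  echo "$1" | awk -F: '$2 > 0 { printf "%.5f\n", $1 / $2 }'
}

# Example:
#   vision crop_hints --image-file photo.jpg \
#     --aspect-ratios "$(ratio_to_float 4:3),$(ratio_to_float 16:9)"
```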

`safe_search`

Detect explicit content (adult, violence, medical, racy) in an image.

Option	Required	Description
`--image-file`	yes	Path to the image file to process

vision safe_search --image-file photo.jpg

`highlight`

Draw bounding boxes around detected objects and save the result.

Option	Required	Description
`--image-file`	yes	Path to the image file to process
`--output-file`	yes	Path to save the annotated image
`--features`	no	Comma-separated list of detection types
`--line-color`	no	Box color: `red`, `green`, `blue`, `white`, `black` (default: `red`)
`--max-results`	no	Maximum results per feature (default: 10, max: 50)

# Highlight detected faces
vision highlight --image-file photo.jpg \
  --output-file annotated.jpg --features FACE_DETECTION

# Highlight objects and landmarks in blue
vision highlight --image-file photo.jpg \
  --output-file annotated.jpg \
  --features OBJECT_LOCALIZATION,LANDMARK_DETECTION \
  --line-color blue
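
Since `highlight` takes one input and one output path, annotating a whole directory means looping over the files. The sketch below is one way to do it; `annotated_name`, `highlight_dir`, and the `.annotated.jpg` naming scheme are assumptions made for illustration.

```shell
# annotated_name PATH: derive the output path for a .jpg input,
# e.g. shots/photo.jpg -> shots/photo.annotated.jpg (assumed naming scheme).
annotated_name() {
  echo "${1%.jpg}.annotated.jpg"
}

# highlight_dir DIR FEATURES: run highlight on every .jpg in DIR.
highlight_dir() {
  for img in "$1"/*.jpg; do
    [ -e "$img" ] || continue                        # glob matched nothing
    case "$img" in *.annotated.jpg) continue ;; esac # skip earlier outputs
    vision highlight --image-file "$img" \
      --output-file "$(annotated_name "$img")" --features "$2"
  done
}

# Example:
#   highlight_dir ./shots FACE_DETECTION
```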

`markdown`

Convert the result of DOCUMENT_TEXT_DETECTION to a well-formatted markdown document. Walks the Page → Block → Paragraph → Word → Symbol hierarchy and emits headers, paragraphs, lists, tables, checkboxes, and image placeholders.

Option	Required	Description
`--image-file`	yes	Path to the image or PDF file to process
`--output-file`, `-o`	no	Path to write the markdown output (defaults to stdout)
`--pages`, `-p`	no	Comma-separated list of page numbers for PDF/TIFF/GIF (max 5)
`--[no-]page-headers`	no	Emit `# Page N` at the top of each page (default: on)
`--[no-]image-placeholders`	no	Emit placeholder links for `BlockType.PICTURE` blocks (default: on)
`--[no-]detect-headers`	no	Detect headers via relative symbol height (default: on)
`--[no-]detect-lists`	no	Detect bullet and ordered lists (default: on)
`--[no-]detect-checkboxes`	no	Detect checkbox glyphs and `[ ]` / `[x]` patterns (default: on)

# Convert an image to markdown on stdout
vision markdown --image-file form.jpg

# Convert pages 1 and 2 of a PDF to a file
vision markdown --image-file scan.pdf --pages 1,2 -o out.md

# Disable header detection and image placeholders
vision markdown --image-file scan.pdf --pages 1 \
  --no-detect-headers --no-image-placeholders -o out.md
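
Because the output defaults to stdout, it can be piped straight into other tools. As a rough sketch, the hypothetical helper below counts unchecked checkboxes in a scanned form. It assumes checkboxes are emitted as list items of the form `- [ ]` and `- [x]`; the exact syntax the command produces is not specified here, so check a sample of the output first.

```shell
# count_open_tasks: read markdown on stdin and print the number of
# unchecked list items ("- [ ]" or "* [ ]", optionally indented).
count_open_tasks() {
  # grep -c exits non-zero when the count is 0; keep the function's status clean.
  grep -c '^[[:space:]]*[-*] \[ \]' || true
}

# Example:
#   vision markdown --image-file form.jpg | count_open_tasks
```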

`score`

Get confidence scores for detected objects as a JSON array.

Option	Required	Description
`--image-file`	yes	Path to the image file to process
`--features`	no	Comma-separated list of detection types
`--max-results`	no	Maximum results per feature (default: 10, max: 50)

# Get face detection confidence scores
vision score --image-file photo.jpg \
  --features FACE_DETECTION

# Get label detection scores
vision score --image-file photo.jpg \
  --features LABEL_DETECTION

# Output: [0.98, 0.95, 0.87, ...]
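
The array can be post-processed with standard tools, for example to count detections above a confidence threshold. The sketch below assumes the output is a flat JSON array of numbers, as in the sample above; `scores_above` is a hypothetical helper, and a JSON-aware tool would be more robust for anything beyond this shape.

```shell
# scores_above THRESHOLD: read a flat JSON array of numbers on stdin and
# print how many values are greater than or equal to THRESHOLD.
scores_above() {
  sed 's/[][ ]//g' | tr ',' '\n' |
    awk -v t="$1" '$1 != "" && $1 + 0 >= t + 0 { n++ } END { print n + 0 }'
}

# Example:
#   vision score --image-file photo.jpg --features FACE_DETECTION | scores_above 0.9
```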

Feature Types

Feature	Description
`FACE_DETECTION`	Detect faces with facial attributes
`LANDMARK_DETECTION`	Detect popular landmarks
`LOGO_DETECTION`	Detect product logos
`LABEL_DETECTION`	Label image content (objects, activities, etc.)
`TEXT_DETECTION`	OCR — detect and extract text
`DOCUMENT_TEXT_DETECTION`	Dense text/document OCR with page/block/paragraph structure
`SAFE_SEARCH_DETECTION`	Adult/violent/explicit content likelihood
`IMAGE_PROPERTIES`	Dominant colors and image attributes
`CROP_HINTS`	Suggested crop region vertices
`WEB_DETECTION`	Web references and similar images
`PRODUCT_SEARCH`	Product search against a product set
`OBJECT_LOCALIZATION`	Detect and locate multiple objects

See the Google Vision API documentation for details.

License

This project is licensed under the terms included in the repository.