agent_device

Agent-driven CLI and Dart library for mobile UI automation, accessibility snapshots, network/log/perf observability, video recording, and .ad replay scripts on iOS and Android.

A Dart port of the agent-device CLI, which was originally written in TypeScript.

Ships as both:

a CLI (agent-device / ad) for day-to-day shell use, and
a Dart library (package:agent_device) you can import into any Dart / Flutter project to drive devices programmatically via AgentDevice.open(...).

Install

# CLI (global activation)
dart pub global activate agent_device

# Library (add to pubspec.yaml)
dart pub add agent_device

The CLI installs two executables: agent-device and ad (short alias).

How it works

agent_device talks to real iOS simulators/devices and Android emulators/devices through their native toolchains:

iOS: an XCUITest runner (Swift) launched via xcodebuild test-without-building. Auto-built from bundled source on first use.
Android: adb for interactions + a bundled snapshot helper APK (13 KB Java instrumentation) that provides multi-window accessibility snapshots. Auto-installed on first use.

No emulator images, test frameworks, or additional SDKs are required beyond Xcode (iOS) and the Android SDK (Android).

CLI quickstart

# List all connected/booted devices
ad devices

# Capture the accessibility tree
ad snapshot --platform ios --serial <UDID>

# Interact
ad open com.example.myapp --platform android
ad tap 200 400
ad type "hello world"
ad click 'text="Submit"'
ad swipe 200 600 200 200

# Assertions
ad is visible 'text="Welcome"'
ad is hidden 'id=loadingSpinner'
ad wait visible 'text="Done"' --timeout 10000

# Replay an .ad script
ad replay flow.ad --platform ios

# Record video with chapter markers per step
ad replay flow.ad --record recording.mp4

Every command supports --json for machine-readable output and --verbose for diagnostic logging.

Library usage

import 'package:agent_device/agent_device.dart';

void main() async {
  // Open a session on a connected device
  final device = await AgentDevice.open(
    backend: const IosBackend(),
    selector: const DeviceSelector(serial: 'BOOTED-UDID'),
  );

  // Launch an app and capture the UI tree
  await device.openApp('com.example.myapp');
  final snap = await device.snapshot();
  for (final node in (snap.nodes ?? []).whereType<SnapshotNode>()) {
    print('@${node.ref} [${node.type}] ${node.label ?? ""}');
  }

  // Interact via selectors
  await device.tapTarget(
    InteractionTarget.selector('text="Sign In"'),
  );
  await device.typeText('user@example.com');

  // Assert visibility with viewport-aware checks
  final result = await device.isPredicate(
    'visible',
    InteractionTarget.selector('id=welcomeBanner'),
  );
  print('visible: ${result.pass}');

  // Record video with chapters (for test suites)
  final recorder = TestRecorder(device, '/tmp/test.mp4');
  await recorder.start();
  recorder.chapter('login flow');
  // ... test steps ...
  await recorder.stop(); // injects MP4 chapters via ffmpeg

  await device.close();
}

AgentDevice is a typed façade over the abstract Backend. Where the TS original binds commands dynamically through bindCommands, the Dart port exposes concrete methods on the façade, and the IosBackend / AndroidBackend subclasses override the ones each platform supports. Every other method inherits a default that reports UNSUPPORTED_OPERATION, so partial platform support is explicit rather than silent.
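
The façade-plus-default pattern can be illustrated with a minimal sketch. Every name below (SketchError, SketchBackend, SketchAndroidBackend, SketchDevice) is hypothetical and is not the package's API. The methods are synchronous for brevity, whereas the real ones are awaited as in the example above.

```dart
/// Error with a machine-readable code (hypothetical class, for illustration).
class SketchError implements Exception {
  SketchError(this.code, this.message);
  final String code;
  final String message;

  @override
  String toString() => '$code: $message';
}

/// Hypothetical stand-in for the abstract Backend: every operation has a
/// default body that reports UNSUPPORTED_OPERATION.
abstract class SketchBackend {
  String get name;

  /// Shared helper that the default method bodies delegate to.
  Never unsupported(String operation) => throw SketchError(
        'UNSUPPORTED_OPERATION',
        '$operation is not supported on $name',
      );

  void tap(int x, int y) => unsupported('tap');
  void pinch(double scale) => unsupported('pinch');
}

/// A platform subclass overrides only what it can actually do.
class SketchAndroidBackend extends SketchBackend {
  final List<String> calls = [];

  @override
  String get name => 'android';

  @override
  void tap(int x, int y) => calls.add('tap $x $y');
  // pinch is deliberately not overridden, so it keeps the default.
}

/// Typed façade: concrete methods that forward to whichever backend is set.
class SketchDevice {
  SketchDevice(this.backend);
  final SketchBackend backend;

  void tap(int x, int y) => backend.tap(x, y);
  void pinch(double scale) => backend.pinch(scale);
}
```

With this shape, calling SketchDevice(SketchAndroidBackend()).pinch(2.0) throws a SketchError whose code is UNSUPPORTED_OPERATION, while tap is forwarded normally.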

Key classes

Class	Purpose
`AgentDevice`	Main façade: open sessions, capture snapshots, interact, assert
`IosBackend` / `AndroidBackend`	Platform implementations
`DeviceSelector`	Filter devices by serial, name, or platform
`InteractionTarget`	Target a node by `@ref`, selector expression, or x/y coordinates
`TestRecorder`	Record video with chapter markers in Dart test files
`BackendSnapshotResult`	Snapshot result with typed node tree
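
TestRecorder is described as injecting MP4 chapters via ffmpeg. One plausible way to do that is to generate a chapter file in ffmpeg's FFMETADATA1 format. The sketch below shows that format only; Chapter and buildFfmetadata are hypothetical names, and whether the package builds its chapters this way is an assumption.

```dart
/// A chapter marker: a title plus its offset from the start of the video.
/// (Hypothetical type, not the package's own.)
class Chapter {
  const Chapter(this.title, this.start);
  final String title;
  final Duration start;
}

/// Escapes characters that are special in ffmetadata values
/// ('=', ';', '#', '\' and newline) with a leading backslash.
String _escape(String value) =>
    value.replaceAllMapped(RegExp(r'[=;#\\\n]'), (m) => '\\${m[0]}');

/// Renders chapters as an FFMETADATA1 document. Each chapter ends where the
/// next one starts; the last chapter ends at [total].
String buildFfmetadata(List<Chapter> chapters, Duration total) {
  final out = StringBuffer(';FFMETADATA1\n');
  for (var i = 0; i < chapters.length; i++) {
    final end = i + 1 < chapters.length ? chapters[i + 1].start : total;
    out
      ..writeln('[CHAPTER]')
      ..writeln('TIMEBASE=1/1000') // START/END are in milliseconds
      ..writeln('START=${chapters[i].start.inMilliseconds}')
      ..writeln('END=${end.inMilliseconds}')
      ..writeln('title=${_escape(chapters[i].title)}');
  }
  return out.toString();
}
```

A file produced this way is typically muxed into an existing recording by passing it to ffmpeg as a second input together with -map_metadata and -map_chapters, copying the streams without re-encoding.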

Selector DSL

Target nodes using a concise selector language:

// By accessibility identifier
InteractionTarget.selector('id=loginButton')

// By label text (quote values that contain spaces)
InteractionTarget.selector('text="Sign In"')

// Compound selectors
InteractionTarget.selector('role=button text="Submit"')

// Fallback chains (try first, then second)
InteractionTarget.selector('id=submit || text="Submit"')

// By @ref from a previous snapshot
InteractionTarget.ref('@e5')
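
To make the compound and fallback semantics concrete, here is a small sketch of how such a selector could be parsed and resolved. It is not the package's parser: parseSelector and resolveSelector are hypothetical names, nodes are modelled as plain maps, matching is exact and case-sensitive by assumption, and the naive split on || would break on a quoted value that itself contains ||.

```dart
/// One `key=value` term: the value is either double-quoted or a bare word.
final _term = RegExp(r'(\w+)=(?:"([^"]*)"|(\S+))');

/// Parses a selector into fallback alternatives (separated by `||`).
/// Each alternative is a map of constraints that must all hold.
List<Map<String, String>> parseSelector(String selector) {
  return [
    for (final alternative in selector.split('||'))
      {
        for (final m in _term.allMatches(alternative))
          m.group(1)!: m.group(2) ?? m.group(3)!,
      },
  ];
}

/// Returns the first node matched by the earliest alternative that matches
/// anything, or null when no alternative matches.
Map<String, String>? resolveSelector(
  String selector,
  List<Map<String, String>> nodes,
) {
  for (final constraints in parseSelector(selector)) {
    if (constraints.isEmpty) continue; // ignore empty alternatives
    for (final node in nodes) {
      final matches =
          constraints.entries.every((e) => node[e.key] == e.value);
      if (matches) return node;
    }
  }
  return null;
}
```

Given a tree with no node whose id is submit, resolving 'id=submit || text="Submit"' falls through to the second alternative and returns the node labelled Submit.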

.ad replay scripts

Text-based scripts for repeatable UI flows:

context platform=ios
open com.example.myapp
snapshot -i
click 'text="Get Started"'
wait visible 'id=onboardingComplete' 10000
type "Jane Doe"
screenshot ./screenshots/onboarding.png

Run with ad replay flow.ad or ad test flows/ (runs all .ad files in a directory).

Supported actions in the replay runner: open, close, home, back, app-switcher, rotate, type, swipe, scroll, longpress, pinch, click/press/tap, fill, snapshot, screenshot, record start/stop, appstate.

Selector-backed steps (click/fill/get/is/wait) can auto-heal with --replay-update. On failure, a fresh snapshot is taken, the selector is re-resolved against the current tree, the step is retried, and the script file is rewritten with the healed selector.

Parameters

.ad scripts support ${VAR} interpolation in positional args, flag values, and runtime hints. Sources, in decreasing precedence:

agent-device replay -e KEY=VALUE (or --env KEY=VALUE, repeatable)
AD_VAR_* shell env (e.g. AD_VAR_APP=prod exposes ${APP})
File-local env KEY=VALUE directives at the top of the .ad file
Built-ins: ${AD_PLATFORM}, ${AD_SESSION}, ${AD_FILENAME}, ${AD_DEVICE}, ${AD_ARTIFACTS}

Use ${VAR:-default} for a fallback and \${...} to escape. Unresolved references fail loudly with file:line. The AD_* namespace is reserved — only built-ins may use it. replay --replay-update cannot yet round-trip env directives or interpolation tokens, so it refuses those scripts.
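
The precedence and interpolation rules above can be sketched in a few lines. This is an illustration under stated assumptions rather than the package's implementation: mergeSources, fromShellEnv, and interpolate are hypothetical names, and the sketch reports only the variable name on failure instead of file:line.

```dart
/// Merges variable sources so that earlier (higher-precedence) sources win.
Map<String, String> mergeSources(List<Map<String, String>> highToLow) {
  final merged = <String, String>{};
  for (final source in highToLow.reversed) {
    merged.addAll(source); // later writes come from higher precedence
  }
  return merged;
}

/// Exposes AD_VAR_FOO=bar from the shell environment as FOO=bar.
Map<String, String> fromShellEnv(Map<String, String> env) => {
      for (final e in env.entries)
        if (e.key.startsWith('AD_VAR_'))
          e.key.substring('AD_VAR_'.length): e.value,
    };

/// Optional escaping backslash, then `${NAME}` or `${NAME:-default}`.
final _token =
    RegExp(r'(\\)?\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}');

/// Expands tokens in [line]; throws FormatException on unresolved names.
String interpolate(String line, Map<String, String> vars) {
  return line.replaceAllMapped(_token, (m) {
    // An escaped token is emitted literally, minus the backslash.
    if (m.group(1) != null) return m.group(0)!.substring(1);
    final name = m.group(2)!;
    final value = vars[name] ?? m.group(3); // fall back to the default
    if (value == null) {
      throw FormatException('unresolved variable: $name');
    }
    return value;
  });
}
```

Passing the sources to mergeSources in the order listed above (CLI flags, shell env, file-local directives, built-ins) yields a single map in which a -e value shadows an AD_VAR_* value of the same name.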

Native assets

The package bundles two native helpers that are managed automatically:

Android snapshot helper (13 KB APK) — provides multi-window accessibility snapshots via adb shell am instrument. Captures system UI (status bar, keyboard) alongside the app, unlike stock uiautomator dump. Auto-installed on the device on first snapshot.

iOS XCUITest runner (~200 KB source) — Swift project built via xcodebuild build-for-testing on first use. Provides snapshot, tap, swipe, type, record, and other interactions through an HTTP bridge to the simulator/device. Build output is cached in ios-runner/build/.

Both are resolved automatically — no manual build steps required.

Environment variables

Variable	Purpose
`AGENT_DEVICE_STATE_DIR`	Override state directory (default: `~/.agent-device/`)
`AGENT_DEVICE_VERBOSE`	Set to `1` for diagnostic logging
`AGENT_DEVICE_ANDROID_SNAPSHOT_DEBUG`	Set to `1` for Android snapshot diagnostics
`AGENT_DEVICE_IOS_RUNNER_DEBUG`	Set to `1` for iOS runner HTTP diagnostics
`AGENT_DEVICE_IOS_RUNNER_BUILD_DIR`	Override iOS runner build products path
`AD_RECORD_TESTS`	Set to a directory path to enable video recording in tests
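
As an illustration of how such variables are typically consumed, the sketch below resolves the state directory and the `1`-valued debug flags from an environment map. resolveStateDir and isEnabled are hypothetical helpers, and the HOME / USERPROFILE lookup used for the default is an assumption about how `~` would be expanded.

```dart
/// Resolves the state directory: the override wins, otherwise the default
/// `~/.agent-device` under the user's home directory.
/// In a real program, pass `Platform.environment` from dart:io as [env].
String resolveStateDir(Map<String, String> env) {
  final override = env['AGENT_DEVICE_STATE_DIR'];
  if (override != null && override.isNotEmpty) return override;
  // Assumption: HOME on Unix-like systems, USERPROFILE on Windows.
  final home = env['HOME'] ?? env['USERPROFILE'] ?? '.';
  return '$home/.agent-device';
}

/// Flags such as AGENT_DEVICE_VERBOSE are enabled by the literal value `1`.
bool isEnabled(Map<String, String> env, String name) => env[name] == '1';
```

Taking the environment as a parameter, instead of reading it globally, keeps the helpers easy to exercise in unit tests without touching the real process environment.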

Physical iOS device prerequisites

To drive a paired iPhone (--platform ios --serial <UDID>), the runner must be trusted on the device itself. This is a one-time setup:

Enable Developer Mode — Settings → Privacy & Security → Developer Mode → On, then reboot the phone.
Trust the runner certificate — after the first xcodebuild build-for-testing -destination "generic/platform=iOS" installs AgentDeviceRunner.app, open it once from the home screen. You'll hit an "Untrusted Developer" sheet; go to Settings → General → VPN & Device Management, tap your developer profile, and trust it.
Keep the phone unlocked during test runs.

If any of these steps is skipped, commands fail with COMMAND_FAILED and the hint "The UI test runner failed to enable automation mode …".

The runner is intentionally cached across CLI invocations (under ~/.agent-device/ios-runners/<udid>.json) so that subsequent commands skip the ~14s xcodebuild cold start. To dismiss the on-device "Automation Running" overlay, run:

agent-device runner stop                  # active session's device
agent-device runner stop --serial <UDID>  # specific device
agent-device runner stop --all            # every cached runner

Every command takes --platform ios|android, --serial <udid|id>, and --session <name>, and emits either human-readable text or, with --json, machine-readable output. Session state (which device and which app) persists across invocations under ~/.agent-device/sessions/, so an open in one shell and a tap in another both land on the same device.

Supported features

Capability	Android	iOS simulator	iOS device (devicectl)
`devices`	✅	✅	✅
`snapshot` (accessibility tree)	✅	✅ (XCUITest runner)	✅
`screenshot` → PNG	✅	✅ (simctl)	✅
`tap` / `longpress` / `swipe`	✅	✅	✅
`fill` / `type` / `focus`	✅	✅	✅
`scroll` (direction + amount)	✅	✅	✅
`pinch` (scale + optional center)	❌ (runner gap)	✅	✅
`home` / `back` / `app-switcher`	✅	✅	✅
`rotate portrait | landscape-…`	✅	✅	✅
`open <app>` / `close [app]`	✅	✅ (simctl)	✅ (devicectl)
`apps` / `appstate`	✅	✅	✅ (apps only)
`clipboard` get / `--set <text>`	✅	✅ (simctl pbpaste/pbcopy)	❌
`press` / `find` / `get` / `is` / `wait` — selector/@ref targeting	✅	✅	✅
`ensure-simulator <name>`	n/a	✅	n/a
`logs --since 30s --out <path>` (one-shot)	✅ (logcat -T)	✅ (simctl log show)	❌ (use `--stream` instead — Apple has no host-side `log show` for devices)
`logs --stream --out <path>` / `logs --stop`	✅ (logcat --pid + cross-invocation PID cache)	✅ (simctl log stream predicate)	✅ (idevicesyslog via libimobiledevice)
`record start` / `record stop`	✅ (screenrecord + pull)	✅ (XCUITest runner + sandbox pull)	✅ (runner + `devicectl copy from` — needs device trust + Developer Mode)
`perf [--metric cpu|memory]`	✅ (dumpsys)	✅ (simctl spawn ps)	✅ (xctrace 2× 1s + delta — true CPU%)
`network <logPath>` (HTTP from logs)	✅ (cross-line Android enrichment)	✅	✅
`install` / `uninstall` / `reinstall`	✅ (apk + aab)	✅ (.app + .ipa)	✅ (.app + .ipa via devicectl — needs signed bundle)
`replay <script.ad>` / `test <glob>`	✅	✅	✅
Self-healing replay (`--replay-update`)	✅	✅	✅
Per-step artifacts + auto log-dump on failure	✅	✅	❌ (needs logs)

Architecture

bin/agent_device.dart                CLI entry point (dispatches to cli/run_cli.dart)
│
├── lib/src/cli/                     args-backed commands
│   ├── commands/*.dart              one file per top-level command
│   └── run_cli.dart                 CommandRunner wiring + buildCliRunner()
│
├── lib/src/runtime/                 typed façade (library API) + session store
│   ├── agent_device.dart            AgentDevice.open(...) and per-action methods
│   ├── file_session_store.dart      ~/.agent-device/sessions/<name>.json
│   └── paths.dart                   state-dir resolution
│
├── lib/src/replay/                  .ad script layer
│   ├── script.dart                  parser + serializer + context header
│   ├── replay_runtime.dart          dispatch table + heal + artifact dumper
│   └── heal.dart                    selector re-resolution
│
├── lib/src/selectors/               @ref + DSL (`id=foo`, `role=Button label="OK"`)
├── lib/src/snapshot/                accessibility-tree types + ref attach
├── lib/src/diagnostics/             log_stream_record + network_log (HTTP extractor)
├── lib/src/backend/                 abstract Backend + options / results / capabilities
│
├── lib/src/platforms/android/       adb + screenrecord + logcat + snapshot/input
│                                    + apk/aab install_artifact
│
└── lib/src/platforms/ios/
    ├── runner_client.dart           XCUITest bridge (HTTP POST /command, BSD socket
    │                                on physical devices over CoreDevice tunnel)
    ├── ios_backend.dart             Backend subclass (simctl + runner + devicectl)
    ├── devicectl.dart               physical-device list / launch / install / uninstall
    ├── simctl.dart                  buildSimctlArgs helper
    ├── ensure_simulator.dart        find-or-create + boot
    ├── install_artifact.dart        .app + .ipa (single-app or hint-resolved multi-app)
    ├── app_lifecycle.dart, devices.dart, perf.dart, screenshot.dart

The XCUITest runner itself lives at ios-runner/AgentDeviceRunner/ — a small Swift project Dart shells out to via xcodebuild test-without-building. See RunnerBSDSocketServer.swift / RunnerTests+CommandExecution.swift for the on-device side.

Key design choices vs. the TS source:

No long-lived daemon. TS spawns one for state sharing; Dart instead persists state to disk: sessions (~/.agent-device/sessions/), the iOS XCUITest runner record (~/.agent-device/ios-runners/<udid>.json), the Android screenrecord PID (~/.agent-device/android-recorders/<serial>.json), and live log streams (~/.agent-device/log-streams/<deviceId>.json). Separate CLI invocations and the user's shell therefore converge on the same underlying tooling.
No dynamic bindCommands. AgentDevice exposes concrete typed methods; each Backend subclass overrides what it supports — the base class's unsupported(...) makes partial coverage honest.
iOS physical devices route over the CoreDevice IPv6 tunnel (xcrun devicectl device info details → tunnelIPAddress), not the legacy usbmuxd/iproxy path Apple deprecated on iOS 17+.
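
The daemon-free design rests on small JSON files that any invocation can read back. A minimal sketch of such a file-backed session store follows. SketchSessionStore and the JSON fields used with it are hypothetical; only the sessions/<name>.json layout comes from the description above.

```dart
import 'dart:convert';
import 'dart:io';

/// Persists named sessions as JSON files under `<stateDir>/sessions/`, so a
/// later process can pick up where an earlier one left off.
/// (Hypothetical class, not the package's file_session_store.dart.)
class SketchSessionStore {
  SketchSessionStore(this.stateDir);
  final Directory stateDir;

  File _fileFor(String name) => File('${stateDir.path}/sessions/$name.json');

  /// Writes (or overwrites) the session file for [name].
  void save(String name, Map<String, dynamic> session) {
    final file = _fileFor(name);
    file.parent.createSync(recursive: true);
    file.writeAsStringSync(jsonEncode(session));
  }

  /// Returns the stored session, or null when none has been saved yet.
  Map<String, dynamic>? load(String name) {
    final file = _fileFor(name);
    if (!file.existsSync()) return null;
    return jsonDecode(file.readAsStringSync()) as Map<String, dynamic>;
  }
}
```

Because each store instance reads from disk on every load, two unrelated processes pointed at the same state directory see the same session without any shared in-memory state. The sketch ignores concurrent writers, which a real implementation would need to consider.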

Testing

# Unit tests (fast, no device required):
dart test packages/agent_device/test \
  --exclude-tags='android-live,ios-live,ios-device-live,android-emulator,fixture-live'

# iOS simulator live suite (needs a booted simulator + built runner):
AGENT_DEVICE_IOS_LIVE=1 dart test --tags=ios-live

# iOS physical device suite (also needs AGENT_DEVICE_IOS_DEVICE_UDID=<udid>):
AGENT_DEVICE_IOS_LIVE=1 AGENT_DEVICE_IOS_DEVICE_UDID=<udid> \
  dart test --tags=ios-device-live

# Android live suite (booted emulator or connected device):
AGENT_DEVICE_ANDROID_LIVE=1 dart test --tags=android-live

# All checks (analyze + unit tests):
make check

Test tags in use: android-live, android-emulator, ios-live, ios-device-live, fixture-live. Tests without a tag never need a device.

License

MIT