cockpit
cockpit is the host-side package for flutter_cockpit.
It provides:
- AI-first CLI commands
- an MCP server with the same workflows
- target-first entrypoints for non-Flutter, native, and host-level control
- task bundle writing and validation
- workspace tooling for search, package inspection, project creation, analyze, format, test, and fixes
Install
Requires Dart 3.8.0 or newer. Use a Flutter 3.32.0+ SDK when running it from a Flutter workspace.
dev_dependencies:
  cockpit: ^1.0.0
Optional global activation:
dart pub global activate cockpit
cockpit --help
cockpit_mcp
cockpit_mcp is the global MCP launcher exposed by this package. If you do not need a global command, you can also run MCP directly with:
dart run cockpit serve-mcp
Toolchain resolution:
- Explicit executable variables win: DART, DART_BIN, FLUTTER, FLUTTER_BIN.
- SDK root variables are supported next: DART_ROOT, DART_SDK, FLUTTER_ROOT, FLUTTER_SDK.
- If only FLUTTER_ROOT or FLUTTER_SDK is set, Dart commands use the bundled Flutter Dart SDK.
- If none are set, Dart commands prefer the current Dart SDK executable before falling back to dart on PATH; Flutter commands prefer the Flutter SDK around the current bundled Dart executable before falling back to flutter on PATH.
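The precedence above can be sketched as a small shell function. This is only an illustration of the documented order, not Cockpit's own resolver: the helper name `resolve_dart_bin` is hypothetical, checking `DART` before `DART_BIN` (and `DART_ROOT` before `DART_SDK`) is an assumption, and the "current Dart SDK executable" step is skipped because a plain shell script has no such notion.

```shell
# Hypothetical helper: mirrors the documented precedence only.
resolve_dart_bin() {
  # 1. Explicit executable variables win (DART before DART_BIN is an assumption).
  for value in "${DART:-}" "${DART_BIN:-}"; do
    if [ -n "$value" ]; then printf '%s\n' "$value"; return 0; fi
  done
  # 2. Dart SDK roots come next; the executable lives under bin/.
  for root in "${DART_ROOT:-}" "${DART_SDK:-}"; do
    if [ -n "$root" ]; then printf '%s\n' "$root/bin/dart"; return 0; fi
  done
  # 3. With only a Flutter root, use the Dart SDK bundled inside Flutter.
  for root in "${FLUTTER_ROOT:-}" "${FLUTTER_SDK:-}"; do
    if [ -n "$root" ]; then
      printf '%s\n' "$root/bin/cache/dart-sdk/bin/dart"
      return 0
    fi
  done
  # 4. Nothing set: fall back to dart on PATH.
  printf 'dart\n'
}
```

Running `FLUTTER_ROOT=/opt/flutter resolve_dart_bin` prints the bundled Dart path under that root, which is a quick way to check which variable a misconfigured shell would pick up.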
Typical host setup:
- Codex: codex mcp add flutterCockpit -- dart run cockpit serve-mcp
- Claude Code: claude mcp add --transport stdio flutter-cockpit -- dart run cockpit serve-mcp
- Cursor: add a flutter-cockpit stdio server in ~/.cursor/mcp.json or .cursor/mcp.json
- VS Code: add a stdio server in .vscode/mcp.json or your profile mcp.json under "servers"
- OpenCode: add a local MCP entry in ~/.config/opencode/opencode.json or repo-local opencode.json under "mcp"
For the fuller host-specific setup guide, see the repository README section:
A copyable generic MCP config is also shipped at
example/mcp_config.json.
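As a rough illustration of the Cursor and VS Code entries described above, the sketch below writes two stdio server configs into a scratch directory. The top-level keys follow those hosts' usual layout ("mcpServers" for Cursor, "servers" for VS Code); the contents of the shipped example/mcp_config.json may differ, so treat this as an assumption and prefer the shipped file when in doubt.

```shell
# Write into a scratch directory so existing host configs are not overwritten.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/.cursor" "$demo_dir/.vscode"

# Cursor: stdio servers are listed under "mcpServers".
cat > "$demo_dir/.cursor/mcp.json" <<'EOF'
{
  "mcpServers": {
    "flutter-cockpit": {
      "command": "dart",
      "args": ["run", "cockpit", "serve-mcp"]
    }
  }
}
EOF

# VS Code: stdio servers are listed under "servers".
cat > "$demo_dir/.vscode/mcp.json" <<'EOF'
{
  "servers": {
    "flutter-cockpit": {
      "type": "stdio",
      "command": "dart",
      "args": ["run", "cockpit", "serve-mcp"]
    }
  }
}
EOF

echo "wrote sample configs under $demo_dir"
```

Copy the relevant file into the real project or profile location once the entry looks right for your host.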
CLI
dart run cockpit --help
dart run cockpit run-command --help
Recommended app-first loop:
- launch-app
- read-app --profile minimal
- run-command or run-batch
- inspect-ui, read-network, read-errors, read-logs, wait-idle when needed
- hot-reload or hot-restart
- run-script, run-task, or validate-task for delivery
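One pass of this loop might be scripted as below. It is a sketch, not a shipped script: `cockpit_cli` and `app_first_loop` are hypothetical names, the loop relies on the implicit .dart_tool/flutter_cockpit/latest_app.json handle, and a real project may also need --platform and --device-id on launch-app.

```shell
# Thin wrapper so the loop can be pointed at a different launcher or a stub.
cockpit_cli() { dart run cockpit "$@"; }

# Hypothetical helper: launch, observe, mutate, observe again, then reload.
app_first_loop() {
  project_dir="$1"
  command_json="$2"
  cockpit_cli launch-app --project-dir "$project_dir" || return 1
  cockpit_cli read-app --profile minimal || return 1
  # Mutation first, then observation: keep these two calls sequential.
  cockpit_cli run-command --command-json "$command_json" || return 1
  cockpit_cli read-app --profile minimal || return 1
  cockpit_cli read-errors || return 1
  cockpit_cli hot-reload
}

# Example call (not executed here):
# app_first_loop . '{"commandId":"assert-ready","commandType":"assertText","parameters":{"text":"<expected-text>"}}'
```

Each step stops the loop on failure, so an agent sees the first failing command instead of a cascade of follow-on errors.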
For full-fidelity observability of delivery runs, point the local dashboard at the same output root used by the run:
dart run cockpit devtools --history-root /tmp/flutter_cockpit/out
The command stays running until interrupted and serves only on loopback by
default. Use CLI/MCP summaries for low-token agent decisions; use the dashboard
when timeline, screenshots, recordings, or bundle files need human inspection.
Runs are grouped by workflow sessionId. Treat sessionId as the isolated
development or validation job, taskId as the current objective, and runId as
one execution attempt. Reuse a sessionId for retries of the same job and use a
new sessionId for unrelated work. The dashboard opens the current latest scope
and pins the URL to that concrete scope, with a selector for older sessions or
all runs when cross-session audit is intentional. Pass --scope latest only
when you want it to keep following the newest job. scope=current and
scope=latest API URLs resolve to the current latest scope, while the UI
distinguishes pinned scope from following latest. Timelines render the
active session/scope across its runs in execution order; run details and bundle
panels track the selected run. Artifact links include the owning run and event
key so repeated relative paths stay traceable. The dashboard can also parse
workflow YAML/JSON and submit runScript or validateTask payloads as
background jobs under the same history root. Board-submitted runs need the
executable envelope that CLI normally supplies, such as sessionHandle,
baseUrl, outputRoot, and platform ids; keep those fields when switching
between JSON and YAML instead of pasting only the inner workflow. In-flight
submitted jobs remain visible before their live history files are written, and
completed submitted jobs expose bundle summaries and artifacts through the same
run API when the bundle remains under the history root. Run lists are paged for
long-lived history roots while scope totals remain visible.
Large or partially written bundle JSON is reported through summaryFileIssues
instead of failing the dashboard. The run detail panel exposes download bundle
through GET /api/runs/<runId>/bundle-download; the response is a
token-protected streamed tar with download_manifest.json, run_metadata.json,
bundle/**, and live/**, plus missingRoots for live-only or partial runs.
Target-first loop when the agent needs direct system or non-Flutter control:
- launch-target
- read-target --profile minimal
- inspect-surface or run-shell when the resolved platform truthfully exposes shell control
- read-task-bundle-summary or validate-task for bundle-backed delivery review
Native/System Control Plane when Flutter semantics cannot control the required surface:
- read-system-capabilities --platform <platform> ...
- run only actions reported as available with run-system-action
- use the returned parameters contract instead of guessing payload keys
- use direct flags for common setup: --appearance, --content-size, --font-scale, --latitude/--longitude, --orientation, --network-speed, --network-delay, status-bar flags, and --max-depth/--max-nodes; use --app-path, --grant-permissions, --keep-data, --source-path, and --destination-path for app/package and file/media setup; use --name, --purpose, --mode, --layer, and --output-path for system screenshots and recordings
- read post-action app, target, or system state before judging the result
Use scene-level macros for real debugging blockers instead of composing many
low-level actions by hand: resolveBlockers handles common dialogs, keyboard
and system UI blockers, and app recovery; preparePermissions batches
permission grant/revoke/reset; recoverToApp brings the app foreground without
killing data; tapNotification expands the notification surface, matches
title/body/tag/text, and taps the notification; stabilizeForScreenshot
collapses noisy system state, dismisses keyboard when available, fixes
orientation/appearance/status bar where supported, and recovers the app;
readFocusState reports keyboard/focus state for blocker diagnosis.
When .dart_tool/flutter_cockpit/latest_app.json exists, system commands reuse
its platform, device id, process id, and platform app id. iOS simulator
permissions should prefer grantPermission, which uses deterministic
simctl privacy grant. For iOS simulator native UI or system dialogs that
Flutter semantics and simctl cannot handle, run WebDriverAgent separately.
Cockpit probes http://127.0.0.1:8100 by default for iOS simulator sessions;
pass --wda-url or set FLUTTER_COCKPIT_IOS_WDA_URL only for a custom
endpoint. Native actions stay blocked unless the endpoint is reachable. When it
is reachable, tap, longPress, drag, typeText, pressKey,
dismissSystemDialog, dismissKeyboard, expandNotifications,
expandQuickSettings, collapseSystemUi, tapNotification, resolveBlockers,
setOrientation, readFocusState, and readUiTree can be reported as
available and executed with run-system-action.
Simulator support is intentionally capability-truthful:
- Android emulator uses adb for native tap/drag/text/key input, Back/Home, volume keys, app install/uninstall/launch/terminate/data clear, permission grant/revoke/reset, URL/settings entry, appearance, text scale, location, orientation, emulator network speed/delay, notification shade, quick settings, system UI collapse, SystemUI demo-mode status bar overrides (setStatusBar/clearStatusBar), shell notifications, file push/pull, media import with media scanning, screenshots, recordings, UI tree dumps, process/window/system state reads, device info reads, notification state reads, logcat tails (readSystemLogs), battery simulation (setBattery), connectivity toggles (setConnectivity), and bounded shell commands. dismissSystemDialog --decision accept|dismiss first tries common Android permission/system dialog buttons with UIAutomator; dismiss can fall back to Back. Notification taps use notification expansion plus UIAutomator text matching.
- iOS simulator uses simctl for app install/uninstall/launch/terminate/data clear, privacy grant/revoke/reset, URL and Settings entry, appearance, content size, location, status bar overrides, pasteboard, simulated APNS pushes, app container push/pull, media import, screenshots, recordings, process reads, simulator/device info reads, locale switching (setLocale, relaunch the app afterwards), unified log reads (readSystemLogs), and bounded simctl spawn commands.
- iOS simulator native UI actions require a reachable WebDriverAgent endpoint: tap, long press, drag, text/key input, Home, keyboard dismissal, system dialog accept/dismiss, notification center, Control Center, notification taps, orientation, focus reads, and native UI tree reads.
- iOS simulator volume keys and clear-all-notifications have no stable public simctl/XCTest simulator API. They remain unsupported or blocked instead of pretending to be automated. Use returned fallbacks, WDA-backed actions when available, or app-level assertions.
- Desktop hosts (macOS/Windows/Linux) expose host-plane actions through built-in tooling: URL and system settings entry (open x-apple.systempreferences: / ms-settings:), host appearance (System Events / registry / gsettings), clipboard, host file push/pull and media copy, app activation/recovery/termination, focus and device/system reads, host system log reads (readSystemLogs via log show, journalctl, or Get-WinEvent), process/window lists, notifications (osascript display notification / notify-send), macOS TCC resetPermission via tccutil, window-targeted input, native UI tree reads (macOS/Windows), and window screenshots and recordings. Host-global surfaces with no stable app-scoped tooling (Home, volume keys, status bar, notification-center expansion, simulated location, orientation) stay unsupported or blocked truthfully.
- Web (browser) targets keep DOM-plane input blocked until a browser driver or bridge is configured, but screenshots and recordings are available through the host window adapters when the browser app id or process id is known (macOS hosts require the app id).
Default AI-readable capability rows include compact parameter metadata such as
parameters=[x*:integer | wifiBars:integer[0..3] | appearance*:string(light|dark)].
JSON output includes the same contract as structured parameters entries with
required, valueType, allowedValues, minimum, and maximum.
It also includes actionGroups, so agents can discover all available
permission, notification, file, media, evidence, device-state, and inspection
actions without hard-coding platform-specific action names.
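To make the compact row format concrete, the sketch below pulls the required parameter names (those marked with `*`) out of a row like the one shown above. The helper name `required_params` is hypothetical and the parsing rules are inferred from that single sample row; for real tooling the structured JSON parameters entries are the safer source.

```shell
# Hypothetical helper: list parameter names marked required with a trailing "*".
required_params() {
  row="$1"
  # Strip the leading "parameters=[" and the closing "]".
  inner="${row#parameters=\[}"
  inner="${inner%\]}"
  # Entries are separated by " | "; a bare "|" inside string(...) has no spaces.
  printf '%s\n' "$inner" | awk -F' [|] ' '{
    for (i = 1; i <= NF; i++) {
      split($i, kv, ":")
      if (kv[1] ~ /[*]$/) {
        sub(/[*]$/, "", kv[1])
        print kv[1]
      }
    }
  }'
}

required_params 'parameters=[x*:integer | wifiBars:integer[0..3] | appearance*:string(light|dark)]'
# prints:
# x
# appearance
```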
Recommended code-side loop:
- analyze-files --path ...
- lsp --command ...
- grep-package-uris or read-package-uris
- pub-dev-search or pub
- run-tests or analyze-workspace only when the question is no longer local
CLI JSON output uses lower camel case keys.
If launch-app omits --app-json, it persists the current app handle at .dart_tool/flutter_cockpit/latest_app.json in the working directory and later app commands reuse it automatically.
launch-app is intentionally a short command: it waits for the app to become ready, writes the handle, and exits. In development mode, a background supervisor keeps flutter run --machine, logs, hot reload, hot restart, and stop-app control alive, so agents should not run launch-app with shell backgrounding.
run-shell is bounded and killable by default. Keep the default timeout for quick probes; pass --timeout-seconds <n> only for known-slow shell work.
When a command accepts both --app-json and --base-url, precedence is: explicit --app-json, then explicit --base-url, then the implicit .dart_tool/flutter_cockpit/latest_app.json handle in the current working directory.
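That precedence can be sketched as a small decision function. The helper name `resolve_app_ref` and its output format are hypothetical; the sketch only models the documented order and does not read or validate the handle file.

```shell
# Hypothetical helper mirroring the documented precedence; not Cockpit's code.
resolve_app_ref() {
  app_json="$1"   # value passed to --app-json, or empty
  base_url="$2"   # value passed to --base-url, or empty
  implicit=".dart_tool/flutter_cockpit/latest_app.json"
  if [ -n "$app_json" ]; then
    printf 'app-json %s\n' "$app_json"
  elif [ -n "$base_url" ]; then
    printf 'base-url %s\n' "$base_url"
  elif [ -f "$implicit" ]; then
    # Only consulted when neither explicit flag was given.
    printf 'implicit %s\n' "$implicit"
  else
    printf 'none\n'
    return 1
  fi
}
```

For example, `resolve_app_ref /tmp/app.json http://127.0.0.1:8080` reports the app-json source even though a base URL was also supplied (the URL here is a placeholder).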
launch-app auto-detects cockpit/main.dart first, then lib/main.dart.
run-script --script <workflow.yaml|script.json> accepts YAML or JSON scripts.
Use YAML for hand-written if, retry, bounded loop, and
startRecording / stopRecording workflows, and JSON for generated scripts.
The protocol map is shipped with this package at
doc/contracts/flutter-cockpit-protocol.md.
The AI development protocol is shipped with this package
at doc/contracts/ai-development-protocol.md.
The workflow protocol is shipped at
doc/contracts/control-workflow-protocol.md
with the machine schema at
doc/contracts/control-workflow.schema.json.
run-script and run-remote-control-script exit non-zero when the written bundle status is failed.
Workspace commands default --workspace-root or --parent-directory to the current directory.
Serialize mutation, then observation. Do not run a mutating run-command in parallel with the read-app, inspect-ui, or read-network call that depends on its result.
When the next few steps are already known and the flow will cross a route boundary such as list -> editor -> list, prefer one ordered run-batch over separate run-command round-trips. It cuts token cost and avoids route-transition gaps between commands.
read-app and snapshots expose focus state. When uiSummary.focus.isTextInputFocus is true or a software keyboard covers the next target, run dismissKeyboard as a locator-free command before scrolling or tapping again.
Use product-specific locator signals. Short repeated action labels such as Open, Edit, or Save are fine as fallbacks, but they should not be the only signal when multiple rows or cards can expose the same word. Prefer the full accessible label from read-app / inspect-ui, then add route, type, or ancestor only as needed.
For route-changing tap commands, set parameters.expectedRouteName. Add parameters.routeTimeoutMs for CI, recording, simulator, or other acceptance flows where runner latency is expected; timeoutMs remains the hard command ceiling. Follow critical route crossings with waitFor on parameters.routeName. To wait for spinners, dialogs, or routes to disappear, use waitFor with parameters.absent: true.
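The payload shapes below illustrate these route parameters. The commandId/commandType/parameters envelope follows the run-command example in this README; the "locator" key, the "text" field used with waitFor, the timeout value, and all angle-bracket placeholders are assumptions for illustration, so check them against doc/contracts/control-workflow.schema.json before relying on them.

```shell
payload_dir="$(mktemp -d)"

# Route-changing tap: declare the expected route and a route-level timeout.
# Assumption: the "locator" key and its "text" field are illustrative.
cat > "$payload_dir/open_editor.json" <<'EOF'
{
  "commandId": "open-editor",
  "commandType": "tap",
  "locator": {"text": "<full-accessible-label>"},
  "parameters": {
    "expectedRouteName": "<editor-route>",
    "routeTimeoutMs": 8000
  }
}
EOF

# Follow the crossing with an explicit wait on the route name.
cat > "$payload_dir/wait_editor.json" <<'EOF'
{
  "commandId": "wait-editor",
  "commandType": "waitFor",
  "parameters": {"routeName": "<editor-route>"}
}
EOF

# Wait for a spinner to disappear (the "text" target is an assumption).
cat > "$payload_dir/wait_spinner_gone.json" <<'EOF'
{
  "commandId": "wait-spinner-gone",
  "commandType": "waitFor",
  "parameters": {"text": "<spinner-label>", "absent": true}
}
EOF

# Each file can then be passed with --command-file, for example:
# dart run cockpit run-command --command-file "$payload_dir/open_editor.json"
```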
capture-screenshot uses app metadata when available and prefers system/host
capture before falling back to app capture with fallback metadata.
run-command, run-batch, and run-script default key mutating commands to
best-effort after-action screenshots attached to the command step. This gives
agents key-frame evidence for taps, text input, scrolls, drags, and back
navigation without adding per-command JSON. Use capture-screenshot for final
acceptance or any named proof artifact that must be strict.
For AI-first development, build project-owned rapid verifiers around the same small loop: launch, drive one representative flow, hot reload, assert the changed state, capture one still artifact when useful, read runtime errors, and stop the app. Keep failure JSON compact enough for an agent to inspect before opening full snapshots or rerunning expensive validation. Useful fields are completed phases, failed command metadata, final route or state preview, bounded runtime error previews, and artifact refs.
Minimal verified run-command shape:
dart run cockpit \
run-command \
--app-json /tmp/app.json \
--command-json '{"commandId":"assert-ready","commandType":"assertText","parameters":{"text":"<expected-text>"}}'
Verified web development loop:
dart run cockpit \
launch-app \
--project-dir <project-dir> \
--platform web \
--device-id chrome \
--app-json /tmp/flutter_cockpit/web_app.json
dart run cockpit \
read-app \
--app-json /tmp/flutter_cockpit/web_app.json \
--profile minimal
If a browser-backed session reports a real route but visibleTargetCount: 0,
rerun read-app --profile standard before assuming the app is broken. The
result now surfaces recommendedNextStep: "recoverBrowserVisibility" when the
page looks backgrounded, throttled, or still reconnecting.
dart run cockpit \
hot-restart \
--app-json /tmp/flutter_cockpit/web_app.json
On web, launch-app now stands up a host-side bridge on 127.0.0.1 and lets the browser app connect back over WebSocket while keeping the existing HTTP app surface (/health, /snapshot, /commands/execute, /recording/*) stable for agents.
Host-side browser recording still depends on the desktop OS granting screen-capture permission to the browser and capture stack; when that host permission or device policy blocks capture, stop-recording returns a structured failure result instead of hanging the session.
For project-owned web validation, keep app control, screenshots, and reload checks strict. Treat missing desktop screen-capture permission as a structured environment warning only when the app-control path still passes.
Locator rules:
- Start with text, tooltip, or semanticId.
- Use key only when the app already exposes a legitimate stable key for product reasons. Do not add automation-only keys.
- Add route, type, path, and nested ancestor only when ambiguity remains.
- path is fuzzy: segments like body, slivers, and numeric indexes are ignored, so shapes such as scaffold.body/custom_scroll_view.slivers/0/... can still match the same target.
- Use fallbacks for a short ordered backup list instead of one oversized locator.
- scrollUntilVisible probes between internal scroll segments, so agents should prefer one good locator and tune viewportFraction before falling back to manual repeated scroll commands.
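The fuzzy path rule can be pictured as a normalization step. The sketch below is a guess at the idea, not the package's matcher: the helper name `normalize_locator_path` is hypothetical, treating both `.` and `/` as segment separators is an assumption, the ignore list holds only the segments named above, and `list_tile` is a made-up leaf segment.

```shell
# Hypothetical helper: drop ignorable segments so equivalent paths compare equal.
normalize_locator_path() {
  printf '%s\n' "$1" \
    | tr './' '\n\n' \
    | grep -Ev '^(body|slivers|[0-9]+)$' \
    | grep -v '^$' \
    | paste -sd '/' -
}

normalize_locator_path 'scaffold.body/custom_scroll_view.slivers/0/list_tile'
# prints: scaffold/custom_scroll_view/list_tile
normalize_locator_path 'scaffold/custom_scroll_view/list_tile'
# prints: scaffold/custom_scroll_view/list_tile
```

Because both inputs reduce to the same string, a locator written against the shorter shape would still address the same target under this model.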
Token-Saving Shell Patterns
When the host is a shell agent, prefer the CLI surface plus small jq projections:
dart run cockpit \
read-app \
--profile minimal --stdout-format json | jq '{currentRouteName,state}'
dart run cockpit \
validate-task \
--config /tmp/validate_task.yaml --stdout-format json | jq '{classification,recommendedNextStep,validationFailures}'
dart run cockpit \
validate-task \
--config /tmp/validate_task.yaml \
--output /tmp/validate_task_result.json \
--output-format json
Review an existing task-run bundle without reopening large raw artifacts:
dart run cockpit \
read-task-bundle-summary \
--bundle-dir /tmp/flutter_cockpit/out/20260530T060304005006Z_session-1
Default stdout is the full AI-readable semantic render. Add --stdout-format json for immediate jq projections. Keep larger payloads on disk with --output <path>; add --output-format json only when a later step must reopen structured JSON. Prefer --command-file, --commands-file, or --config over long inline JSON once the request body stops being trivial; use YAML for hand-written task/workflow configs and JSON for generated configs.
MCP
dart run cockpit serve-mcp
Recommended app and target tools:
- list_targets
- launch_app
- launch_target
- list_apps
- read_app
- read_target
- inspect_ui
- inspect_surface
- run_command
- run_batch
- capture_screenshot
- read_system_capabilities
- run_system_action
- run_shell
- wait_idle
- hot_reload
- hot_restart
- start_recording
- stop_recording
- read_network
- read_logs
- read_errors
- stop_app
- run_script
- read_task_bundle_summary
- run_task
- validate_task
Advanced remote-session tools:
- list_active_sessions
- launch_remote_session
- query_remote_session
- read_remote_status
- read_remote_snapshot
- collect_remote_snapshot
- execute_remote_command
- execute_remote_command_batch
- wait_remote_ui_idle
- start_remote_recording
- stop_remote_recording
Development-session tools:
- launch_development_session
- query_development_session
- reload_development_session
- collect_development_probe
- compare_development_probe
- read_session_logs
- stop_development_session
list_apps, list_active_sessions, and read_session_logs are MCP-only by
design: they read the long-lived MCP server's in-process session registry. The
stateless CLI covers the same workflow through app handles —
.dart_tool/flutter_cockpit/latest_app.json plus explicit --app-json files —
so a CLI listing command would always report an empty registry.
Workspace and roots tools:
- add_roots
- remove_roots
- pub_dev_search
- pub
- grep_package_uris
- read_package_uris
- lsp
- analyze_files
- create_project
- analyze_workspace
- format_workspace
- run_tests
- apply_fixes
Resources and prompts are also exposed for contracts, capabilities, task
summaries, roots, package reads, and standard closed-loop guidance. Read
cockpit://workspace/protocol first for the contract map,
cockpit://workspace/ai-development-protocol for the AI development loop, and
cockpit://workspace/control-workflow-protocol when an MCP host needs the
script protocol for run_script. Use
cockpit://workspace/control-workflow-schema when a tool needs the
machine-readable workflow schema.
Notes
- Persist app.json and reuse it. It is the preferred app reference across steps.
- If you stay in one repo, the default .dart_tool/flutter_cockpit/latest_app.json handle is the lowest-friction path and usually removes the need to keep passing --app-json.
- For apps wired for Cockpit, prefer the Cockpit development entrypoint such as cockpit/main.dart; that is where network observation and the remote control surface are enabled.
- If the app makes live HTTP calls, keep platform permissions aligned with that behavior: Android needs INTERNET, and Apple targets need outbound client entitlement plus local-network ATS allowance for loopback HTTP.
- list_apps is MCP-only because the CLI does not keep an in-memory app registry across invocations.
- read_logs reads app-centric runtime lines first. available=true with an empty lines array is valid when the app emitted no runtime logs.
- read_network is the low-token path for endpoint summaries, recent failures, and optional bounded entries. Prefer run_command -> wait_idle -> read_network over inspect_ui when the question is only about network traffic.
- On long pages, reveal a stable card or section first. If a deep control still misses under sticky chrome, lower viewportFraction before escalating to inspect_ui.
- pub keeps dependency edits bounded and returns previews instead of full pub logs by default.
- Shell agents usually get the lowest token cost from the CLI surface. Tool-calling hosts can use the matching MCP tools instead of reopening large command payloads in model context.
- analyze_files is the low-token path for focused diagnostics; use analyze_workspace only when the question is workspace-wide.
- lsp uses relative paths plus 1-based line and column inputs so agents do not need file URIs or zero-based math.
- Use minimal, standard, inspect, and evidence profiles to trade off token cost against detail.
- Interactive app commands accept timeoutMs. Workspace tools accept timeoutSeconds. Keep the default unless the task is known to be slow.
- pub_dev_search uses a bounded network path and a local Python fallback when direct TLS fetches fail on the host.
- Advanced low-level session services still exist in the Dart API, but the recommended public loop is app-first.
- CLI read-task-bundle-summary, MCP read_task_bundle_summary, and validate-task expose plane-aware delivery state, including targetKind, primaryExecutionPlane, planesUsed, surfaceKindsUsed, fallbackCount, and bounded fallback gates.
Verification
Repository-only MCP verification is maintained in the source tree at
packages/cockpit/tool/verify_mcp_surface.dart.
It exercises the real serve-mcp stdio surface, workspace tooling,
target-first commands, and delivery tools end to end.
The repository runtime-loop workflow runs it on macOS as the MCP and
target-first release gate:
runtime-loop.yml.
Package page: pub.dev/packages/cockpit