ai_flutter_agent 0.1.3

A Flutter package that lets LLMs operate app UIs via the Semantics tree. Perceive → Plan → Execute → Verify loop with built-in safety features.

ai_flutter_agent

Let LLMs operate Flutter app UIs through the Semantics tree.
Perceive → Plan → Execute → Verify — a complete agent loop with built-in safety.


🎬 Demo #

An LLM autonomously operating a Todo app — adding items, typing text, and toggling checkboxes. No coordinates. No pixel matching. Pure semantic understanding.

The agent reads the Semantics tree, plans actions via LLM tool-calls, and executes them — all without knowing a single pixel coordinate.


🧠 This is NOT Screen-Coordinate Automation #

Most "AI agents" for mobile apps take a screenshot, ask an LLM to identify pixel coordinates, and then click those coordinates. This approach is:

  • Fragile: a slight layout change can break a working flow
  • Slow and costly: every step sends a full screenshot to a vision model
  • Resolution-dependent: coordinates differ across devices
  • Language-dependent: OCR on screenshots can fail when the locale changes

ai_flutter_agent takes a fundamentally different approach:

  • Reads Flutter's Semantics tree directly, the same accessibility tree used by screen readers
  • Understands UI structure, not pixels: it knows that node #42 is a "checkbox" with label "Buy groceries", not "a blue square at (127, 340)"
  • Resolution-independent: the same node roles and labels apply on any screen size or density
  • Lightweight: sends a compact text tree to the LLM instead of a multi-MB screenshot
  • Leverages existing accessibility annotations: if your app is accessible, the agent can use it

📸 Screenshot approach:       🌳 Semantics approach (ours):
"Click at (127, 340)"         "Tap node #42 (checkbox: Buy groceries)"
"Type at (200, 100)"          "setText on node #15 (textField: New todo)"

Think of it this way: coordinate-based agents only see pixels. Our agent reads the UI's structure and labels.
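
To make the comparison concrete, here is a minimal sketch of how a semantics-style tree could be flattened into the compact text an LLM receives. The `UiNode` class and `describe` function are hypothetical illustrations, not the package's `WidgetDescriptor` API, and the real text format may differ.

```dart
// Hypothetical node type for illustration; not the package's WidgetDescriptor.
class UiNode {
  final int id;
  final String role;
  final String label;
  final List<UiNode> children;

  const UiNode(this.id, this.role, this.label, [this.children = const []]);
}

/// Renders one line per node, indented two spaces per level of depth.
String describe(UiNode node, [int depth = 0]) {
  final buffer = StringBuffer()
    ..writeln('${'  ' * depth}#${node.id} ${node.role}: ${node.label}');
  for (final child in node.children) {
    buffer.write(describe(child, depth + 1));
  }
  return buffer.toString();
}

void main() {
  const tree = UiNode(1, 'screen', 'Todos', [
    UiNode(15, 'textField', 'New todo'),
    UiNode(42, 'checkbox', 'Buy groceries'),
  ]);
  print(describe(tree));
}
```

A tree like this serializes to a few dozen bytes per node, which is the reason a text snapshot is so much smaller than a screenshot.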


What is ai_flutter_agent? #

ai_flutter_agent is a Dart/Flutter package that bridges Large Language Models and Flutter UIs. It captures the live Semantics tree, sends it to an LLM, executes the returned tool-call actions, and verifies the UI changed — all in an automated loop.

Use cases:

  • 🤖 AI-powered UI testing — let an LLM explore and test your app
  • ♿ Accessibility automation — leverage the Semantics tree for smart interactions
  • 🔄 Macro recording & replay — capture user flows and re-execute them
  • 🧪 E2E testing without brittle selectors — the LLM understands your UI

Installation #

Add to your pubspec.yaml:

dependencies:
  ai_flutter_agent: ^0.1.3

Or run:

flutter pub add ai_flutter_agent

Architecture #

┌───────────────────────────────────────────────────┐
│                     AgentCore                     │
│                                                   │
│  1. Perceive   SemanticTreeWalker.capture()       │
│       ↓        → WidgetDescriptor tree            │
│  2. Plan       Planner.plan()                     │
│       ↓        → LLMClient.requestActions()       │
│  3. Execute    Executor.executeAll()              │
│       ↓        → ActionRegistry (whitelist)       │
│  4. Verify     Verifier.verify()                  │
│       ↓        → VerificationDetail (diff)        │
│  (unchanged? retry up to maxRetries, then error)  │
└───────────────────────────────────────────────────┘
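
The control flow in the diagram can be sketched in plain Dart. Everything below is a hypothetical stand-in: the typedefs, `runLoop`, and `LoopResult` are not part of the package, and treating "planner returns no actions" as completion is an assumption made for this sketch.

```dart
// All names here are hypothetical stand-ins for the package's components.
typedef Perceive = String Function();
typedef Plan = List<String> Function(String ui, String task);
typedef Execute = void Function(String action);

enum LoopResult { completed, stalled, outOfSteps }

/// Runs perceive -> plan -> execute -> verify until the planner returns no
/// actions, the UI stays unchanged for more than [maxRetries] consecutive
/// attempts, or [maxSteps] is exhausted.
LoopResult runLoop({
  required String task,
  required Perceive perceive,
  required Plan plan,
  required Execute execute,
  int maxSteps = 10,
  int maxRetries = 2,
}) {
  var retries = 0;
  for (var step = 0; step < maxSteps; step++) {
    final before = perceive(); // 1. Perceive
    final actions = plan(before, task); // 2. Plan
    if (actions.isEmpty) return LoopResult.completed;
    actions.forEach(execute); // 3. Execute
    final changed = perceive() != before; // 4. Verify by comparing snapshots
    if (changed) {
      retries = 0;
    } else if (++retries > maxRetries) {
      return LoopResult.stalled;
    }
  }
  return LoopResult.outOfSteps;
}

void main() {
  final todos = <String>[];
  final result = runLoop(
    task: 'add milk',
    perceive: () => todos.join(','),
    plan: (ui, task) => ui.contains('milk') ? [] : ['add:milk'],
    execute: (action) => todos.add(action.split(':').last),
  );
  print(result); // LoopResult.completed
}
```

Comparing whole snapshots is the simplest possible verification; a structured diff, as the diagram's `VerificationDetail` suggests, would also report which nodes changed.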

Quick Start #

import 'package:flutter/widgets.dart';
import 'package:ai_flutter_agent/ai_flutter_agent.dart';

// 1. Wrap your app to enable Semantics
void main() {
  runApp(
    AgentOverlayWidget(
      enabled: true,
      child: MyApp(), // your app's root widget
    ),
  );
}

// Steps 2-5 use await, so they live in an async function that you call
// once the app is running.
Future<void> runAgent() async {
  // 2. Register actions
  final registry = ActionRegistry();
  BuiltInActions.registerDefaults(registry);

  // 3. Set up LLM client (OpenAI-compatible)
  final llm = OpenAILLMClient(
    apiKey: 'your-api-key',  // or load from an environment variable
    model: 'gpt-4o',
    // baseUrl: 'http://localhost:1234/v1', // for local models
  );

  // 4. Build components
  final auditLog = AuditLog();
  final planner = Planner(llmClient: llm, actionRegistry: registry);
  final executor = Executor(actionRegistry: registry, auditLog: auditLog);
  final verifier = Verifier(treeWalker: SemanticTreeWalker());

  // 5. Create and run agent
  final agent = AgentCore(
    config: AgentConfig(maxSteps: 10),
    treeWalker: SemanticTreeWalker(),
    planner: planner,
    executor: executor,
    verifier: verifier,
  );

  await agent.run('Fill in the login form and tap Submit');

  // Check results
  print(agent.state.status);       // e.g. AgentStatus.completed
  print(auditLog.entries.length);  // number of actions executed
}
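
Step 2 matters because only registered actions can be executed. The idea behind a whitelist can be sketched as follows; `WhitelistRegistry` and `ActionHandler` are hypothetical names, and the package's `ActionRegistry` API may look different.

```dart
// Hypothetical minimal registry; the package's ActionRegistry API may differ.
typedef ActionHandler = String Function(Map<String, Object?> args);

class WhitelistRegistry {
  final Map<String, ActionHandler> _handlers = {};

  void register(String name, ActionHandler handler) {
    _handlers[name] = handler;
  }

  /// Runs a tool call only if its name was registered beforehand.
  String dispatch(String name, Map<String, Object?> args) {
    final handler = _handlers[name];
    if (handler == null) {
      throw ArgumentError.value(name, 'name', 'action is not whitelisted');
    }
    return handler(args);
  }
}

void main() {
  final registry = WhitelistRegistry()
    ..register('tap', (args) => 'tapped node ${args['nodeId']}');

  print(registry.dispatch('tap', {'nodeId': 42})); // tapped node 42

  try {
    // An action the LLM invented is rejected instead of executed.
    registry.dispatch('deleteAccount', {});
  } on ArgumentError catch (e) {
    print('rejected: ${e.message}'); // rejected: action is not whitelisted
  }
}
```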

Key Features #

| Category | Feature | Class | Description |
|---|---|---|---|
| Core | UI Perception | SemanticTreeWalker | Captures live semantics tree as WidgetDescriptor |
| | Node Resolution | NodeResolver + Selector | Find nodes by id, label, role, or key |
| | Action Registry | ActionRegistry | Whitelist of allowed actions with OpenAI tool schemas |
| | Built-in Actions | BuiltInActions | tap, longPress, scroll, setText, focus, dismiss |
| LLM | OpenAI Client | OpenAILLMClient | HTTP-based, supports any OpenAI-compatible endpoint |
| | Streaming | StreamingLLMClient | Stream-based LLM responses |
| | Isolate Execution | IsolateLLMClient | Run LLM calls off the main thread |
| | Conversation History | ConversationHistory | Multi-turn context with FIFO eviction |
| | Retry | RetryExecutor | Exponential backoff for resilient LLM calls |
| Safety | Privacy Masking | SensitiveDataMasker | Strip emails, phones, credit cards before LLM |
| | Consent Gate | ConsentHandler | User approval before executing actions |
| | Action Timeout | Executor | Per-action timeout enforcement |
| | Action Confirmation | Executor | Per-action confirmation callbacks |
| | Audit Log | AuditLog | Every action recorded (success + failure) |
| DX | Prompt Templates | PromptTemplate | Customizable prompt formatting |
| | Verification Diff | VerificationDetail | Structured tree diff for change detection |
| | Macro Recording | MacroRecorder | Record & replay action sequences with serialization |
| | Debug Events | DebugLogStream | Stream events for debug overlay |
| | Lifecycle Hooks | AgentCallbacks | onStepStart, onActionExecuted, onComplete, onError |
| | Widget Wrapper | AgentOverlayWidget | Manages semantics lifecycle automatically |
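
Two of the LLM features above, FIFO eviction and exponential backoff, are easy to picture with a short sketch. `BoundedHistory` and `backoffDelay` are hypothetical; the package's `ConversationHistory` and `RetryExecutor` may be implemented differently, and the 500 ms base delay is an arbitrary choice.

```dart
import 'dart:collection';

// Hypothetical sketches; the package's ConversationHistory and RetryExecutor
// may be implemented differently.

/// Keeps at most [capacity] messages, evicting the oldest first (FIFO).
class BoundedHistory {
  final int capacity;
  final Queue<String> _messages = Queue<String>();

  BoundedHistory(this.capacity) : assert(capacity > 0);

  void add(String message) {
    _messages.addLast(message);
    while (_messages.length > capacity) {
      _messages.removeFirst();
    }
  }

  List<String> get messages => List.unmodifiable(_messages);
}

/// Delay before retry number [attempt] (0-based): base * 2^attempt.
Duration backoffDelay(
  int attempt, {
  Duration base = const Duration(milliseconds: 500),
}) {
  return base * (1 << attempt);
}

void main() {
  final history = BoundedHistory(2)
    ..add('user: add milk')
    ..add('assistant: tap #15')
    ..add('user: done?');
  print(history.messages); // [assistant: tap #15, user: done?]

  print(backoffDelay(0).inMilliseconds); // 500
  print(backoffDelay(3).inMilliseconds); // 4000
}
```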

Advanced Usage #

Custom Prompt Template #

final planner = Planner(
  llmClient: llm,
  actionRegistry: registry,
  promptTemplate: CustomPromptTemplate(
    template: 'UI:\n{ui}\n\nTask: {task}\n\nTools: {actions}',
  ),
);
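
The `{ui}`, `{task}`, and `{actions}` placeholders suggest simple named substitution. A minimal sketch of that idea, assuming nothing about how `CustomPromptTemplate` fills them internally (`fillTemplate` is a hypothetical helper):

```dart
// Hypothetical helper showing placeholder substitution; the package's
// CustomPromptTemplate may fill placeholders differently.
String fillTemplate(String template, Map<String, String> values) {
  return template.replaceAllMapped(RegExp(r'\{(\w+)\}'), (match) {
    // Leave unknown placeholders untouched so mistakes stay visible.
    return values[match.group(1)] ?? match.group(0)!;
  });
}

void main() {
  const template = 'UI:\n{ui}\n\nTask: {task}\n\nTools: {actions}';
  print(fillTemplate(template, {
    'ui': '#42 checkbox: Buy groceries',
    'task': 'Check off groceries',
    'actions': 'tap, setText',
  }));
}
```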

Privacy-Aware Agent #

final agent = AgentCore(
  config: AgentConfig(maxSteps: 10),
  treeWalker: SemanticTreeWalker(),
  planner: planner,
  executor: executor,
  verifier: verifier,
  sensitiveDataMasker: SensitiveDataMasker(), // strips PII automatically
  consentHandler: ConsentHandler(
    onConsentRequired: (actions) async => true, // auto-approves everything; replace with your approval logic
  ),
);
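
Masking means rewriting the text tree before it leaves the device. The sketch below shows the general idea with two deliberately simple patterns (emails and card-like digit runs); `maskSensitive` and both regexes are hypothetical, and the rules `SensitiveDataMasker` actually applies may be different and more thorough.

```dart
// Hypothetical masking rules for illustration only; these patterns are
// simplistic and are not the ones the package uses.
final _email = RegExp(r'[\w.+-]+@[\w-]+(\.[\w-]+)+');
final _card = RegExp(r'\b\d(?:[ -]?\d){12,15}\b');

/// Replaces email addresses and card-like digit runs with placeholders.
String maskSensitive(String text) {
  return text
      .replaceAll(_email, '[EMAIL]')
      .replaceAll(_card, '[CARD]');
}

void main() {
  print(maskSensitive('textField: jane@example.com'));
  // textField: [EMAIL]
  print(maskSensitive('label: 4111 1111 1111 1111'));
  // label: [CARD]
}
```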

Local LLM Support #

Works with any OpenAI-compatible endpoint — LM Studio, Ollama, vLLM, etc.:

final llm = OpenAILLMClient(
  apiKey: 'not-needed',
  model: 'your-local-model',
  baseUrl: 'http://localhost:1234/v1',
);
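
"OpenAI-compatible" here means the server accepts Chat Completions style requests under the given `baseUrl`. The sketch below builds such a request body without sending it. The body shape follows the public Chat Completions format; the helper names, the system prompt, and the single `tap` tool are assumptions for illustration, not what `OpenAILLMClient` sends.

```dart
import 'dart:convert';

// A sketch of a request an OpenAI-compatible client could send. How
// OpenAILLMClient builds its requests internally is not shown in this README.
Uri chatCompletionsUri(String baseUrl) =>
    Uri.parse('$baseUrl/chat/completions');

Map<String, Object?> buildRequestBody({
  required String model,
  required String uiTree,
  required String task,
}) {
  return {
    'model': model,
    'messages': [
      {'role': 'system', 'content': 'You operate a Flutter UI via tools.'},
      {'role': 'user', 'content': 'UI:\n$uiTree\n\nTask: $task'},
    ],
    // One tool schema per whitelisted action (only a hypothetical "tap" here).
    'tools': [
      {
        'type': 'function',
        'function': {
          'name': 'tap',
          'description': 'Tap a semantics node by id.',
          'parameters': {
            'type': 'object',
            'properties': {
              'nodeId': {'type': 'integer'},
            },
            'required': ['nodeId'],
          },
        },
      },
    ],
  };
}

void main() {
  print(chatCompletionsUri('http://localhost:1234/v1'));
  // http://localhost:1234/v1/chat/completions
  print(jsonEncode(buildRequestBody(
    model: 'your-local-model',
    uiTree: '#42 checkbox: Buy groceries',
    task: 'Check off groceries',
  )));
}
```

Note that a local model needs to support tool calls for this style of request to be useful.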

Requirements #

  • Flutter ≥ 3.22.0
  • Dart ≥ 3.4.0
  • Your app widgets must have Semantics annotations for the agent to perceive them

Testing #

flutter test           # 181 tests
flutter analyze        # Static analysis

License #

BSD 3-Clause

Dependencies

flutter, http
