A policy-gated tool runtime for in-app LLM/agent tools. Per-tool auto / confirm / deny gate, ephemeral grants, preview, audit, undo — and a structured user-input primitive for handlers that need the user to fill in missing arguments mid-call. Provider-neutral; usable with any LLM.
```
ToolCall ──► preview ──► policy gate ──► validate ──► handler ──► audit ledger
                │             │             │            │             │
                │   auto / confirm / deny   │            │             └── undo hook
                │   + ephemeral grants      │            │
                │                           │            └── pendingInputs → render form
                │                           │                → merge args → re-invoke
                │                           └── structured ToolResult
                └── pure, no side effect
```
Not the MCP wire protocol. Despite the name, this package does not speak JSON-RPC / stdio / SSE; see `mcp_server` / `mcp_client` for that. `in_app_mcp` is a local, in-process runtime focused on the authorization boundary between a model's tool call and your app's side effects.
Install
```yaml
dependencies:
  in_app_mcp: ^1.3.0
```
Quick start
```dart
import 'package:in_app_mcp/in_app_mcp.dart';

Future<void> main() async {
  // Tools without a more specific policy fall back to `confirm`.
  final mcp = InAppMcp(defaultPolicy: ToolPolicy.confirm);

  mcp.registerTool(
    definition: const ToolDefinition(
      name: 'echo',
      description: 'Echo message back',
      argumentTypes: {'message': ToolArgType.string},
      requiredArguments: {'message'},
      allowAdditionalArguments: false,
    ),
    handler: (call) async =>
        ToolResult.ok('ok', data: {'echo': call.arguments['message']}),
  );

  // Without `confirmed: true`, a confirm-policy tool returns
  // `confirmation_required` instead of running the handler.
  final result = await mcp.handleToolCall(
    const ToolCall(id: '1', toolName: 'echo', arguments: {'message': 'hello'}),
    confirmed: true,
  );
}
```
Runtime flow
- Adapter produces a `ToolCall`.
- Policy resolves for `toolName` (active `EphemeralGrant`s are consumed); `InvocationInterceptor.onResolvePolicy` can override it.
- Deny → `policy_denied`; confirm-required without `confirmed: true` → `confirmation_required`.
- `beforeExecute` can veto (first-wins).
- Registry validates arguments (types, required arguments).
- Handler runs and returns a `ToolResult`, including `ToolResult.requiresInput(...)` to pause for user input.
- `afterExecute` can rewrite the result.
- `AuditLedger` records the outcome; `onAudit` fans out.
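The policy step of this flow can be modelled in a few lines of plain Dart. This is a hypothetical sketch, not the package's `PolicyEngine`: `Policy`, `Grant`, `PolicyGate`, and `check` are illustrative names. It assumes that a grant only substitutes for user confirmation (it never lifts `deny`) and that a "once" grant is spent by the first matching call.

```dart
// Hypothetical, self-contained model of the policy gate. None of these names
// come from in_app_mcp; they only illustrate auto / confirm / deny + grants.

enum Policy { auto, confirm, deny }

/// A standing approval for one tool, mirroring the once / 5 min / session
/// choices. Assumption: a grant lifts `confirm` but never overrides `deny`.
class Grant {
  final String toolName;
  final bool once; // consumed by the first matching call
  final DateTime? expiresAt; // null means "until the session ends"
  const Grant(this.toolName, {this.once = false, this.expiresAt});
}

class PolicyGate {
  final Policy defaultPolicy;
  final Map<String, Policy> perTool;
  final List<Grant> grants = [];
  final DateTime Function() clock;

  PolicyGate({
    this.defaultPolicy = Policy.confirm,
    this.perTool = const {},
    DateTime Function()? clock,
  }) : clock = clock ?? DateTime.now;

  /// Returns null when the call may proceed, otherwise an error code.
  String? check(String toolName, {bool confirmed = false}) {
    final policy = perTool[toolName] ?? defaultPolicy;
    if (policy == Policy.deny) return 'policy_denied';
    if (policy == Policy.auto || confirmed) return null;
    // Policy is `confirm` and the user has not confirmed: look for a grant.
    return _consumeGrant(toolName) ? null : 'confirmation_required';
  }

  bool _consumeGrant(String toolName) {
    final now = clock();
    // Drop grants whose time window has closed.
    grants.removeWhere(
        (g) => g.expiresAt != null && !now.isBefore(g.expiresAt!));
    final i = grants.indexWhere((g) => g.toolName == toolName);
    if (i < 0) return false;
    if (grants[i].once) grants.removeAt(i); // single-use grants are spent
    return true;
  }
}
```

Injecting `clock` keeps `check` free of hidden time dependencies, so a 5-minute expiry can be unit-tested without waiting.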
Structured user input
Pause mid-call and ask the user for missing arguments without a side channel. Declare the arguments the user may fill in later as optional (leave them out of `requiredArguments`), then:
```dart
Future<ToolResult> execute(ToolCall call) async {
  if (call.arguments['destination'] == null) {
    return ToolResult.requiresInput(
      requests: const [
        UserInputRequest(
          id: 'destination',
          kind: 'text', // 'text' | 'single_choice' | 'photos' | 'location' | custom
          field: 'destination',
          prompt: 'Where are you travelling to?',
          parameters: {'placeholder': 'e.g. Tokyo'},
        ),
      ],
    );
  }
  return ToolResult.ok('Booked ${call.arguments['destination']}.');
}
```
The host reads `result.pendingInputs`, renders a widget per `kind` (apps register their own; the example ships `text` / `single_choice` / `number`), collects values keyed by `field`, merges them into `ToolCall.arguments`, and calls `handleToolCall` again. `kind` is free-form on purpose: add domain-specific kinds (`'certificate'`, `'location_autocomplete'`) as needed. Pending-input rounds flow through `afterExecute` and into the audit ledger like any other outcome.
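The host-side loop for these rounds might look like the following sketch. It is a hypothetical, synchronous model: `Outcome`, `InputRequest`, `runWithInputRounds`, and the `maxRounds` cap are illustrative names, not package API. With in_app_mcp, the `handler` call would be an awaited `handleToolCall`, and `collect` would be the widget registered for the request's `kind`.

```dart
// Hypothetical, simplified stand-ins for ToolResult / UserInputRequest.
// The real package API is async and richer; this only shows the loop shape.

class InputRequest {
  final String field;
  final String kind; // e.g. 'text', 'single_choice'
  const InputRequest(this.field, this.kind);
}

class Outcome {
  final String message;
  final List<InputRequest> pendingInputs;
  const Outcome(this.message, [this.pendingInputs = const []]);
  bool get requiresUserInput => pendingInputs.isNotEmpty;
}

typedef Handler = Outcome Function(Map<String, Object?> arguments);
typedef Collector = Object? Function(InputRequest request);

/// Invokes [handler]; while it asks for input, collects one value per
/// request (keyed by `field`), merges them into the arguments, and re-invokes.
Outcome runWithInputRounds(
  Handler handler,
  Map<String, Object?> arguments,
  Collector collect, {
  int maxRounds = 5,
}) {
  var current = Map<String, Object?>.of(arguments);
  for (var round = 0; round < maxRounds; round++) {
    final outcome = handler(current);
    if (!outcome.requiresUserInput) return outcome;
    current = {
      ...current,
      for (final request in outcome.pendingInputs)
        request.field: collect(request),
    };
  }
  // Guard against a handler that never stops asking.
  throw StateError('Still waiting for input after $maxRounds rounds');
}
```

Capping the number of rounds is a design choice of this sketch, so a misbehaving handler cannot trap the user in an endless form loop.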
See example/lib/agent_tools/book_trip_tool.dart for a four-field round-trip.
Interceptors
Plug into the pipeline without implementing a whole PolicyStore / GrantStore / AuditLedger:
```dart
class RateLimiter extends InvocationInterceptor {
  final Map<String, DateTime> _lastCall = {};

  @override
  Future<ToolResult?> beforeExecute(ToolCall call, ResolvedPolicy _) async {
    final now = DateTime.now();
    final last = _lastCall[call.toolName];
    _lastCall[call.toolName] = now;
    if (last != null && now.difference(last).inSeconds < 5) {
      // A non-null result vetoes the call; the handler does not run.
      return ToolResult.fail('rate_limited', 'Try again in 5 seconds.');
    }
    return null; // null lets the call continue down the pipeline.
  }
}

final mcp = InAppMcp(interceptors: [RateLimiter()]);
```
| Hook | Fires when | Return non-null to… |
|---|---|---|
| `onResolvePolicy` | after `PolicyEngine` decides | override the decision (chain-through) |
| `beforeExecute` | after policy allows, before handler | veto with a failure (first-wins) |
| `afterExecute` | after handler returns | rewrite the result, e.g. redact PII (chain-through) |
| `onAudit` | after ledger records | fan out telemetry; exceptions swallowed |
onResolvePolicy / beforeExecute / afterExecute propagate exceptions and fail the call. Only onAudit swallows errors.
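The three combination rules in the table (first-wins, chain-through, and fan-out with swallowed errors) can be sketched as generic helpers. These are hypothetical illustrations of the documented semantics, not the package's `InvocationEngine`; the helper names are made up for this example.

```dart
/// First-wins: the first hook returning non-null decides; later hooks are
/// skipped. Exceptions propagate and fail the call.
T? firstWins<T extends Object>(Iterable<T? Function()> hooks) {
  for (final hook in hooks) {
    final result = hook();
    if (result != null) return result;
  }
  return null;
}

/// Chain-through: each hook sees the current value; a non-null return
/// replaces it for the next hook. Exceptions propagate and fail the call.
T chainThrough<T extends Object>(T initial, Iterable<T? Function(T)> hooks) {
  var value = initial;
  for (final hook in hooks) {
    value = hook(value) ?? value;
  }
  return value;
}

/// Fan-out: every listener runs; errors are swallowed so that telemetry
/// can never fail the call (the `onAudit` rule).
void fanOutSwallowing<T>(T event, Iterable<void Function(T)> listeners) {
  for (final listener in listeners) {
    try {
      listener(event);
    } catch (_) {
      // Intentionally ignored.
    }
  }
}
```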
Showcase
All cards below are real frames captured by the tests in example/integration_test/, with Gemma 4 E2B running locally (on-device inference) on an iPhone simulator. Prompts are natural language: no tool names, no schemas.
Tool-call proposals
- "Wake me up at 6 AM every weekday."
- "Put a Team Sync meeting on my calendar tomorrow 10–11 AM at Main Office."
- "How do I drive to Tokyo?"
- "Draft an email to team@example.com saying hello."
- "Echo 'hello from showcase'." (codegen `@McpTool`)
Each card shows the tool icon, description, status chip, policy chip, and proposed arguments. Nothing executes until the user taps Run.
Why the policy gate matters: Gemma sometimes fills in placeholders like `startIso: "<tomorrow's date>T10:00:00"` instead of a resolved timestamp. The card surfaces the proposed arguments before the handler runs, so the user can catch it.
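Besides relying on the user to spot such values, a tool could flag them itself. A minimal sketch, assuming the tool knows which arguments should hold ISO-8601 timestamps; `unresolvedTimestamps` is a hypothetical helper (its output could back something like a `PreviewWarning`), and it relies only on `DateTime.tryParse` from `dart:core`.

```dart
/// Hypothetical preview-side check: returns the names of arguments that
/// should hold ISO-8601 timestamps but do not parse as one. Missing
/// arguments are left to the registry's required-argument validation.
List<String> unresolvedTimestamps(
  Map<String, Object?> arguments,
  Iterable<String> timestampFields,
) {
  return [
    for (final field in timestampFields)
      if (arguments[field] is String &&
          DateTime.tryParse(arguments[field] as String) == null)
        field,
  ];
}
```

A literal like `"<tomorrow's date>T10:00:00"` has no leading year digits, so `DateTime.tryParse` returns null and the field is reported.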
Consent Lifecycle (single prompt, end-to-end)
Driving "Echo back the phrase 'hello from showcase'" through the four lifecycle layers:
- Preview: `@McpToolPreview` summarises the call before Run.
- Grant submenu: once / 5 min / session.
- Succeeded + Undo: the ledger entry attaches and reveals the Undo button.
- Undone: `@McpToolUndo` runs and the ledger entry is marked undone.
- Audit timeline: every outcome listed with per-entry undo.
On-device Gemma (iOS simulator)
```bash
cd example
./scripts/precache_gemma_e2b.sh   # one-time model cache
flutter run -d <booted-simulator-id> \
  --dart-define=LLM_ADAPTER=gemma \
  --dart-define=GEMMA_MODEL_PATH=$PWD/model_cache/gemma-4-E2B-it.litertlm
```
Integration tests under example/integration_test/ (the screenshot-producing showcases, plus gemma_book_trip_flow_test.dart for the pending-input flow) use the same `--dart-define` flags; capture_consent_showcase.sh additionally drives `xcrun simctl io booted screenshot`. The deterministic book-trip flow runs without a real model via `--dart-define=E2E_MODE=true --dart-define=GEMMA_MODEL_PATH=fake`.
Public API
Models: ToolCall, ToolDefinition + ToolArgType, ToolResult (incl. pendingInputs, requiresUserInput), ToolErrorCode, Preview + PreviewWarning, UserInputRequest.
Runtime: InAppMcp (facade), ToolPolicy / PolicyDecision / PolicySource / ResolvedPolicy, PolicyStore + InMemoryPolicyStore, GrantStore + InMemoryGrantStore + EphemeralGrant, AuditLedger + InMemoryAuditLedger + AuditEntry, ToolRegistry + RegisteredTool + ToolPreviewer + ToolUndoer, InvocationEngine + InvocationInterceptor.
Error codes: tool_not_found, invalid_arguments, policy_denied, confirmation_required, requires_user_input, audit_disabled, entry_not_found, already_undone, nothing_to_undo, undo_not_supported.
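How the undo-related codes relate to each other is not spelled out above. The sketch below shows one plausible mapping in plain Dart; `Ledger`, `LedgerEntry`, and `undoLast` are hypothetical names, not the package's `AuditLedger` / `InMemoryAuditLedger` API. Treating `nothing_to_undo` as "no undoable entry left" and `audit_disabled` as "ledger switched off" are assumptions.

```dart
// Hypothetical in-memory ledger illustrating one plausible mapping of the
// undo-related error codes. Not the package's AuditLedger implementation.

class LedgerEntry {
  final String id;
  final String toolName;
  final void Function()? undoHook; // null: the tool registered no undo
  bool undone = false;
  LedgerEntry(this.id, this.toolName, [this.undoHook]);
}

class Ledger {
  final bool enabled;
  final List<LedgerEntry> _entries = [];
  Ledger({this.enabled = true});

  void record(LedgerEntry entry) {
    if (enabled) _entries.add(entry);
  }

  /// Undoes one entry. Returns null on success, otherwise an error code.
  String? undo(String id) {
    if (!enabled) return 'audit_disabled';
    final i = _entries.indexWhere((e) => e.id == id);
    if (i < 0) return 'entry_not_found';
    final entry = _entries[i];
    if (entry.undone) return 'already_undone';
    final hook = entry.undoHook;
    if (hook == null) return 'undo_not_supported';
    hook();
    entry.undone = true;
    return null;
  }

  /// Undoes the most recent entry that can still be undone.
  String? undoLast() {
    if (!enabled) return 'audit_disabled';
    for (final entry in _entries.reversed) {
      if (!entry.undone && entry.undoHook != null) return undo(entry.id);
    }
    return 'nothing_to_undo';
  }
}
```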
Testing
```bash
# package
flutter analyze && flutter test

# example app
cd example && flutter analyze && flutter test
```
Security notes
- Don't hardcode API keys. Treat tool handlers as side-effect boundaries.
- Keep risky tools behind `confirm` or `deny`.
- OS-level permissions still apply where the platform requires them.
Docs
Libraries
- in_app_mcp: In-app MCP-style tool execution runtime for Flutter with policy controls.