in_app_mcp 1.3.0

Policy-gated tool runtime for in-app LLM/agent tools. Per-tool auto/confirm/deny, typed contracts, structured results.


A policy-gated tool runtime for in-app LLM/agent tools. Per-tool auto / confirm / deny gate, ephemeral grants, preview, audit, undo — and a structured user-input primitive for handlers that need the user to fill in missing arguments mid-call. Provider-neutral; usable with any LLM.

ToolCall ──► preview ──► policy gate ──► validate ──► handler ──► audit ledger
               │            │                            │  │          │
               │            auto / confirm / deny        │  │          └── undo hook
               │            + ephemeral grants           │  │
               │                                         │  └── pendingInputs → render form
               │                                         │     → merge args → re-invoke
               │                                         └── structured ToolResult
               └── pure, no side effect

Not the MCP wire protocol. Despite the name, this package does not speak JSON-RPC / stdio / SSE — see mcp_server / mcp_client for that. in_app_mcp is a local, in-process runtime focused on the authorization boundary between a model's tool call and your app's side effects.

Install #

dependencies:
  in_app_mcp: ^1.3.0

Quick start #

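import 'package:in_app_mcp/in_app_mcp.dart';

Future<void> main() async {
  // Tools without a more specific policy require user confirmation.
  final mcp = InAppMcp(defaultPolicy: ToolPolicy.confirm);

  mcp.registerTool(
    definition: const ToolDefinition(
      name: 'echo',
      description: 'Echo message back',
      argumentTypes: {'message': ToolArgType.string},
      requiredArguments: {'message'},
      allowAdditionalArguments: false, // reject unknown argument keys
    ),
    handler: (call) async =>
        ToolResult.ok('ok', data: {'echo': call.arguments['message']}),
  );

  // confirmed: true records that the user approved this call; without it the
  // confirm policy returns confirmation_required and the handler never runs.
  final result = await mcp.handleToolCall(
    const ToolCall(id: '1', toolName: 'echo', arguments: {'message': 'hello'}),
    confirmed: true,
  );
}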

Runtime flow #

  1. Adapter produces ToolCall
  2. Policy resolves for toolName (active EphemeralGrants are consumed)
  3. InvocationInterceptor.onResolvePolicy can override
  4. Deny → policy_denied; confirm required without confirmed: true → confirmation_required
  5. beforeExecute can veto (first-wins)
  6. Registry validates arguments (types, required arguments)
  7. Handler runs and returns ToolResult — including ToolResult.requiresInput(...) to pause for user input
  8. afterExecute can rewrite
  9. AuditLedger records; onAudit fans out
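The nine steps above can be modelled in a few dozen lines of plain Dart. The sketch below is a simplified, hypothetical stand-in: `Call`, `Outcome`, `Tool` and `MiniRuntime` are illustrative names, not in_app_mcp classes, and preview and interceptors (steps 3, 5 and 8) are left out. The order of the tool lookup, the rule that a single-use grant upgrades confirm to auto but never overrides deny, and the audit string format are all assumptions.

```dart
// Hypothetical, simplified model of the runtime flow; these are not
// in_app_mcp's classes. Preview and interceptors are omitted.
enum Policy { auto, confirm, deny }

class Call {
  const Call(this.toolName, this.arguments);
  final String toolName;
  final Map<String, Object?> arguments;
}

class Outcome {
  const Outcome.ok(this.message) : errorCode = null;
  const Outcome.error(this.errorCode, this.message);
  final String? errorCode; // null means success
  final String message;
  bool get isOk => errorCode == null;
}

class Tool {
  Tool(this.requiredArguments, this.handler);
  final Set<String> requiredArguments;
  final Future<Outcome> Function(Call call) handler;
}

class MiniRuntime {
  MiniRuntime({this.defaultPolicy = Policy.confirm});
  final Policy defaultPolicy;
  final Map<String, Tool> tools = {};
  final Map<String, Policy> policies = {}; // per-tool overrides
  final Set<String> onceGrants = {}; // single-use grants, keyed by tool name
  final List<String> audit = [];

  Future<Outcome> handle(Call call, {bool confirmed = false}) async {
    final outcome = await _run(call, confirmed);
    audit.add('${call.toolName}: ${outcome.errorCode ?? 'ok'}'); // step 9
    return outcome;
  }

  Future<Outcome> _run(Call call, bool confirmed) async {
    final tool = tools[call.toolName];
    if (tool == null) {
      return const Outcome.error('tool_not_found', 'unknown tool');
    }
    // Step 2: resolve the policy; an active grant is consumed and upgrades
    // confirm to auto (assumption: grants never override deny).
    var policy = policies[call.toolName] ?? defaultPolicy;
    if (policy == Policy.confirm && onceGrants.remove(call.toolName)) {
      policy = Policy.auto;
    }
    // Step 4: the gate.
    if (policy == Policy.deny) {
      return const Outcome.error('policy_denied', 'tool is denied');
    }
    if (policy == Policy.confirm && !confirmed) {
      return const Outcome.error('confirmation_required', 'ask the user first');
    }
    // Step 6: validate required arguments.
    final missing =
        tool.requiredArguments.where((a) => call.arguments[a] == null).toList();
    if (missing.isNotEmpty) {
      return Outcome.error('invalid_arguments', 'missing: ${missing.join(', ')}');
    }
    // Step 7: run the handler.
    return tool.handler(call);
  }
}
```

Funnelling every outcome through `handle` means gate refusals and validation failures reach the audit list too, not only successful handler runs.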

Structured user input #

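Pause mid-call and ask the user for missing arguments without a side channel. Declare those arguments as non-required in the ToolDefinition, so argument validation lets the call reach the handler, then return ToolResult.requiresInput(...) when they are absent: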

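import 'package:in_app_mcp/in_app_mcp.dart';

// Handler for a trip-booking tool whose `destination` argument is optional.
Future<ToolResult> execute(ToolCall call) async {
  if (call.arguments['destination'] == null) {
    // Pause: the host renders one input per request, then re-invokes the tool.
    return ToolResult.requiresInput(
      requests: const [
        UserInputRequest(
          id: 'destination',
          kind: 'text', // 'text' | 'single_choice' | 'photos' | 'location' | custom
          field: 'destination', // key the collected value is merged under
          prompt: 'Where are you travelling to?',
          parameters: {'placeholder': 'e.g. Tokyo'},
        ),
      ],
    );
  }
  return ToolResult.ok('Booked ${call.arguments['destination']}.');
}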

The host reads result.pendingInputs, renders a widget per kind (apps register their own; the example ships text / single_choice / number), collects values keyed by field, merges them into ToolCall.arguments, and calls handleToolCall again. kind is free-form on purpose — add domain-specific kinds ('certificate', 'location_autocomplete') as needed. Pending-input rounds flow through afterExecute and into the audit ledger like any other outcome.
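The host side of that round-trip is essentially a loop: invoke; while the result carries pending inputs, collect one value per request, merge them by `field`, and invoke again. A minimal sketch in plain Dart, assuming hypothetical `InputRequest` / `Result` types in place of `UserInputRequest` / `ToolResult`, and a `collect` callback standing in for whatever widget the app renders per `kind`. The `maxRounds` cap is an added safeguard in this sketch, not something the package is documented to provide.

```dart
// Hypothetical stand-ins for UserInputRequest / ToolResult; not the package API.
class InputRequest {
  const InputRequest({required this.field, required this.kind, required this.prompt});
  final String field;
  final String kind;
  final String prompt;
}

class Result {
  const Result.ok(this.message) : pendingInputs = const [];
  const Result.needsInput(this.pendingInputs) : message = null;
  final String? message;
  final List<InputRequest> pendingInputs;
  bool get requiresUserInput => pendingInputs.isNotEmpty;
}

typedef Invoke = Future<Result> Function(Map<String, Object?> arguments);

// Renders one request (e.g. a widget chosen by `kind`) and returns the value.
typedef Collect = Future<Object?> Function(InputRequest request);

Future<Result> invokeUntilComplete(
  Map<String, Object?> arguments,
  Invoke invoke,
  Collect collect, {
  int maxRounds = 5,
}) async {
  var args = Map<String, Object?>.of(arguments);
  for (var round = 0; round < maxRounds; round++) {
    final result = await invoke(args);
    if (!result.requiresUserInput) return result;
    // Collect one value per request, keyed by `field`, then merge and retry.
    final answers = <String, Object?>{};
    for (final request in result.pendingInputs) {
      answers[request.field] = await collect(request);
    }
    args = {...args, ...answers};
  }
  throw StateError('still missing input after $maxRounds rounds');
}
```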

See example/lib/agent_tools/book_trip_tool.dart for a four-field round-trip.

Interceptors #

Plug into the pipeline without implementing a whole PolicyStore / GrantStore / AuditLedger:

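import 'package:in_app_mcp/in_app_mcp.dart';

// Rejects a call made within 5 seconds of the previous call to the same tool.
class RateLimiter extends InvocationInterceptor {
  final Map<String, DateTime> _lastCall = {};

  @override
  Future<ToolResult?> beforeExecute(ToolCall call, ResolvedPolicy _) async {
    final now = DateTime.now();
    final last = _lastCall[call.toolName];
    // Every attempt, including a rejected one, restarts the 5-second window.
    _lastCall[call.toolName] = now;
    if (last != null && now.difference(last).inSeconds < 5) {
      return ToolResult.fail('rate_limited', 'Try again in 5 seconds.'); // veto
    }
    return null; // null lets the call continue to the handler
  }
}

final mcp = InAppMcp(interceptors: [RateLimiter()]);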
Hook             Fires when                            Return non-null to…
onResolvePolicy  after PolicyEngine decides            override the decision (chain-through)
beforeExecute    after policy allows, before handler   veto with a failure (first-wins)
afterExecute     after handler returns                 rewrite the result, e.g. redact PII (chain-through)
onAudit          after ledger records                  fan out telemetry; exceptions swallowed

onResolvePolicy / beforeExecute / afterExecute propagate exceptions and fail the call. Only onAudit swallows errors.
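The two combination rules work as follows. First-wins means the first non-null `beforeExecute` result short-circuits the call. Chain-through means each `afterExecute` sees the previous interceptor's output. The sketch below models only that ordering, using hypothetical `Hooks` / `runWithHooks` names and plain `String` results; it is not `InvocationInterceptor`, and skipping `afterExecute` after a veto is an assumption of this sketch.

```dart
// Hypothetical hooks mirroring the table's semantics; not InvocationInterceptor.
abstract class Hooks {
  // Non-null vetoes the call; the first non-null veto wins.
  Future<String?> beforeExecute(String toolName) async => null;
  // Non-null replaces the result; the next hook sees the replacement.
  Future<String?> afterExecute(String toolName, String result) async => null;
  // Telemetry only; exceptions thrown here are swallowed.
  void onAudit(String entry) {}
}

Future<String> runWithHooks(
  String toolName,
  Future<String> Function() handler,
  List<Hooks> hooks,
) async {
  for (final h in hooks) {
    final veto = await h.beforeExecute(toolName); // exceptions propagate
    if (veto != null) return _audit(hooks, veto); // first-wins: stop here
  }
  var result = await handler();
  for (final h in hooks) {
    result = await h.afterExecute(toolName, result) ?? result; // chain-through
  }
  return _audit(hooks, result);
}

String _audit(List<Hooks> hooks, String entry) {
  for (final h in hooks) {
    try {
      h.onAudit(entry);
    } catch (_) {
      // A failing telemetry hook must not fail the call.
    }
  }
  return entry;
}
```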

Showcase #

All cards below are real frames from example/integration_test/ driving Gemma 4 E2B on-device on an iPhone simulator. Prompts are natural language — no tool names, no schemas.

Tool-call proposals #


"Wake me up at 6 AM every weekday."

"Put a Team Sync meeting on my calendar tomorrow 10–11 AM at Main Office."

"How do I drive to Tokyo?"

"Draft an email to team@example.com saying hello."

"Echo 'hello from showcase'." (codegen `@McpTool`)

Each card shows the tool icon, description, status chip, policy chip, and proposed arguments. Nothing executes until the user taps Run.

Why the policy gate matters: Gemma sometimes fills placeholders like startIso: "<tomorrow's date>T10:00:00" instead of a resolved timestamp. The card surfaces proposed arguments before the handler runs, so the user can catch it.
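The confirm card relies on the user to spot such placeholders; a handler or a `beforeExecute` interceptor could also reject them mechanically. A minimal sketch, assuming a hypothetical `checkIsoArgument` helper (not part of in_app_mcp). `DateTime.tryParse` is the standard Dart parser and returns null for the placeholder string above.

```dart
// Hypothetical argument check; not part of in_app_mcp.
// Matches template residue such as "<tomorrow's date>".
final _placeholder = RegExp(r'<[^>]*>');

/// Returns null if args[key] is a usable ISO-8601 timestamp, else a reason.
String? checkIsoArgument(Map<String, Object?> args, String key) {
  final value = args[key];
  if (value is! String) return '$key must be a string';
  if (_placeholder.hasMatch(value)) {
    return '$key contains an unresolved placeholder: $value';
  }
  if (DateTime.tryParse(value) == null) {
    return '$key is not an ISO-8601 timestamp: $value';
  }
  return null;
}
```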

Driving "Echo back the phrase 'hello from showcase'" through the four lifecycle layers:


Preview — `@McpToolPreview` summarises before Run.

Grant submenu — once / 5 min / session.

Succeeded + Undo — ledger entry attaches and reveals the Undo button.

Undone — `@McpToolUndo` runs, ledger entry marked undone.

Audit timeline — every outcome listed with per-entry undo.
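The grant submenu offers three lifetimes. One way to model them is sketched below with hypothetical `GrantScope` / `Grants` names rather than in_app_mcp's `GrantStore` and `EphemeralGrant`. In this sketch a `once` grant is consumed on first use (as step 2 of the runtime flow describes), a five-minute grant expires by clock time, and a session grant lasts until cleared. The injectable clock exists only to make expiry testable.

```dart
// Hypothetical grant store; not in_app_mcp's GrantStore / EphemeralGrant.
enum GrantScope { once, fiveMinutes, session }

class Grant {
  Grant(this.scope, this.issuedAt);
  final GrantScope scope;
  final DateTime issuedAt;
}

class Grants {
  Grants({DateTime Function()? clock}) : _now = clock ?? DateTime.now;
  final DateTime Function() _now;
  final Map<String, Grant> _byTool = {};

  void grant(String toolName, GrantScope scope) =>
      _byTool[toolName] = Grant(scope, _now());

  /// True if an active grant lets this call skip confirmation.
  bool consume(String toolName) {
    final g = _byTool[toolName];
    if (g == null) return false;
    if (g.scope == GrantScope.once) {
      _byTool.remove(toolName); // single use
      return true;
    }
    if (g.scope == GrantScope.fiveMinutes &&
        _now().difference(g.issuedAt) >= const Duration(minutes: 5)) {
      _byTool.remove(toolName); // expired
      return false;
    }
    return true; // unexpired 5-minute grant, or a session grant
  }

  void clear() => _byTool.clear(); // e.g. when the session ends
}
```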

On-device Gemma (iOS simulator) #

cd example
./scripts/precache_gemma_e2b.sh         # one-time model cache

flutter run -d <booted-simulator-id> \
  --dart-define=LLM_ADAPTER=gemma \
  --dart-define=GEMMA_MODEL_PATH=$PWD/model_cache/gemma-4-E2B-it.litertlm

Integration tests under example/integration_test/ (screenshot-producing showcases + gemma_book_trip_flow_test.dart for the pending-input flow) use the same --dart-defines; capture_consent_showcase.sh also drives xcrun simctl io booted screenshot. The deterministic book-trip flow runs without a real model via --dart-define=E2E_MODE=true --dart-define=GEMMA_MODEL_PATH=fake.
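Values passed with `--dart-define` are compile-time constants that Dart code reads with `String.fromEnvironment` / `bool.fromEnvironment`. The sketch below shows one way an app might choose an adapter from the three defines above; `describeAdapter` and its return strings are hypothetical and not taken from the example app.

```dart
// Standard Dart compile-time environment lookups; empty/false when not defined.
const llmAdapter = String.fromEnvironment('LLM_ADAPTER');
const gemmaModelPath = String.fromEnvironment('GEMMA_MODEL_PATH');
const e2eMode = bool.fromEnvironment('E2E_MODE');

// Hypothetical selection logic; parameters default to the defines above.
String describeAdapter({
  String adapter = llmAdapter,
  String modelPath = gemmaModelPath,
  bool e2e = e2eMode,
}) {
  if (e2e) return 'scripted (no real model)'; // deterministic test runs
  if (adapter == 'gemma') {
    if (modelPath.isEmpty) {
      throw StateError('LLM_ADAPTER=gemma needs GEMMA_MODEL_PATH');
    }
    return 'gemma on-device: $modelPath';
  }
  return 'default adapter';
}
```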

Public API #

Models: ToolCall, ToolDefinition + ToolArgType, ToolResult (incl. pendingInputs, requiresUserInput), ToolErrorCode, Preview + PreviewWarning, UserInputRequest.

Runtime: InAppMcp (facade), ToolPolicy / PolicyDecision / PolicySource / ResolvedPolicy, PolicyStore + InMemoryPolicyStore, GrantStore + InMemoryGrantStore + EphemeralGrant, AuditLedger + InMemoryAuditLedger + AuditEntry, ToolRegistry + RegisteredTool + ToolPreviewer + ToolUndoer, InvocationEngine + InvocationInterceptor.

Error codes: tool_not_found, invalid_arguments, policy_denied, confirmation_required, requires_user_input, audit_disabled, entry_not_found, already_undone, nothing_to_undo, undo_not_supported.
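The undo-related codes suggest how an audit ledger reports failures. The sketch below uses hypothetical `Entry` / `Ledger` types. The error-code strings come from the list above, but the mapping from situation to code (for example, `nothing_to_undo` when every entry is already undone) is an assumption.

```dart
// Hypothetical audit ledger with undo; not in_app_mcp's AuditLedger.
class Entry {
  Entry(this.id, this.toolName, this.undoHook);
  final String id;
  final String toolName;
  final void Function()? undoHook; // null: the tool registered no undo
  bool undone = false;
}

class Ledger {
  Ledger({this.enabled = true});
  final bool enabled; // false models a ledger with auditing switched off
  final List<Entry> entries = [];

  void record(Entry entry) {
    if (enabled) entries.add(entry);
  }

  /// Returns null on success, otherwise an error code.
  String? undo(String id) {
    if (!enabled) return 'audit_disabled';
    final matches = entries.where((e) => e.id == id);
    if (matches.isEmpty) return 'entry_not_found';
    final entry = matches.first;
    if (entry.undone) return 'already_undone';
    final hook = entry.undoHook;
    if (hook == null) return 'undo_not_supported';
    hook();
    entry.undone = true;
    return null;
  }

  /// Undoes the most recent entry that has not been undone yet.
  String? undoLast() {
    if (!enabled) return 'audit_disabled';
    for (final entry in entries.reversed) {
      if (!entry.undone) return undo(entry.id);
    }
    return 'nothing_to_undo';
  }
}
```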

Testing #

# package
flutter analyze && flutter test

# example app
cd example && flutter analyze && flutter test

Security notes #

  • Don't hardcode API keys. Treat tool handlers as side-effect boundaries.
  • Keep risky tools behind confirm or deny.
  • OS-level permissions still apply where the platform requires them.

Docs #

  • Documentation: API reference
  • Repository (GitHub): view/report issues
  • Publisher: unverified uploader
  • License: MIT
  • Dependencies: flutter