JsonEvalTask class

Data-driven EvalTask. Constructed from a parsed JSON map plus a list of graders already resolved by the loader.

Schema of one task file:

{
  "id": "card_event_meeting",
  "description": "Calendar-style input → 'event' template.",
  "input": {
    "prompt": "..."
  },
  "metadata": {
    "failure_bucket": "template_event"
  },
  "trials_per_run": 2,
  "timeout_seconds": 120,
  "reference_solution": {
    "expected_outcome": {"card_saved": true},
    "source": "manual"
  },
  "graders": [
    {"name": "card_saved", "config": {"fact_id": "fact_001",
                                       "templates": ["event"]}},
    {"name": "called_get_card_metadata"}
  ]
}

The agent_name is set in suite.json rather than in the task file, because every task in a suite shares the same target agent. See EvalSuite.
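As a rough illustration of how the schema above lines up with the constructor parameters, the sketch below decodes one task file and converts each field to the Dart type the constructor expects. It is not the loader's actual code: `ParsedTaskFields` and `parseTaskFields` are hypothetical names, grader resolution is reduced to collecting grader names, and `reference_solution` is left out because the shape of ReferenceSolution is not shown here.

```dart
import 'dart:convert';

/// Hypothetical holder for the typed values; not part of the documented API.
class ParsedTaskFields {
  ParsedTaskFields({
    required this.id,
    required this.description,
    required this.input,
    required this.metadata,
    required this.trialsPerRun,
    required this.timeout,
    required this.graderNames,
  });

  final String id;
  final String description;
  final Map<String, dynamic> input;
  final Map<String, String> metadata;
  final int trialsPerRun;
  final Duration? timeout;
  final List<String> graderNames;
}

/// Decodes one task file and applies the documented defaults.
ParsedTaskFields parseTaskFields(String source) {
  final json = jsonDecode(source) as Map<String, dynamic>;

  // The graders property states that at least one grader is required.
  final graders = (json['graders'] as List<dynamic>? ?? const [])
      .cast<Map<String, dynamic>>();
  if (graders.isEmpty) {
    throw FormatException('Task "${json['id']}" needs at least one grader.');
  }

  // timeout_seconds is optional; null lets the runner default apply.
  final timeoutSeconds = json['timeout_seconds'] as int?;

  return ParsedTaskFields(
    id: json['id'] as String,
    description: json['description'] as String,
    input: json['input'] as Map<String, dynamic>,
    // Assumption: metadata values are coerced to strings to fit
    // Map<String, String>.
    metadata: (json['metadata'] as Map<String, dynamic>? ?? const {})
        .map((key, value) => MapEntry(key, value.toString())),
    // trialsPerRun defaults to 1 when the key is absent.
    trialsPerRun: json['trials_per_run'] as int? ?? 1,
    timeout: timeoutSeconds == null ? null : Duration(seconds: timeoutSeconds),
    graderNames: [for (final g in graders) g['name'] as String],
  );
}
```

Calling `parseTaskFields` on the example file above would yield a `timeout` of two minutes and a `trialsPerRun` of 2. A real loader would then resolve each grader name and its `config` into a Grader before invoking the constructor.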

Implemented types

Constructors

JsonEvalTask({required String id, required String description, required Map<String, dynamic> input, required List<Grader> graders, ReferenceSolution? referenceSolution, Map<String, String> metadata = const {}, int trialsPerRun = 1, Duration? timeout})

Properties

description String
One-line human description.
final
graders List<Grader>
Graders attached to this task. At least one is required.
final
hashCode int
The hash code for this object.
no setter, inherited
id String
Stable id. Immutable after creation.
final
input Map<String, dynamic>
Input handed to the agent harness. Schema is application-defined.
final
metadata Map<String, String>
Free-form labels for filtering and bucketing. Conventional keys: failure_bucket, fixture, difficulty, language, expected.
final
referenceSolution ReferenceSolution?
Anthropic Step 2: a known working solution that passes all graders. Strongly recommended — proves the task is solvable and graders are configured correctly. May be required by the parent suite.
final
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
timeout Duration?
Optional per-task timeout. Falls back to the runner default if null.
final
trialsPerRun int
Anthropic non-determinism: how many trials to run per dataset run. Defaults to 1. Set ≥3 for tasks where stability matters (pass^k).
final
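To make the pass^k remark on trialsPerRun concrete, the sketch below estimates pass^k from the recorded trial outcomes of one task. It assumes pass^k means the probability that k independent trials all pass, and it uses the combinatorial estimator C(c, k) / C(n, k) for n trials with c passes. The function name `passHatK` is hypothetical, and the runner's actual aggregation is not documented here.

```dart
/// Estimates pass^k from [trialResults]: the chance that k trials drawn
/// without replacement from the observed trials all passed.
double passHatK(List<bool> trialResults, int k) {
  final n = trialResults.length;
  if (k < 1 || k > n) {
    throw ArgumentError.value(k, 'k', 'must be between 1 and $n');
  }

  // Number of passing trials.
  final c = trialResults.where((passed) => passed).length;

  // C(c, k) / C(n, k), computed as a running product to avoid factorials.
  var estimate = 1.0;
  for (var i = 0; i < k; i++) {
    if (c - i <= 0) return 0.0; // fewer than k passes: no all-pass draw exists
    estimate *= (c - i) / (n - i);
  }
  return estimate;
}
```

With k equal to the number of trials, the estimate is 1.0 only when every trial passed, which is why a single trial says little about stability and three or more are suggested for tasks where it matters.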

Methods

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
toString() → String
A string representation of this object.
inherited

Operators

operator ==(Object other) → bool
The equality operator.
inherited