EvalTask class abstract

Following Anthropic's definition, a task is a single test with defined inputs and success criteria. Implementations are pure data: they do not run the agent.

Note on what's intentionally not here:

  • There is no successCriteria map. The success contract lives entirely inside graders; a description-only mirror of it on the task would drift from the actual graders. If you want a human-readable summary of what a task tests, use description plus metadata.
  • There is no expectedBehavior enum (positive / negative). To filter positive versus negative tasks, tag them in metadata with failure_bucket or a similar key.
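
As a rough illustration of the metadata-tag approach described above, the sketch below filters tasks by a metadata key. It is not the library's code: the EvalTask declared here is a stand-in that mirrors only the documented getters it needs, and SimpleTask, whereTagged, and the 'positive' / 'negative' values are hypothetical names chosen for the example.

```dart
// Stand-in for illustration only: mirrors just the EvalTask getters used here.
abstract class EvalTask {
  String get id;
  String get description;
  Map<String, String> get metadata;
}

// Hypothetical concrete task; the real library may ship its own implementers.
class SimpleTask implements EvalTask {
  const SimpleTask(this.id, this.description, this.metadata);

  @override
  final String id;
  @override
  final String description;
  @override
  final Map<String, String> metadata;
}

/// Returns the tasks whose metadata[key] equals [value].
/// Tasks that lack the key are skipped, because a missing key reads as null.
List<EvalTask> whereTagged(
        Iterable<EvalTask> tasks, String key, String value) =>
    tasks.where((t) => t.metadata[key] == value).toList();

void main() {
  const tasks = <EvalTask>[
    SimpleTask('refund-ok', 'Issues a refund for a valid order.',
        {'expected': 'positive'}),
    SimpleTask('refund-fraud', 'Refuses a refund for a flagged order.',
        {'expected': 'negative', 'failure_bucket': 'over_compliance'}),
  ];

  final negatives = whereTagged(tasks, 'expected', 'negative');
  print(negatives.map((t) => t.id).toList()); // [refund-fraud]
}
```

Keeping the tag in metadata means no extra field has to stay in sync with the graders; the filter is only as reliable as the tagging convention the suite adopts.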

Implementers

Constructors

EvalTask()

Properties

description String
One-line human description.
no setter
graders List<Grader>
Graders attached to this task. At least one is required.
no setter
hashCode int
The hash code for this object.
no setter, inherited
id String
Stable id. Immutable after creation.
no setter
input Map<String, dynamic>
Input handed to the agent harness. Schema is application-defined.
no setter
metadata Map<String, String>
Free-form labels for filtering and bucketing. Conventional keys: failure_bucket, fixture, difficulty, language, expected.
no setter
referenceSolution ReferenceSolution?
Anthropic Step 2: a known working solution that passes all graders. Strongly recommended — proves the task is solvable and graders are configured correctly. May be required by the parent suite.
no setter
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
timeout Duration?
Optional per-task timeout. Falls back to the runner default if null.
no setter
trialsPerRun int
Number of trials to run per dataset run, following Anthropic's guidance on agent non-determinism. Defaults to 1. Set to 3 or more for tasks where stability matters (pass^k, where all k trials must pass).
no setter
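
The properties above could be implemented by a small immutable class along the following lines. This is a sketch under stated assumptions, not the library's implementation: Grader, ReferenceSolution, and EvalTask are declared here as minimal stand-ins so the example compiles on its own, and StaticTask, AlwaysPassGrader, and effectiveTimeout are hypothetical names. The constructor checks reflect the documented constraints (at least one grader, a trial count that defaults to 1).

```dart
// Stand-ins for illustration only; the real types come from the library.
abstract class Grader {}

class ReferenceSolution {}

abstract class EvalTask {
  String get id;
  String get description;
  Map<String, dynamic> get input;
  List<Grader> get graders;
  Map<String, String> get metadata;
  ReferenceSolution? get referenceSolution;
  Duration? get timeout;
  int get trialsPerRun;
}

// Hypothetical grader, present only so the task has one attached.
class AlwaysPassGrader implements Grader {}

// Hypothetical pure-data task: final fields, no agent logic.
class StaticTask implements EvalTask {
  StaticTask({
    required this.id,
    required this.description,
    required this.input,
    required List<Grader> graders,
    this.metadata = const {},
    this.referenceSolution,
    this.timeout,
    this.trialsPerRun = 1,
  }) : graders = List.unmodifiable(graders) {
    if (graders.isEmpty) {
      throw ArgumentError.value(
          graders, 'graders', 'At least one grader is required.');
    }
    if (trialsPerRun < 1) {
      throw RangeError.range(trialsPerRun, 1, null, 'trialsPerRun');
    }
  }

  @override
  final String id;
  @override
  final String description;
  @override
  final Map<String, dynamic> input;
  @override
  final List<Grader> graders;
  @override
  final Map<String, String> metadata;
  @override
  final ReferenceSolution? referenceSolution;
  @override
  final Duration? timeout;
  @override
  final int trialsPerRun;
}

/// Per-task timeout if set, otherwise the runner default.
Duration effectiveTimeout(EvalTask task, Duration runnerDefault) =>
    task.timeout ?? runnerDefault;

void main() {
  final task = StaticTask(
    id: 'sum-two-numbers',
    description: 'Agent adds two integers.',
    input: {'a': 2, 'b': 3},
    graders: [AlwaysPassGrader()],
    trialsPerRun: 3, // more than one trial, since stability matters here
  );
  print(effectiveTimeout(task, const Duration(seconds: 30)).inSeconds); // 30
}
```

Validating in the constructor surfaces a misconfigured task when the suite is built rather than midway through a run; whether the real library validates at this point is not stated in the reference.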

Methods

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
toString() → String
A string representation of this object.
inherited

Operators

operator ==(Object other) → bool
The equality operator.
inherited
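
Because operator == and hashCode are listed as inherited, EvalTask itself adds no value equality; with Dart's default Object behavior, two distinct instances compare unequal even when they share an id. The sketch below illustrates this with a stand-in EvalTask that has only the id getter. SimpleTask and dedupeById are hypothetical names, and a concrete implementer in the real library could still override equality.

```dart
// Stand-in for illustration only: mirrors just the id getter.
abstract class EvalTask {
  String get id;
}

// Hypothetical implementer that does not override == or hashCode.
class SimpleTask implements EvalTask {
  SimpleTask(this.id);

  @override
  final String id;
}

/// Hypothetical helper: keeps the first task seen for each id.
List<EvalTask> dedupeById(Iterable<EvalTask> tasks) {
  final seen = <String>{};
  // Set.add returns false when the id is already present.
  return [
    for (final t in tasks)
      if (seen.add(t.id)) t
  ];
}

void main() {
  final a = SimpleTask('t1');
  final b = SimpleTask('t1');

  print(a == b); // false
  print(a.id == b.id); // true
  print(dedupeById([a, b]).length); // 1
}
```

Comparing by id, which is documented as stable and immutable after creation, avoids relying on instance identity when tasks are loaded more than once.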