EvalSuite class

Anthropic: a collection of tasks measuring specific capabilities or behaviors. Tasks in a suite typically share a broad goal.

Constructors

EvalSuite({required String name, required String agentName, required SuiteKind kind, required List<EvalTask> tasks, bool requireReferenceSolution = false, double taskPassThreshold = 1.0})
const

Properties

agentName String
The agent these tasks are aimed at, e.g. card_agent / pkm_agent. Drives routing to the right AgentHarnessFactory and is the natural unit for filtering across multi-suite runs.
final
hashCode int
The hash code for this object.
no setter, inherited
kind SuiteKind
final
name String
final
requireReferenceSolution bool
If true, every task must declare a referenceSolution. Strongly recommended for capability suites.
final
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
taskPassThreshold double
If a task's mean score across its non-null graders meets or exceeds this threshold, the task is considered "passed" for this suite. The default is 1.0 (binary). Lower values let suites accept partial credit when grading multi-component tasks.
final
tasks List<EvalTask>
final
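
The pass rule described for taskPassThreshold can be sketched as follows. This is a minimal illustration, not the library's implementation: meanScore and taskPassed are hypothetical helpers, grader scores are assumed to be nullable doubles, and treating a task with no non-null scores as not passed is an assumption the documentation above does not state.

```dart
/// Hypothetical helper, not part of EvalSuite: mean of the non-null scores,
/// or null when every grader returned null.
double? meanScore(List<double?> graderScores) {
  final scored = graderScores.whereType<double>().toList();
  if (scored.isEmpty) return null;
  return scored.reduce((a, b) => a + b) / scored.length;
}

/// Assumed pass rule: the mean meets or exceeds the threshold.
/// A task with no non-null scores counts as not passed (an assumption).
bool taskPassed(List<double?> graderScores, {double taskPassThreshold = 1.0}) {
  final mean = meanScore(graderScores);
  return mean != null && mean >= taskPassThreshold;
}

void main() {
  // The null grader is ignored, so the mean is (1.0 + 0.5) / 2 = 0.75.
  final scores = <double?>[1.0, 0.5, null];
  print(taskPassed(scores)); // false: 0.75 is below the default 1.0
  print(taskPassed(scores, taskPassThreshold: 0.75)); // true: 0.75 >= 0.75
}
```

With the default of 1.0, any grader scoring below 1.0 fails the task; lowering the threshold is what admits partial credit.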

Methods

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
toString() → String
A string representation of this object.
inherited
validate() → List<String>
Validates the suite at construction time: task ids must be unique, and every task must declare a referenceSolution if requireReferenceSolution is true. Returns the list of problems (empty if valid).
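
The two checks that validate() describes might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual method: TaskStub stands in for EvalTask, the field name id and the String type of referenceSolution are guesses, validateTasks is a hypothetical free function, and the problem messages are invented for the example.

```dart
/// Stand-in for EvalTask, holding only the two fields the checks need.
/// The field name `id` and the String type of `referenceSolution`
/// are assumptions.
class TaskStub {
  const TaskStub(this.id, {this.referenceSolution});
  final String id;
  final String? referenceSolution;
}

/// Hypothetical free function: returns human-readable problems,
/// or an empty list when the tasks are valid.
List<String> validateTasks(
  List<TaskStub> tasks, {
  bool requireReferenceSolution = false,
}) {
  final problems = <String>[];
  final seen = <String>{};
  for (final task in tasks) {
    // Set.add returns false when the id was already present.
    if (!seen.add(task.id)) {
      problems.add('Duplicate task id: ${task.id}');
    }
    if (requireReferenceSolution && task.referenceSolution == null) {
      problems.add('Task ${task.id} has no reference solution');
    }
  }
  return problems;
}

void main() {
  const tasks = [
    TaskStub('a', referenceSolution: 'ok'),
    TaskStub('a'), // duplicate id and no reference solution
  ];
  final problems = validateTasks(tasks, requireReferenceSolution: true);
  problems.forEach(print);
  // Duplicate task id: a
  // Task a has no reference solution
}
```

Returning a list rather than throwing lets a caller report every problem in one pass and decide for itself whether a non-empty result should abort the run.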

Operators

operator ==(Object other) → bool
The equality operator.
inherited