EvalSuite class
Anthropic: a collection of tasks measuring specific capabilities or behaviors. Tasks in a suite typically share a broad goal.
Constructors
Properties
- agentName → String
-
The agent these tasks are aimed at, e.g. card_agent/pkm_agent. Drives routing to the right AgentHarnessFactory and is the natural unit for filtering across multi-suite runs.
final
- hashCode → int
-
The hash code for this object.
no setterinherited
- kind → SuiteKind
-
final
- name → String
-
final
- requireReferenceSolution → bool
-
If true, every task must declare a referenceSolution. Strongly recommended for capability suites.
final
- runtimeType → Type
-
A representation of the runtime type of the object.
no setterinherited
- taskPassThreshold → double
-
If a task's mean score across its non-null graders meets or exceeds
this threshold, the task is considered "passed" for this suite. The
default is 1.0 (binary). Lower values let suites accept partial
credit when grading multi-component tasks.
final
- tasks → List<EvalTask>
-
final
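The pass rule described for taskPassThreshold can be sketched as follows. This is an illustration only, not the package's implementation: taskPassed is a hypothetical helper, grader results are modeled as plain nullable doubles, "non-null graders" is read as graders that produced a score, and treating a task with no scores as a failure is an assumption.

```dart
// Hypothetical helper, not part of EvalSuite: applies the documented rule
// "mean score across non-null graders meets or exceeds the threshold".
bool taskPassed(List<double?> graderScores, {double taskPassThreshold = 1.0}) {
  // Drop graders that produced no score.
  final scores = graderScores.whereType<double>().toList();
  // Assumption: a task with no scored graders does not pass.
  if (scores.isEmpty) return false;
  final mean = scores.reduce((a, b) => a + b) / scores.length;
  return mean >= taskPassThreshold;
}

void main() {
  // Mean of the two scored graders is 0.75.
  print(taskPassed([1.0, 0.5, null])); // false under the default 1.0
  print(taskPassed([1.0, 0.5, null], taskPassThreshold: 0.75)); // true
}
```

With the default of 1.0, every scored grader has to award full marks; lowering the threshold is what allows partial credit on multi-component tasks.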
Methods
- noSuchMethod(Invocation invocation) → dynamic
-
Invoked when a nonexistent method or property is accessed.
inherited
- toString() → String
-
A string representation of this object.
inherited
- validate() → List<String>
-
Validates the suite at construction time: ids unique, reference solutions present if required. Returns the list of problems (empty if valid).
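The two checks that validate() is documented to perform might look roughly like the sketch below. Everything here is a stand-in: SketchTask is a hypothetical two-field substitute for EvalTask, validateTasks is a free function rather than the real method, and the message strings are invented for illustration. The actual method may check more or word its problems differently.

```dart
// Hypothetical stand-in for EvalTask with only the fields the checks need.
class SketchTask {
  const SketchTask(this.id, {this.referenceSolution});
  final String id;
  final String? referenceSolution;
}

// Hypothetical re-creation of the documented checks: unique ids, and a
// reference solution on every task when the suite requires one.
List<String> validateTasks(
  List<SketchTask> tasks, {
  bool requireReferenceSolution = false,
}) {
  final problems = <String>[];
  final seenIds = <String>{};
  for (final task in tasks) {
    // Set.add returns false when the id was already present.
    if (!seenIds.add(task.id)) {
      problems.add('duplicate task id: ${task.id}');
    }
    if (requireReferenceSolution && task.referenceSolution == null) {
      problems.add('task ${task.id} has no referenceSolution');
    }
  }
  return problems;
}

void main() {
  final problems = validateTasks(
    const [
      SketchTask('t1', referenceSolution: 'answer'),
      SketchTask('t1'),
    ],
    requireReferenceSolution: true,
  );
  // An empty list means the suite is valid; otherwise report each problem.
  problems.forEach(print);
}
```

Returning a list instead of throwing on the first failure lets a caller report every problem in a suite in one pass.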
Operators
- operator ==(Object other) → bool
-
The equality operator.
inherited