EvalSuite class - eval library

Anthropic: a collection of tasks measuring specific capabilities or behaviors. Tasks in a suite typically share a broad goal.

Constructors

const EvalSuite({required String name, required String agentName, required SuiteKind kind, required List<EvalTask> tasks, bool requireReferenceSolution = false, double taskPassThreshold = 1.0})

Properties

agentName → String: The agent these tasks target, e.g. card_agent or pkm_agent. Drives routing to the right AgentHarnessFactory and is the natural unit for filtering across multi-suite runs.
final
hashCode → int: The hash code for this object.
no setter, inherited
kind → SuiteKind
final
name → String
final
requireReferenceSolution → bool: If true, every task must declare a referenceSolution. Strongly recommended for capability suites.
final
runtimeType → Type: A representation of the runtime type of the object.
no setter, inherited
taskPassThreshold → double: If a task's mean score across its non-null graders meets or exceeds this threshold, the task is considered "passed" for this suite. The default is 1.0 (binary). Lower values let suites accept partial credit when grading multi-component tasks.
final
tasks → List<EvalTask>
final
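
The pass rule described for taskPassThreshold can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name taskPassed is hypothetical, "non-null graders" is assumed to mean graders that returned a non-null score, and treating a task with no scores as not passed is also an assumption.

```dart
// Hypothetical helper, not part of the documented EvalSuite API.
// Returns true when the mean of the non-null grader scores meets or
// exceeds the suite's taskPassThreshold.
bool taskPassed(List<double?> graderScores, double taskPassThreshold) {
  // Drop graders that produced no score (assumed meaning of "non-null").
  final scores = graderScores.whereType<double>().toList();

  // Assumption: a task with no usable scores does not pass.
  if (scores.isEmpty) return false;

  final mean = scores.reduce((a, b) => a + b) / scores.length;
  return mean >= taskPassThreshold;
}

void main() {
  // Two graders scored, one abstained: mean is (1.0 + 0.5) / 2 = 0.75.
  final scores = <double?>[1.0, 0.5, null];

  print(taskPassed(scores, 1.0)); // default binary threshold
  print(taskPassed(scores, 0.75)); // partial credit accepted
}
```

With the default threshold of 1.0 the task above does not pass, because one grader gave partial credit; lowering the threshold to 0.75 accepts it.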

Methods

noSuchMethod(Invocation invocation) → dynamic: Invoked when a nonexistent method or property is accessed.
inherited
toString() → String: A string representation of this object.
inherited
validate() → List<String>: Validates the suite at construction time, checking that task ids are unique and that every task declares a reference solution when requireReferenceSolution is true. Returns the list of problems found (empty if the suite is valid).
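
The two checks listed for validate() might look roughly like the sketch below. It is an illustration under stated assumptions rather than the actual method body: TaskStub is a hypothetical stand-in for EvalTask, its id field and the String? type of referenceSolution are assumed, and the problem messages are invented for the example.

```dart
// Hypothetical stand-in for EvalTask. The field names follow the docs
// ("ids", "referenceSolution"); their types are assumptions.
class TaskStub {
  const TaskStub(this.id, {this.referenceSolution});
  final String id;
  final String? referenceSolution;
}

// Sketch of the documented checks: unique ids, and reference solutions
// present when required. Returns an empty list if nothing is wrong.
List<String> validateTasks(
  List<TaskStub> tasks, {
  bool requireReferenceSolution = false,
}) {
  final problems = <String>[];
  final seenIds = <String>{};

  for (final task in tasks) {
    // Set.add returns false when the id was already present.
    if (!seenIds.add(task.id)) {
      problems.add('Duplicate task id: ${task.id}');
    }
    if (requireReferenceSolution && task.referenceSolution == null) {
      problems.add('Task ${task.id} has no referenceSolution');
    }
  }
  return problems;
}

void main() {
  const tasks = [
    TaskStub('a', referenceSolution: 'ok'),
    TaskStub('a'), // duplicate id and no reference solution
  ];

  final problems = validateTasks(tasks, requireReferenceSolution: true);
  problems.forEach(print);
}
```

Returning the problems as a list, instead of throwing on the first one, lets a caller report every issue in a suite in one pass.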