EvalRunReport class

Aggregated outcome of one EvalRunner.runSuite invocation.

Available extensions

EvalRunReportDiff
EvalRunReportReporting

Constructors

EvalRunReport({required String runName, required EvalSuite suite, required List<TrialResult> trials, required DateTime startedAt, required DateTime endedAt})

Properties

duration Duration
no setter
endedAt DateTime
final
graderMeans Map<String, double>
Mean of each grader's score across all trials, keyed by grader name. Null scores are excluded from the mean. Useful for tracking an individual grader such as title_quality across runs.
no setter
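
The null-exclusion rule described for graderMeans can be illustrated with a minimal sketch. It is not the library's implementation: the GraderScores typedef and the meanByGrader function are hypothetical stand-ins, and it assumes each trial exposes its scores as a map from grader name to a nullable double.

```dart
/// Hypothetical stand-in for one trial's grader output:
/// grader name -> score, where null means "no score for this trial".
typedef GraderScores = Map<String, double?>;

/// Mean score per grader across all trials.
/// Null scores are left out of both the sum and the count, so a grader
/// that never produced a score does not appear in the result at all.
Map<String, double> meanByGrader(List<GraderScores> trials) {
  final sums = <String, double>{};
  final counts = <String, int>{};
  for (final scores in trials) {
    scores.forEach((grader, score) {
      if (score == null) return; // excluded, as the property doc describes
      sums[grader] = (sums[grader] ?? 0.0) + score;
      counts[grader] = (counts[grader] ?? 0) + 1;
    });
  }
  return {
    for (final grader in sums.keys) grader: sums[grader]! / counts[grader]!,
  };
}
```

Dividing by the count of non-null scores rather than by the number of trials keeps a grader that only ran on some trials from being dragged toward zero.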
hashCode int
The hash code for this object.
no setter, inherited
runName String
final
runtimeType Type
A representation of the runtime type of the object.
no setter, inherited
startedAt DateTime
final
suite EvalSuite
final
taskPassRate double
Overall task pass rate. Definition depends on suite kind:
no setter
trialPassRate double
Overall fraction of trials that passed.
no setter
trials List<TrialResult>
final

Methods

bucketPassRates(Map<String, String> taskBucketMap) Map<String, double>
Per-bucket pass rate, using metadata['failure_bucket'] on the source task. The runner attaches the bucket to each TrialResult by looking up the trial's taskId.
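
A per-bucket pass rate of this kind can be sketched as follows. This is an illustration under stated assumptions, not the library's code: TrialOutcome and passRateByBucket are hypothetical names, taskBucketMap is assumed to map taskId to bucket name, and trials whose task has no bucket are assumed to be skipped.

```dart
/// Hypothetical minimal view of a trial: which task it ran and whether it passed.
class TrialOutcome {
  final String taskId;
  final bool passed;
  const TrialOutcome(this.taskId, this.passed);
}

/// Fraction of passing trials per bucket.
/// Assumption: [taskBucketMap] maps taskId -> bucket name, and trials
/// whose task is not in the map are ignored.
Map<String, double> passRateByBucket(
  List<TrialOutcome> trials,
  Map<String, String> taskBucketMap,
) {
  final passed = <String, int>{};
  final total = <String, int>{};
  for (final trial in trials) {
    final bucket = taskBucketMap[trial.taskId];
    if (bucket == null) continue; // unmapped task: skipped in this sketch
    total[bucket] = (total[bucket] ?? 0) + 1;
    if (trial.passed) passed[bucket] = (passed[bucket] ?? 0) + 1;
  }
  return {
    for (final bucket in total.keys)
      bucket: (passed[bucket] ?? 0) / total[bucket]!,
  };
}
```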
diffWith(EvalRunReport baseline, {double significanceThreshold = 0.05}) EvalRunDiff

Available on EvalRunReport, provided by the EvalRunReportDiff extension

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
passAtKByTask({List<int> ks = const [1]}) Map<String, Map<int, double>>
pass@k for each task, computed from its actual trial count. Returns a map of taskId -> {k -> pass@k}.
passCaretKByTask({List<int> ks = const [1]}) Map<String, Map<int, double>>
pass^k for each task. Returns a map of taskId -> {k -> pass^k}.
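
The two metrics are commonly read as follows: pass@k is the chance that at least one of k trials passes, and pass^k is the chance that all k trials pass. The sketch below estimates both for a single task from n recorded trials with c passes, by considering a random size-k subset of those trials. The function names are hypothetical, and whether EvalRunReport uses these estimators is an assumption; the page only states that pass@k is computed from the task's actual trial count.

```dart
/// C(a, k) / C(n, k), computed as a running product to avoid large factorials.
double _chooseRatio(int a, int n, int k) {
  if (a < k) return 0.0; // cannot pick k items out of fewer than k
  var ratio = 1.0;
  for (var i = 0; i < k; i++) {
    ratio *= (a - i) / (n - i);
  }
  return ratio;
}

void _checkArgs(int n, int c, int k) {
  if (n < 1 || c < 0 || c > n || k < 1 || k > n) {
    throw ArgumentError('need 0 <= c <= n and 1 <= k <= n');
  }
}

/// Estimated pass@k: probability that a random k-subset of the n trials
/// contains at least one pass, i.e. 1 - C(n - c, k) / C(n, k).
double passAtK(int n, int c, int k) {
  _checkArgs(n, c, k);
  return 1.0 - _chooseRatio(n - c, n, k);
}

/// Estimated pass^k: probability that a random k-subset of the n trials
/// consists only of passes, i.e. C(c, k) / C(n, k).
double passHatK(int n, int c, int k) {
  _checkArgs(n, c, k);
  return _chooseRatio(c, n, k);
}
```

For a task with 4 trials and 2 passes, both estimates equal 0.5 at k = 1, while at k = 2 pass@k rises to 5/6 and pass^k falls to 1/6. The two metrics diverge as k grows, which is why reporting both can be informative.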
saturationStatus({SaturationThresholds thresholds = const SaturationThresholds()}) SaturationStatus
Saturation snapshot for this run only. See Anthropic Step 7. A capability suite that comes back with a high saturatedTaskRatio signals that its easy tasks should graduate to a regression suite and that harder tasks should be added.
toMarkdownSummary({Map<String, String>? taskBucketMap, List<int> ksToReport = const [1, 3]}) String

Available on EvalRunReport, provided by the EvalRunReportReporting extension

Render this run as a Markdown summary (PR comments, console output).
toString() String
A string representation of this object.
inherited
trialsByTask() Map<String, List<TrialResult>>
Trials grouped by task id.
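
Grouping by task id can be sketched with a small generic helper. The helper name groupByTaskId and its taskIdOf callback are hypothetical; the sketch only assumes that a task id can be read from each trial.

```dart
/// Groups [trials] by the task id that [taskIdOf] extracts from each one.
/// Trials keep their original relative order inside each group.
Map<String, List<T>> groupByTaskId<T>(
  Iterable<T> trials,
  String Function(T trial) taskIdOf,
) {
  final grouped = <String, List<T>>{};
  for (final trial in trials) {
    // Create the group's list on first sight of a task id, then append.
    grouped.putIfAbsent(taskIdOf(trial), () => <T>[]).add(trial);
  }
  return grouped;
}
```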

Operators

operator ==(Object other) bool
The equality operator.
inherited