EvalRunReport class
Aggregated outcome of one EvalRunner.runSuite invocation.
- Available extensions
Constructors
- EvalRunReport({required String runName, required EvalSuite suite, required List<TrialResult> trials, required DateTime startedAt, required DateTime endedAt})
Properties
- duration → Duration
-
no setter
- endedAt → DateTime
-
final
- graderMeans → Map<String, double>
-
Mean of each grader's score across all trials (null-valued scores are excluded). Useful for tracking title_quality, etc.
no setter
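The averaging rule described for graderMeans can be sketched as follows. This is a minimal illustration rather than the library's implementation: the input shape (one map of grader name to nullable score per trial) and the name meanPerGrader are assumptions made for the example.

```dart
/// Hypothetical helper: averages each grader's non-null scores.
/// `scoresPerTrial` holds one map per trial (grader name -> score or null).
Map<String, double> meanPerGrader(List<Map<String, double?>> scoresPerTrial) {
  final sums = <String, double>{};
  final counts = <String, int>{};
  for (final scores in scoresPerTrial) {
    scores.forEach((grader, score) {
      if (score == null) return; // null-valued scores are excluded
      sums[grader] = (sums[grader] ?? 0.0) + score;
      counts[grader] = (counts[grader] ?? 0) + 1;
    });
  }
  // Graders that never produced a non-null score are absent from the result.
  return {
    for (final grader in sums.keys) grader: sums[grader]! / counts[grader]!,
  };
}
```

In this sketch a grader that only ever returned null does not appear in the result at all; whether the real getter behaves the same way is not stated on this page.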
- hashCode → int
-
The hash code for this object.
no setter, inherited
- runName → String
-
final
- runtimeType → Type
-
A representation of the runtime type of the object.
no setter, inherited
- startedAt → DateTime
-
final
- suite → EvalSuite
-
final
- taskPassRate → double
-
Overall task pass rate. Definition depends on suite kind:
no setter
- trialPassRate → double
-
Overall fraction of trials that passed.
no setter
- trials → List<TrialResult>
-
final
Methods
- bucketPassRates(Map<String, String> taskBucketMap) → Map<String, double>
-
Per-bucket pass rate using metadata 'failure_bucket' on the source task. The Runner attaches the bucket to each TrialResult via Trial's taskId lookup.
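A per-bucket pass rate of the kind bucketPassRates returns could be computed roughly like this. The BucketedTrial class and the passRateByBucket function are hypothetical stand-ins, since the fields of TrialResult are not shown on this page, and skipping trials whose task has no bucket is an assumption of the sketch.

```dart
/// Hypothetical stand-in for the two trial fields this sketch needs.
class BucketedTrial {
  const BucketedTrial(this.taskId, this.passed);
  final String taskId;
  final bool passed;
}

/// Groups trials by the bucket of their task and returns passed / total.
Map<String, double> passRateByBucket(
  List<BucketedTrial> trials,
  Map<String, String> taskBucketMap,
) {
  final passed = <String, int>{};
  final total = <String, int>{};
  for (final trial in trials) {
    final bucket = taskBucketMap[trial.taskId];
    if (bucket == null) continue; // assumption: unmapped tasks are skipped
    total[bucket] = (total[bucket] ?? 0) + 1;
    if (trial.passed) passed[bucket] = (passed[bucket] ?? 0) + 1;
  }
  return {
    for (final bucket in total.keys)
      bucket: (passed[bucket] ?? 0) / total[bucket]!,
  };
}
```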
- diffWith(EvalRunReport baseline, {double significanceThreshold = 0.05}) → EvalRunDiff
-
Available on EvalRunReport, provided by the EvalRunReportDiff extension
- noSuchMethod(Invocation invocation) → dynamic
-
Invoked when a nonexistent method or property is accessed.
inherited
- passAtKByTask({List<int> ks = const [1]}) → Map<String, Map<int, double>>
-
pass@k for each task, computed from its actual trial count. Returns a map of taskId -> {k -> pass@k}.
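This page does not say which pass@k estimator the method uses. As an illustration only, the sketch below uses the widely used unbiased estimator 1 - C(n - c, k) / C(n, k), where n is a task's trial count and c the number of passing trials. The function name passAtK is hypothetical.

```dart
/// Estimates pass@k from n trials of which c passed, using
/// 1 - C(n - c, k) / C(n, k). Requires 1 <= k <= n and 0 <= c <= n.
double passAtK(int n, int c, int k) {
  if (k < 1 || k > n || c < 0 || c > n) {
    throw ArgumentError('expected 1 <= k <= n and 0 <= c <= n');
  }
  if (n - c < k) return 1.0; // every k-subset contains at least one pass
  // C(n - c, k) / C(n, k) expanded as a product to avoid large factorials.
  var allFail = 1.0;
  for (var i = 0; i < k; i++) {
    allFail *= (n - c - i) / (n - i);
  }
  return 1.0 - allFail;
}
```

With this estimator, pass@1 reduces to the plain fraction of passing trials, c / n.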
- passCaretKByTask({List<int> ks = const [1]}) → Map<String, Map<int, double>>
-
pass^k for each task.
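pass^k is commonly read as the probability that all k trials of a task pass. The formula used by passCaretKByTask is not given here, so the sketch below assumes the simple plug-in estimate (c / n)^k, with n trials and c passes. The name passCaretK is hypothetical.

```dart
/// Plug-in estimate of pass^k: the per-trial pass rate c / n raised to k.
/// Assumes n >= 1, 0 <= c <= n and k >= 1.
double passCaretK(int n, int c, int k) {
  if (n < 1 || c < 0 || c > n || k < 1) {
    throw ArgumentError('expected n >= 1, 0 <= c <= n and k >= 1');
  }
  final rate = c / n;
  var allPass = 1.0;
  for (var i = 0; i < k; i++) {
    allPass *= rate; // independent trials: multiply the pass rate k times
  }
  return allPass;
}
```

Where pass@k rewards a task that succeeds at least once, this quantity rewards consistency: it falls quickly as k grows unless nearly every trial passes.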
- saturationStatus({SaturationThresholds thresholds = const SaturationThresholds()}) → SaturationStatus
-
Saturation snapshot for THIS run. See Anthropic Step 7. Capability suites that come back with a high saturatedTaskRatio are signaling that easy tasks should graduate to a regression suite and harder tasks should be added.
- toMarkdownSummary({Map<String, String>? taskBucketMap, List<int> ksToReport = const [1, 3]}) → String
-
Available on EvalRunReport, provided by the EvalRunReportReporting extension
Render this run as a Markdown summary (PR comments, console output).
- toString() → String
-
A string representation of this object.
inherited
- trialsByTask() → Map<String, List<TrialResult>>
-
Trials grouped by task id.
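Grouping trials by task id, as trialsByTask does, can be sketched with Map.putIfAbsent. The TaskTrial class is a hypothetical stand-in for TrialResult, whose fields are not listed on this page.

```dart
/// Hypothetical stand-in for a trial that knows which task it belongs to.
class TaskTrial {
  const TaskTrial(this.taskId, this.passed);
  final String taskId;
  final bool passed;
}

/// Groups trials by task id, preserving the input order within each group.
Map<String, List<TaskTrial>> groupByTask(List<TaskTrial> trials) {
  final byTask = <String, List<TaskTrial>>{};
  for (final trial in trials) {
    byTask.putIfAbsent(trial.taskId, () => []).add(trial);
  }
  return byTask;
}
```

Each group's length gives the task's trial count and its number of passing trials, which is what a per-task pass@k or pass^k calculation needs.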
Operators
- operator ==(Object other) → bool
-
The equality operator.
inherited