runTask method
Convenience method that runs a single task. Useful for ad-hoc debugging or for rerunning a flaky task with extra trials (via trialsOverride).
Internally wraps the task in a one-off EvalSuite of kind SuiteKind.mixed. The synthetic suite is not persisted to the report store (the run is for diagnosis, not for cross-run analysis).
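A call might look like the sketch below. To keep it runnable on its own, it uses hypothetical stand-ins: FakeRunner, the minimal EvalTask and TrialResult classes, the passed field, and the run, task, and agent names are all assumptions for illustration, not the library's real definitions. Only the named parameters of runTask are taken from the signature documented here.

```dart
// Hypothetical stand-ins so the sketch runs on its own. The real
// EvalTask, TrialResult, and EvalRunner come from the library and
// will have more fields than shown here.
class EvalTask {
  const EvalTask(this.id);
  final String id;
}

class TrialResult {
  const TrialResult({required this.taskId, required this.passed});
  final String taskId;
  final bool passed; // assumed field, for illustration only
}

class FakeRunner {
  // Mirrors the named parameters of runTask. The body is a placeholder
  // that reports every trial as passed; falling back to one trial when
  // trialsOverride is null is an assumption of this sketch.
  Future<List<TrialResult>> runTask({
    required String runName,
    required EvalTask task,
    required String agentName,
    int? trialsOverride,
  }) async {
    final trials = trialsOverride ?? 1;
    return List.generate(
      trials,
      (_) => TrialResult(taskId: task.id, passed: true),
    );
  }
}

Future<void> main() async {
  final runner = FakeRunner();

  // Rerun one suspect task with extra trials to gauge flakiness.
  final results = await runner.runTask(
    runName: 'debug-flaky-login',
    task: const EvalTask('login_flow'),
    agentName: 'baseline-agent',
    trialsOverride: 5,
  );

  final passed = results.where((r) => r.passed).length;
  print('$passed/${results.length} trials passed');
}
```

With a real runner, the returned list holds one TrialResult per trial, and nothing is written to the report store.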
Implementation
Future<List<TrialResult>> runTask({
  required String runName,
  required EvalTask task,
  required String agentName,
  int? trialsOverride,
}) async {
  final tempSuite = EvalSuite(
    name: '_one_off/${task.id}',
    agentName: agentName,
    kind: SuiteKind.mixed,
    tasks: [task],
  );
  // Use a one-off runner that skips the persistent report store but
  // keeps every other behavior (exporters, recording, rate gate).
  final scopedRunner = EvalRunner(
    environment: environment,
    harnessFactory: harnessFactory,
    exporters: exporter is CompositeTraceExporter
        ? (exporter as CompositeTraceExporter).exporters
        : [exporter],
    recordingStore: recordingStore,
    // Intentionally null: this is a debug run, don't pollute history.
    // reportStore: null,
    rateLimitGate: rateLimitGate,
    defaultTimeout: defaultTimeout,
  );
  final report = await scopedRunner.runSuite(
    runName: runName,
    suite: tempSuite,
    concurrency: 1,
    trialsOverride: trialsOverride,
  );
  return report.trials;
}