JudgeCalibrator class - eval library

Measures how closely an LLM judge agrees with human scores.

Usage:

final calibrator = JudgeCalibrator();
final report = await calibrator.calibrate(
  goldenSet: humanLabeledTrials,
  judgeScorer: (labeled) async {
    final r = await myLLMJudge.grade(labeled.input, labeled.output);
    return JudgeScore(value: r.score, rationale: r.reasoning);
  },
);
if (!report.meetsAnthropicBar) {
  throw StateError('judge correlation too low: ${report.spearmanCorrelation}');
}
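
The usage above gates on report.spearmanCorrelation. This page does not say how that value is computed or what threshold meetsAnthropicBar applies, so the following is only a minimal sketch, assuming the field is the standard Spearman rank correlation between human scores and judge scores. The function names averageRanks, pearson, and spearman are hypothetical and are not part of this library.

```dart
import 'dart:math' as math;

/// Assigns 1-based ranks to [values]. Tied values share the average of the
/// rank positions they occupy (fractional ranking).
/// Hypothetical helper, not part of the eval library.
List<double> averageRanks(List<double> values) {
  // Indices of [values], sorted by ascending value.
  final order = List<int>.generate(values.length, (i) => i)
    ..sort((a, b) => values[a].compareTo(values[b]));
  final ranks = List<double>.filled(values.length, 0.0);
  var start = 0;
  while (start < order.length) {
    var end = start;
    // Extend the run while the next value ties with the current one.
    while (end + 1 < order.length &&
        values[order[end + 1]] == values[order[start]]) {
      end++;
    }
    // Mean of the 1-based ranks (start + 1) .. (end + 1).
    final shared = (start + end) / 2 + 1;
    for (var k = start; k <= end; k++) {
      ranks[order[k]] = shared;
    }
    start = end + 1;
  }
  return ranks;
}

/// Pearson correlation of two equal-length lists.
/// Yields NaN when either list has zero variance.
double pearson(List<double> x, List<double> y) {
  final n = x.length;
  final meanX = x.reduce((a, b) => a + b) / n;
  final meanY = y.reduce((a, b) => a + b) / n;
  var cov = 0.0, varX = 0.0, varY = 0.0;
  for (var i = 0; i < n; i++) {
    final dx = x[i] - meanX;
    final dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / math.sqrt(varX * varY);
}

/// Spearman's rho: the Pearson correlation of the rank-transformed inputs.
double spearman(List<double> human, List<double> judge) {
  if (human.length != judge.length || human.length < 2) {
    throw ArgumentError('need two equal-length lists with at least 2 items');
  }
  return pearson(averageRanks(human), averageRanks(judge));
}
```

Because only ranks matter, a judge that orders the golden set exactly as the human labels do scores 1.0 even if it uses a different scale, and a judge that exactly reverses the order scores -1.0.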

Available extensions

JudgeCalibratorOps

Constructors

JudgeCalibrator({CalibrationConfig config = const CalibrationConfig()}): const

Properties

config → CalibrationConfig: final
hashCode → int: The hash code for this object. (no setter, inherited)
runtimeType → Type: A representation of the runtime type of the object. (no setter, inherited)

Methods

calibrate({required List<HumanLabeledTrial> goldenSet, required JudgeScorer judgeScorer, int concurrency = 4}) → Future<CalibrationReport>: Available on JudgeCalibrator, provided by the JudgeCalibratorOps extension
noSuchMethod(Invocation invocation) → dynamic: Invoked when a nonexistent method or property is accessed. (inherited)
toString() → String: A string representation of this object. (inherited)

Operators

operator ==(Object other) → bool: The equality operator. (inherited)