JudgeCalibrator class

Measures the agreement between an LLM judge and human ratings.

Usage:

final calibrator = JudgeCalibrator();
final report = await calibrator.calibrate(
  goldenSet: humanLabeledTrials,
  judgeScorer: (labeled) async {
    final r = await myLLMJudge.grade(labeled.input, labeled.output);
    return JudgeScore(value: r.score, rationale: r.reasoning);
  },
);
if (!report.meetsAnthropicBar) {
  throw StateError('judge correlation too low: ${report.spearmanCorrelation}');
}
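
The usage above gates on `report.spearmanCorrelation`. As a rough illustration of what such a figure measures, here is a minimal sketch of Spearman rank correlation between human and judge scores. It assumes tied values receive the average of their ranks and that the coefficient is the Pearson correlation of the two rank lists. The function names `averageRanks`, `pearson`, and `spearman` are hypothetical and are not part of this library; how `JudgeCalibrator` computes its value internally is not documented here.

```dart
import 'dart:math' as math;

/// Hypothetical helper: assigns 1-based ranks, giving tied values the
/// average of the positions they occupy in sorted order.
List<double> averageRanks(List<double> values) {
  final order = List<int>.generate(values.length, (i) => i)
    ..sort((a, b) => values[a].compareTo(values[b]));
  final ranks = List<double>.filled(values.length, 0);
  var i = 0;
  while (i < order.length) {
    var j = i;
    // Extend j over the run of values tied with the one at position i.
    while (j + 1 < order.length && values[order[j + 1]] == values[order[i]]) {
      j++;
    }
    final rank = (i + j) / 2 + 1; // mean of the 1-based positions i+1..j+1
    for (var k = i; k <= j; k++) {
      ranks[order[k]] = rank;
    }
    i = j + 1;
  }
  return ranks;
}

/// Hypothetical helper: Pearson correlation of two equal-length lists.
/// Returns NaN when either list has zero variance.
double pearson(List<double> x, List<double> y) {
  final n = x.length;
  final meanX = x.reduce((a, b) => a + b) / n;
  final meanY = y.reduce((a, b) => a + b) / n;
  var cov = 0.0, varX = 0.0, varY = 0.0;
  for (var i = 0; i < n; i++) {
    final dx = x[i] - meanX, dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / math.sqrt(varX * varY);
}

/// Spearman correlation: Pearson correlation applied to the ranks.
double spearman(List<double> human, List<double> judge) {
  if (human.length != judge.length || human.length < 2) {
    throw ArgumentError('need two equal-length lists with at least 2 items');
  }
  return pearson(averageRanks(human), averageRanks(judge));
}

void main() {
  // Illustrative scores only, not real calibration data.
  final human = [1.0, 2.0, 3.0, 4.0, 5.0];
  final judge = [2.0, 1.0, 4.0, 3.0, 5.0];
  print(spearman(human, judge)); // 0.8
}
```

Because only ranks enter the calculation, the judge and the human raters can use different numeric scales and still correlate perfectly as long as they order the trials the same way.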

Available extensions

JudgeCalibratorOps

Constructors

JudgeCalibrator({CalibrationConfig config = const CalibrationConfig()})
const

Properties

config → CalibrationConfig
final
hashCode → int
The hash code for this object.
no setter, inherited
runtimeType → Type
A representation of the runtime type of the object.
no setter, inherited

Methods

calibrate({required List<HumanLabeledTrial> goldenSet, required JudgeScorer judgeScorer, int concurrency = 4}) → Future<CalibrationReport>

Available on JudgeCalibrator, provided by the JudgeCalibratorOps extension
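
The `concurrency` parameter suggests that `judgeScorer` calls are run in parallel up to a fixed limit. The following is a minimal sketch of one way such a limit can be enforced in Dart, using a small pool of workers that pull items from a shared index. `mapWithConcurrency` is a hypothetical helper written for illustration; it is not part of this library, and the actual scheduling inside `calibrate` may differ.

```dart
/// Hypothetical helper (not part of the documented API): applies [task] to
/// every item, running at most [concurrency] tasks at once, and returns the
/// results in input order.
Future<List<R>> mapWithConcurrency<T, R>(
  List<T> items,
  Future<R> Function(T item) task, {
  int concurrency = 4,
}) async {
  if (concurrency < 1) {
    throw ArgumentError.value(concurrency, 'concurrency', 'must be >= 1');
  }
  final results = List<R?>.filled(items.length, null);
  var next = 0;

  // Each worker repeatedly claims the next unprocessed index. Code within a
  // single isolate never runs in parallel, so `next++` needs no lock.
  Future<void> worker() async {
    while (next < items.length) {
      final index = next++;
      results[index] = await task(items[index]);
    }
  }

  final workerCount = concurrency < items.length ? concurrency : items.length;
  await Future.wait(List.generate(workerCount, (_) => worker()));
  return [for (final r in results) r as R];
}

Future<void> main() async {
  // Stand-in for a slow judge call; the delay and scores are illustrative.
  final scores = await mapWithConcurrency<int, double>(
    [1, 2, 3, 4, 5],
    (n) async {
      await Future<void>.delayed(const Duration(milliseconds: 10));
      return n / 10;
    },
    concurrency: 2,
  );
  print(scores);
}
```

A worker pool like this keeps the number of in-flight judge requests bounded, which matters when the judge is a rate-limited remote model. If any task throws, `Future.wait` completes with that error, so a caller would see the failure rather than a partial result.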

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
toString() → String
A string representation of this object.
inherited

Operators

operator ==(Object other) → bool
The equality operator.
inherited