JudgeCalibrator class
Measures the agreement between an LLM judge and human ratings.
Usage:
final calibrator = JudgeCalibrator();
final report = await calibrator.calibrate(
  goldenSet: humanLabeledTrials,
  judgeScorer: (labeled) async {
    final r = await myLLMJudge.grade(labeled.input, labeled.output);
    return JudgeScore(value: r.score, rationale: r.reasoning);
  },
);
if (!report.meetsAnthropicBar) {
  throw StateError('judge correlation too low: ${report.spearmanCorrelation}');
}
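The usage above reads `report.spearmanCorrelation`, which suggests the report includes a Spearman rank correlation between judge scores and human scores. This page does not show how the package computes it, or what threshold `meetsAnthropicBar` applies, so the following is only a minimal sketch of the statistic itself. The helper names `averageRanks`, `pearson`, and `spearman` are hypothetical and not part of the package.

```dart
import 'dart:math' as math;

/// Assigns 1-based ranks to [values]; tied values share their average rank.
List<double> averageRanks(List<double> values) {
  final order = List<int>.generate(values.length, (i) => i)
    ..sort((a, b) => values[a].compareTo(values[b]));
  final ranks = List<double>.filled(values.length, 0.0);
  var i = 0;
  while (i < order.length) {
    var j = i;
    // Extend j over the run of values tied with values[order[i]].
    while (j + 1 < order.length && values[order[j + 1]] == values[order[i]]) {
      j++;
    }
    final shared = (i + j) / 2 + 1; // mean of the 1-based ranks i+1 .. j+1
    for (var k = i; k <= j; k++) {
      ranks[order[k]] = shared;
    }
    i = j + 1;
  }
  return ranks;
}

/// Pearson correlation of two equal-length lists.
/// Returns NaN when either list has zero variance.
double pearson(List<double> x, List<double> y) {
  final n = x.length;
  final meanX = x.reduce((a, b) => a + b) / n;
  final meanY = y.reduce((a, b) => a + b) / n;
  var cov = 0.0;
  var varX = 0.0;
  var varY = 0.0;
  for (var i = 0; i < n; i++) {
    final dx = x[i] - meanX;
    final dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / math.sqrt(varX * varY);
}

/// Spearman's rho: the Pearson correlation of the two rank vectors.
double spearman(List<double> human, List<double> judge) {
  if (human.length != judge.length || human.length < 2) {
    throw ArgumentError('need two equal-length lists with at least 2 items');
  }
  return pearson(averageRanks(human), averageRanks(judge));
}
```

Because the statistic depends only on ranks, a judge that orders the trials the same way as the human raters scores close to 1 even when its absolute scores sit on a different scale.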
Available extensions
- JudgeCalibratorOps
Constructors
- JudgeCalibrator({CalibrationConfig config = const CalibrationConfig()})
  const
Properties
- config → CalibrationConfig
  final
- hashCode → int
  The hash code for this object.
  no setter, inherited
- runtimeType → Type
  A representation of the runtime type of the object.
  no setter, inherited
Methods
- calibrate({required List<HumanLabeledTrial> goldenSet, required JudgeScorer judgeScorer, int concurrency = 4}) → Future<CalibrationReport>
  Available on JudgeCalibrator, provided by the JudgeCalibratorOps extension
- noSuchMethod(Invocation invocation) → dynamic
  Invoked when a nonexistent method or property is accessed.
  inherited
- toString() → String
  A string representation of this object.
  inherited
Operators
- operator ==(Object other) → bool
  The equality operator.
  inherited
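The `calibrate` method listed under Methods takes `int concurrency = 4`, which presumably caps how many `judgeScorer` calls are in flight at once. This page does not describe the scheduling, so the following is only a sketch of one common way to bound concurrency in Dart. The helper `mapWithConcurrency` is hypothetical and not part of the package.

```dart
import 'dart:async';

/// Maps [items] through [task], keeping at most [concurrency] tasks in flight.
/// Results are returned in input order.
Future<List<R>> mapWithConcurrency<T, R>(
  List<T> items,
  Future<R> Function(T item) task, {
  int concurrency = 4,
}) async {
  if (concurrency < 1) {
    throw ArgumentError.value(concurrency, 'concurrency', 'must be >= 1');
  }
  final results = List<R?>.filled(items.length, null);
  var next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next index until none remain.
  // Dart runs this code on one isolate, so `next++` cannot race.
  Future<void> worker() async {
    while (next < items.length) {
      final index = next++;
      results[index] = await task(items[index]);
    }
  }

  final workerCount = concurrency < items.length ? concurrency : items.length;
  await Future.wait([for (var i = 0; i < workerCount; i++) worker()]);
  return [for (final r in results) r as R];
}
```

With this pattern, a lower `concurrency` value lowers the peak request rate against the judge model, which can help with rate limits at the cost of a longer wall-clock run.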