evaluate method
Performs on-demand evaluation of agent traces using a specified evaluator. This synchronous API accepts traces in OpenTelemetry format and returns immediate scoring results with detailed explanations.
May throw AccessDeniedException.
May throw ConflictException.
May throw DuplicateIdException.
May throw InternalServerException.
May throw ResourceNotFoundException.
May throw ServiceQuotaExceededException.
May throw ThrottlingException.
May throw UnauthorizedException.
May throw ValidationException.
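Of the exceptions listed above, ThrottlingException is the one that usually warrants a retry. The following is a minimal sketch of retrying with exponential backoff, not part of this client. The ThrottlingException class here is a local stand-in for the generated one, and retryOnThrottle, maxAttempts, and baseDelay are hypothetical names introduced for illustration.

```dart
import 'dart:async';

// Stand-in for the client's generated exception type (assumption: the real
// class lives in the client package and should be imported from there).
class ThrottlingException implements Exception {}

// Hypothetical helper: retries `call` when it throws ThrottlingException,
// doubling the delay after each failed attempt.
Future<T> retryOnThrottle<T>(
  Future<T> Function() call, {
  int maxAttempts = 3,
  Duration baseDelay = const Duration(milliseconds: 200),
}) async {
  var attempt = 1;
  while (true) {
    try {
      return await call();
    } on ThrottlingException {
      // Give up once the attempt budget is spent.
      if (attempt >= maxAttempts) rethrow;
      await Future<void>.delayed(baseDelay * (1 << (attempt - 1)));
      attempt++;
    }
  }
}

Future<void> main() async {
  var calls = 0;
  // Simulated call that is throttled twice before succeeding.
  final result = await retryOnThrottle(() async {
    calls++;
    if (calls < 3) throw ThrottlingException();
    return 'scored';
  }, baseDelay: const Duration(milliseconds: 10));
  print('$result after $calls calls');
}
```

In real use the closure would wrap the call to evaluate. Other exceptions, such as ValidationException, indicate a problem with the request itself and are deliberately not retried in this sketch.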
Parameter evaluationInput :
The input data containing agent session spans to be evaluated. Includes a
list of spans in OpenTelemetry format from supported frameworks like
Strands (AgentCore Runtime) or LangGraph with OpenInference
instrumentation.
Parameter evaluatorId :
The unique identifier of the evaluator to use for scoring. Can be a
built-in evaluator (e.g., Builtin.Helpfulness,
Builtin.Correctness) or a custom evaluator ID created through
the control plane API.
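The evaluator ID is placed in the request path rather than the request body, as the Implementation section shows. The sketch below isolates that step so the encoding is visible. evaluatePath is a hypothetical helper name, and the second ID is made up purely to show how reserved characters are escaped; it is not claimed to be a valid evaluator ID.

```dart
// Hypothetical helper mirroring the requestUri expression used by evaluate.
String evaluatePath(String evaluatorId) =>
    '/evaluations/evaluate/${Uri.encodeComponent(evaluatorId)}';

void main() {
  // Dots are unreserved characters, so built-in IDs pass through unchanged.
  print(evaluatePath('Builtin.Helpfulness'));
  // prints: /evaluations/evaluate/Builtin.Helpfulness

  // Illustrative ID only: spaces and slashes are percent-encoded.
  print(evaluatePath('my evaluator/v2'));
  // prints: /evaluations/evaluate/my%20evaluator%2Fv2
}
```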
Parameter evaluationReferenceInputs :
Ground truth data to compare against agent responses during evaluation.
Lets you provide expected responses, assertions, and expected tool
trajectories at different evaluation levels. Session-level reference
inputs apply to the entire conversation, while trace-level reference
inputs target specific request-response interactions identified by trace
ID.
Parameter evaluationTarget :
The specific trace or span IDs to evaluate within the provided input.
Allows targeting evaluation at different levels: individual tool calls,
single request-response interactions (traces), or entire conversation
sessions.
Implementation
Future<EvaluateResponse> evaluate({
  required EvaluationInput evaluationInput,
  required String evaluatorId,
  List<EvaluationReferenceInput>? evaluationReferenceInputs,
  EvaluationTarget? evaluationTarget,
}) async {
  final $payload = <String, dynamic>{
    'evaluationInput': evaluationInput,
    if (evaluationReferenceInputs != null)
      'evaluationReferenceInputs': evaluationReferenceInputs,
    if (evaluationTarget != null) 'evaluationTarget': evaluationTarget,
  };
  final response = await _protocol.send(
    payload: $payload,
    method: 'POST',
    requestUri: '/evaluations/evaluate/${Uri.encodeComponent(evaluatorId)}',
    exceptionFnMap: _exceptionFns,
  );
  return EvaluateResponse.fromJson(response);
}
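The implementation above uses Dart's collection-if so that optional parameters left null are omitted from the request body instead of being sent as null. A minimal standalone sketch of that pattern follows. buildPayload is a hypothetical function, plain maps stand in for the generated model classes, and the 'placeholder' key is not the real shape of EvaluationInput.

```dart
import 'dart:convert';

// Hypothetical stand-in for the payload construction inside evaluate.
// Plain maps replace EvaluationInput and the other generated types.
Map<String, dynamic> buildPayload({
  required Map<String, dynamic> evaluationInput,
  List<Map<String, dynamic>>? evaluationReferenceInputs,
  Map<String, dynamic>? evaluationTarget,
}) {
  return <String, dynamic>{
    'evaluationInput': evaluationInput,
    // Each optional key is added only when a value was supplied.
    if (evaluationReferenceInputs != null)
      'evaluationReferenceInputs': evaluationReferenceInputs,
    if (evaluationTarget != null) 'evaluationTarget': evaluationTarget,
  };
}

void main() {
  // Placeholder content; the real span format is defined by the service.
  final payload = buildPayload(evaluationInput: {'placeholder': true});
  print(jsonEncode(payload));
  // prints: {"evaluationInput":{"placeholder":true}}
}
```

Note that evaluatorId does not appear in the payload, since the implementation sends it as part of the request URI.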