evaluate method

Performs on-demand evaluation of agent traces using a specified evaluator. This synchronous API accepts traces in OpenTelemetry format and returns immediate scoring results with detailed explanations.

May throw AccessDeniedException. May throw ConflictException. May throw DuplicateIdException. May throw InternalServerException. May throw ResourceNotFoundException. May throw ServiceQuotaExceededException. May throw ThrottlingException. May throw UnauthorizedException. May throw ValidationException.

Parameter evaluationInput : The input data containing agent session spans to be evaluated. Includes a list of spans in OpenTelemetry format from supported frameworks like Strands (AgentCore Runtime) or LangGraph with OpenInference instrumentation.

Parameter evaluatorId : The unique identifier of the evaluator to use for scoring. Can be a built-in evaluator (e.g., Builtin.Helpfulness, Builtin.Correctness) or a custom evaluator Id created through the control plane API.

Parameter evaluationReferenceInputs : Ground truth data to compare against agent responses during evaluation. Allows to provide expected responses, assertions, and expected tool trajectories at different evaluation levels. Session-level reference inputs apply to the entire conversation, while trace-level reference inputs target specific request-response interactions identified by trace ID.

Parameter evaluationTarget : The specific trace or span IDs to evaluate within the provided input. Allows targeting evaluation at different levels: individual tool calls, single request-response interactions (traces), or entire conversation sessions.

Implementation

BedrockAgentCore class