buildEvaluationInstructions method

@override
List<ChatMessage>? buildEvaluationInstructions(
  List<ChatMessage> messages,
  ChatResponse modelResponse,
  List<EvaluationContext> additionalContext
)

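Builds the evaluation instructions for RetrievalEvaluator as a two-message list: a system message followed by a user message that carries the retrieval-scoring prompt.

Returns null when additionalContext contains no RetrievalEvaluatorContext, which signals that a required context was missing. This implementation does not read modelResponse.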
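The null contract described above can be sketched in isolation. The types below are simplified stand-ins rather than the library's real classes, and findContext and numberedChunksOrNull are hypothetical helper names introduced only for this illustration.

```dart
// Stand-in types (assumptions): the real classes come from the library.
abstract class EvaluationContext {}

class RetrievalEvaluatorContext extends EvaluationContext {
  RetrievalEvaluatorContext(this.retrievedContextChunks);
  final List<String> retrievedContextChunks;
}

class OtherContext extends EvaluationContext {}

/// Returns the first context of type [T], or null if none is present.
T? findContext<T extends EvaluationContext>(List<EvaluationContext> contexts) {
  for (final c in contexts) {
    if (c is T) return c;
  }
  return null;
}

/// Hypothetical helper: returns the chunks numbered from 1, one per line,
/// or null when the required context is missing.
String? numberedChunksOrNull(List<EvaluationContext> contexts) {
  final ctx = findContext<RetrievalEvaluatorContext>(contexts);
  if (ctx == null) return null;
  final chunks = ctx.retrievedContextChunks;
  return [
    for (var i = 0; i < chunks.length; i++) '[${i + 1}] ${chunks[i]}',
  ].join('\n');
}

void main() {
  // No retrieval context supplied: the result is null.
  print(numberedChunksOrNull([OtherContext()]));

  // Retrieval context supplied: the chunks are numbered in rank order.
  print(numberedChunksOrNull([
    OtherContext(),
    RetrievalEvaluatorContext(['Paris is the capital.', 'France is in Europe.']),
  ]));
}
```

A caller would typically treat the null result as "cannot evaluate" and skip the model call rather than send an incomplete prompt.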

Implementation

@override
List<ChatMessage>? buildEvaluationInstructions(
  List<ChatMessage> messages,
  ChatResponse modelResponse,
  List<EvaluationContext> additionalContext,
) {
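  // A RetrievalEvaluatorContext is required; without it there is nothing to score.
  final ctx = additionalContext.whereType<RetrievalEvaluatorContext>().firstOrNull;
  if (ctx == null) return null;

  // Use the text of the last user message as the QUERY, or '' if there is none.
  final userRequest = messages.lastUserMessage?.text ?? '';
  // Number the chunks from 1 so their ranking is visible in the prompt.
  final chunks = ctx.retrievedContextChunks.asMap().entries
      .map((e) => '[${e.key + 1}] ${e.value}')
      .join('\n');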
  final prompt = '''
# Definition
**Retrieval** measures how relevant and well-ranked the retrieved context chunks are for the given QUERY.

# Ratings
## [Retrieval: 1] Chunks are entirely irrelevant to the QUERY.
## [Retrieval: 2] Chunks have very little relevance to the QUERY.
## [Retrieval: 3] Chunks are partially relevant but key information is missing or poorly ranked.
## [Retrieval: 4] Chunks are mostly relevant and reasonably ranked.
## [Retrieval: 5] Chunks are highly relevant and perfectly ranked for the QUERY.

# Data
QUERY: $userRequest
RETRIEVED CONTEXT CHUNKS:
$chunks

# Tasks
## Score the retrieval quality.
- **ThoughtChain**: Think step by step. Start with "Let's think step by step:".
- **Explanation**: A very short explanation of why you think the input Data should get that Score.
- **Score**: An integer score (1–5) based on the definitions.

## Please provide your answers between the tags: <S0>your chain of thoughts</S0>, <S1>your explanation</S1>, <S2>your Score</S2>.
# Output
''';
  return [
    ChatMessage.fromText(ChatRole.system, _systemPrompt),
    ChatMessage.fromText(ChatRole.user, prompt),
  ];
}
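The prompt asks the model to wrap its answer in <S0>, <S1>, and <S2> tags. How the evaluator reads those tags back is not shown on this page, so the following is only a minimal sketch of one way to do it. extractTag and parseScore are hypothetical names, not part of the library.

```dart
/// Returns the trimmed text between <tag> and </tag>, or null if absent.
String? extractTag(String output, String tag) {
  // dotAll lets '.' match newlines, since the chain of thought may span lines.
  final match = RegExp('<$tag>(.*?)</$tag>', dotAll: true).firstMatch(output);
  return match?.group(1)?.trim();
}

/// Parses the <S2> score as an integer in 1..5.
/// Returns null when the tag is missing, not an integer, or out of range.
int? parseScore(String output) {
  final raw = extractTag(output, 'S2');
  if (raw == null) return null;
  final score = int.tryParse(raw);
  if (score == null || score < 1 || score > 5) return null;
  return score;
}

void main() {
  const output = "<S0>Let's think step by step: the chunks match.</S0>"
      '<S1>Chunks are mostly relevant.</S1><S2> 4 </S2>';
  print(parseScore(output)); // 4
  print(extractTag(output, 'S1')); // Chunks are mostly relevant.
  print(parseScore('<S2>9</S2>')); // null (out of range)
}
```

Rejecting out-of-range or non-numeric scores keeps a malformed model reply from being recorded as a valid rating.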
