Anthropic non-determinism: how many trials to run per dataset run. Defaults to 1. Set ≥3 for tasks where stability matters (pass^k).
@override final int trialsPerRun;