eval 0.0.5
Pure Dart LLM evaluation helpers for tests, including judge-based matchers, RAG scoring, and statistics.
0.0.5
- Increase the timeout for evals to 10 minutes and expose a timeout attribute.
0.0.4
- Fix `parseMarkdownBody` so nested frontmatter collections are mutable. Lists and maps nested inside the parsed frontmatter previously came back as immutable `YamlList`/`YamlMap` instances and threw `UnsupportedError` on assignment.
- Export `deepConvertYaml`, a recursive helper that converts `YamlMap`/`YamlList` trees to mutable `Map<String, dynamic>` and `List` structures.
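
The recursive conversion described above can be sketched roughly as follows. This is not the package's implementation: `deepCopyMutable` is a hypothetical name, and the example uses `UnmodifiableMapView`/`UnmodifiableListView` from `dart:collection` as stand-ins for `YamlMap`/`YamlList` so that it runs without extra dependencies. The actual signature and behavior of `deepConvertYaml` may differ.

```dart
import 'dart:collection';

/// Hypothetical stand-in for a deep YAML-to-mutable conversion.
/// Recursively copies any Map or List into a fresh, growable
/// `Map<String, dynamic>` or `List<dynamic>`; scalars are returned as-is.
dynamic deepCopyMutable(dynamic node) {
  if (node is Map) {
    return <String, dynamic>{
      for (final entry in node.entries)
        // Assumption: keys are stringified, matching a Map<String, dynamic> target.
        entry.key.toString(): deepCopyMutable(entry.value),
    };
  }
  if (node is List) {
    return <dynamic>[for (final item in node) deepCopyMutable(item)];
  }
  return node;
}

void main() {
  // Unmodifiable views play the role of immutable parsed frontmatter.
  final frozen = UnmodifiableMapView<String, dynamic>({
    'tags': UnmodifiableListView<dynamic>(['a', 'b']),
    'meta': UnmodifiableMapView<String, dynamic>({'draft': true}),
  });

  final mutable = deepCopyMutable(frozen) as Map<String, dynamic>;

  // Both nested writes would throw UnsupportedError on `frozen`.
  (mutable['tags'] as List).add('c');
  (mutable['meta'] as Map)['draft'] = false;

  print(mutable);
}
```

Because the copy is rebuilt from the leaves up, the nested collections are ordinary growable Dart collections, and the original immutable tree is left untouched.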
0.0.3
- Overhaul the README and public Dartdoc so `eval(...)` is documented as the primary workflow instead of raw `test(...)`.
- Document the full exported matcher surface, including the previously omitted JSON array, schema-path, frontmatter schema, and RAG matchers.
- Refresh the bundled example to show an end-to-end `eval(...)` run with sync and async assertions.
- Reset internal eval run state after each run so `expect(...)` cleanly falls back to normal test behavior outside an active eval.
0.0.2
- Align package metadata and documentation with the published API.
- Fix async LLM and RAG matcher behavior so sync `expect(...)` usage fails with clear guidance instead of silently succeeding.
- Fix `APICallQueue` recovery so one failed request does not poison later queued calls.
- Preserve detailed `evaluateRag()` metadata including relevant context indices, unsupported claims, and joined metric reasons.
- Treat empty frontmatter as valid frontmatter and reject malformed YAML with closing delimiters.
- Distinguish missing paths from explicit `null` values in schema-based path matchers.
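
To illustrate the last item, here is a minimal sketch of how a path lookup can tell a missing key apart from a key whose value is explicitly `null`. The `PathStatus` enum and `lookup` function are hypothetical names made up for this example; they are not part of the package's API, and the real matchers may work differently.

```dart
/// Hypothetical lookup result; not a type exported by the package.
enum PathStatus { missing, explicitNull, present }

/// Walks `path` through nested maps. Uses `containsKey` rather than a
/// null check so that an absent key and a stored `null` stay distinct.
PathStatus lookup(Map<String, dynamic> root, List<String> path) {
  dynamic node = root;
  for (final key in path) {
    if (node is! Map || !node.containsKey(key)) {
      return PathStatus.missing;
    }
    node = node[key];
  }
  return node == null ? PathStatus.explicitNull : PathStatus.present;
}

void main() {
  final doc = <String, dynamic>{
    'user': {'name': 'Ada', 'email': null},
  };

  print(lookup(doc, ['user', 'name']));  // key exists with a value
  print(lookup(doc, ['user', 'email'])); // key exists, value is null
  print(lookup(doc, ['user', 'phone'])); // key does not exist
}
```

A plain `map[key] == null` check would treat the second and third cases identically, which is the ambiguity this changelog entry describes as resolved.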
0.0.1
- Initial public release of the `eval` package.
- Added string, JSON, schema, frontmatter, distance, LLM-judge, and RAG matchers.
- Added aggregate statistics and prompt comparison helpers.
- Added the `APICallService` abstraction and the bundled Claude example service.