eval 0.0.5

Pure Dart LLM evaluation helpers for tests, including judge-based matchers, RAG scoring, and statistics.

0.0.5

  • Increase the timeout for evals to 10 minutes and expose a timeout attribute.

0.0.4

  • Fix parseMarkdownBody so nested frontmatter collections are mutable. Lists and maps nested inside the parsed frontmatter previously came back as immutable YamlList/YamlMap instances and threw UnsupportedError on assignment.
  • Export deepConvertYaml, a recursive helper that converts YamlMap/YamlList trees to mutable Map<String, dynamic> and List structures.
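
The conversion described in the 0.0.4 notes can be illustrated with a minimal sketch. It is not the package's implementation: deepConvertSketch is a hypothetical stand-in for deepConvertYaml, and core unmodifiable collections stand in for YamlMap/YamlList so the example runs without the yaml dependency. It assumes map keys can be stringified.

```dart
// Hypothetical stand-in for deepConvertYaml; the real helper may differ.
// Recursively copies any Map/List tree into fresh, growable collections.
dynamic deepConvertSketch(dynamic node) {
  if (node is Map) {
    // Assumption: keys are converted with toString().
    return <String, dynamic>{
      for (final entry in node.entries)
        entry.key.toString(): deepConvertSketch(entry.value),
    };
  }
  if (node is List) {
    return <dynamic>[for (final item in node) deepConvertSketch(item)];
  }
  return node; // Scalars (String, num, bool, null) pass through unchanged.
}

void main() {
  // Unmodifiable collections throw UnsupportedError on assignment,
  // which mirrors the behavior the changelog entry describes.
  final frozen = Map<String, dynamic>.unmodifiable({
    'title': 'Report',
    'tags': List<String>.unmodifiable(['a', 'b']),
    'meta': Map<String, dynamic>.unmodifiable({'draft': true}),
  });

  final mutable = deepConvertSketch(frozen) as Map<String, dynamic>;
  (mutable['tags'] as List).add('c'); // Works on the converted copy.
  (mutable['meta'] as Map)['draft'] = false;
  print(mutable);
}
```

Because the sketch copies every level, edits to the result never touch the original tree.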

0.0.3

  • Overhaul the README and public Dartdoc so eval(...) is documented as the primary workflow instead of raw test(...).
  • Document the full exported matcher surface, including the previously omitted JSON array, schema-path, frontmatter schema, and RAG matchers.
  • Refresh the bundled example to show an end-to-end eval(...) run with sync and async assertions.
  • Reset internal eval run state after each run so expect(...) cleanly falls back to normal test behavior outside an active eval.

0.0.2

  • Align package metadata and documentation with the published API.
  • Fix async LLM and RAG matcher behavior so sync expect(...) usage fails with clear guidance instead of silently succeeding.
  • Fix APICallQueue recovery so one failed request does not poison later queued calls.
  • Preserve detailed evaluateRag() metadata including relevant context indices, unsupported claims, and joined metric reasons.
  • Treat empty frontmatter as valid frontmatter and reject malformed YAML with closing delimiters.
  • Distinguish missing paths from explicit null values in schema-based path matchers.
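
The last item above, telling a missing path apart from an explicit null, can be sketched as follows. PathLookup and lookupPath are hypothetical names used only for illustration and are not part of the package's API. The idea is to report existence separately from the value, since a plain map lookup returns null in both cases.

```dart
// Hypothetical result type: records whether the path exists at all,
// separately from the value found there (which may legitimately be null).
class PathLookup {
  const PathLookup.found(this.value) : exists = true;
  const PathLookup.missing()
      : value = null,
        exists = false;

  final bool exists;
  final Object? value;
}

// Walks a decoded JSON-like tree using String keys and int indices.
PathLookup lookupPath(Object? root, List<Object> segments) {
  Object? current = root;
  for (final segment in segments) {
    if (current is Map && segment is String && current.containsKey(segment)) {
      current = current[segment];
    } else if (current is List &&
        segment is int &&
        segment >= 0 &&
        segment < current.length) {
      current = current[segment];
    } else {
      return const PathLookup.missing();
    }
  }
  return PathLookup.found(current);
}

void main() {
  final doc = <String, dynamic>{
    'user': {'name': 'Ada', 'email': null},
  };
  final email = lookupPath(doc, ['user', 'email']); // present, value is null
  final phone = lookupPath(doc, ['user', 'phone']); // key is absent
  print('email: exists=${email.exists}, value=${email.value}');
  print('phone: exists=${phone.exists}, value=${phone.value}');
}
```

Both lookups yield a null value, but only the email path reports exists as true, which is the distinction a schema-based matcher would need.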

0.0.1

  • Initial public release of the eval package.
  • Added string, JSON, schema, frontmatter, distance, LLM-judge, and RAG matchers.
  • Added aggregate statistics and prompt comparison helpers.
  • Added the APICallService abstraction and the bundled Claude example service.

2 likes, 160 points, 217 downloads

Publisher

Verified publisher: scalabs.de

Topics

#llm #evaluation #testing #ai #rag

License

MIT

Dependencies

http, matcher, test, yaml
