dart_agent_core 1.0.13
A mobile-first, local-first Dart library for building stateful, tool-using AI agents with multi-provider LLM support (OpenAI, Gemini, AWS Bedrock).
1.0.13 #
- Fix `EventBus`: ignore late events after `close()` to prevent errors when events are published during teardown.
- Add unit tests for `LoopDetector`, `Planner`, and `EventBus`.
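To illustrate the teardown fix, here is a minimal sketch of a bus that drops events published after it has been closed. `SketchEventBus` and its `droppedAfterClose` counter are hypothetical names used only for this illustration; the package's `EventBus` may be implemented differently. The sketch relies on the fact that adding to a closed `StreamController` throws a `StateError`.

```dart
import 'dart:async';

/// Hypothetical stand-in for an event bus; not the package's EventBus.
class SketchEventBus {
  final StreamController<Object> _controller =
      StreamController<Object>.broadcast();

  /// Number of events ignored because they arrived after close().
  int droppedAfterClose = 0;

  Stream<Object> get events => _controller.stream;

  /// Publishes an event, or silently drops it if the bus is already closed.
  void publish(Object event) {
    if (_controller.isClosed) {
      droppedAfterClose++;
      return;
    }
    _controller.add(event);
  }

  Future<void> close() => _controller.close();
}

Future<void> main() async {
  final bus = SketchEventBus();
  final received = <Object>[];
  bus.events.listen(received.add);

  bus.publish('started');
  await bus.close();

  // Without the isClosed guard this call would throw a StateError.
  bus.publish('late');

  print('received: $received, dropped: ${bus.droppedAfterClose}');
}
```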
1.0.12 #
- Add `EvalTranscriptRecorder`: automatically captures execution traces (messages, tool calls, reasoning steps, token/turn metrics) from `AgentController` during eval runs, so harnesses no longer need to build transcripts manually.
- `EvalRunner` now integrates `EvalTranscriptRecorder` per trial, producing richer `Transcript` data with timing metrics (time-to-first-token, time-to-last-token).
- Simplify the `AgentHarnessFactory` interface: harnesses focus on running the agent and returning `Outcome`, while transcript recording is handled by the framework.
- Update eval guide docs and examples to reflect the new recorder-based workflow.
1.0.11 #
- Add evaluation subsystem under `package:dart_agent_core/eval.dart` (separate entry point, no impact on existing `dart_agent_core.dart` consumers).
  - Core: `EvalTask` / `EvalSuite` / `EvalRunner` / `Trial` / `Outcome` / `Transcript` / `EvalEnvironment` / `AgentHarnessFactory`.
  - Graders: `CodeGrader` (rule / script / classification), `ModelGrader` (LLM-as-judge with required `Unknown` escape hatch), `HumanGrader`.
  - LLM record / replay: hash-based `RecordingLLMClient` / `ReplayLLMClient`, `FileRecordingStore`, per-trial `cacheSalt` so `trialsPerRun > 1` does not collapse into a single cache entry.
  - Rate limiting: `RpmRateLimitGate`, `TpmRateLimitGate`, `NoopRateLimitGate`. Gates are decoupled from concurrency, and replay hits skip the gate.
  - Metrics: `pass@k` (Codex unbiased estimator), `pass^k` (empirical), `ClassificationMetrics`, bucket pass rates, grader means.
  - Reporting: `FileReportStore`, `diffRunReports`, markdown + JSON output.
  - Suite health: `SuiteHealthAnalyzer` for graduation candidates and broken-task detection across runs.
  - Calibration: `JudgeCalibrator` with Spearman / Pearson / MAE against a human-labeled golden set.
  - Loaders: code-defined suites and JSON file-tree-defined suites via `loadEvalSuiteFromDir` + `GraderRegistry`.
  - Observability: `JsonlTraceExporter`, `CompositeTraceExporter`, and built-in `LangfuseTraceExporter` (background batched ingestion to Langfuse cloud or self-hosted, exponential-backoff retries, schema aligned with langfuse v4).
- Add `bin/transcripts.dart` CLI: `list` / `show` / `diff` / `export` for persisted run reports, runnable via `dart run dart_agent_core:transcripts`.
- Add three runnable end-to-end demos under `example/eval_demo/`: `calculator/` and `card_agent/` (code-defined suites), `pkm_agent/` (file-defined suite with fixtures); plus a 130-line self-contained `example/min_eval/main.dart`.
- Add `doc/eval-guide.md` (English) and `doc/eval-guide.zh-CN.md` covering core concepts, components, two suite definition styles, three grader styles, metrics, record/replay, Langfuse export, cross-run health, judge calibration, CLI, and an API cheat sheet. Linked from each README.
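As background for the `pass@k` entry in the metrics list, the following is a small sketch of the unbiased estimator from the Codex paper, 1 - C(n - c, k) / C(n, k), where n is the number of trials for a task and c the number that passed. The function name `passAtK` and its argument checks are assumptions made for this illustration; the package's own metric code may differ in naming and edge-case handling.

```dart
/// Unbiased pass@k estimate for one task with [n] trials, [c] of which passed.
///
/// Evaluates 1 - C(n - c, k) / C(n, k) as a running product, which avoids
/// computing large binomial coefficients directly.
double passAtK(int n, int c, int k) {
  if (k < 1 || k > n || c < 0 || c > n) {
    throw ArgumentError('require 1 <= k <= n and 0 <= c <= n');
  }
  // Fewer than k failures: every size-k sample contains at least one pass.
  if (n - c < k) return 1.0;

  var allFail = 1.0; // Probability that a random size-k sample has no pass.
  for (var i = n - c + 1; i <= n; i++) {
    allFail *= 1.0 - k / i;
  }
  return 1.0 - allFail;
}

void main() {
  // With k = 1 the estimate reduces to the plain pass rate c / n.
  print(passAtK(10, 3, 1));
  print(passAtK(10, 3, 5));
}
```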
1.0.10 #
- Update README and documentation.
1.0.9 #
- BREAKING: Change the `SystemCallback` return type from the Dart record `(SystemMessage?, List<Tool>, List<LLMMessage>)` to the `SystemCallbackResult` class for broader SDK compatibility and clearer semantics. Callers using `systemCallback` must now read the `.systemMessage`, `.tools`, and `.requestMessages` properties instead of using positional destructuring.
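The migration can be pictured with the sketch below, which contrasts positional destructuring of a record with property access on a class. All types here are local stand-ins so the snippet compiles on its own. Only the three property names come from this changelog; the constructor shape of `SystemCallbackResult` and the fields of the stand-in message and tool classes are assumptions.

```dart
// Stand-in types for illustration only; the package defines the real ones.
class SystemMessage {
  final String text;
  SystemMessage(this.text);
}

class Tool {
  final String name;
  Tool(this.name);
}

class LLMMessage {
  final String content;
  LLMMessage(this.content);
}

/// Hypothetical shape: only the property names are taken from the changelog.
class SystemCallbackResult {
  final SystemMessage? systemMessage;
  final List<Tool> tools;
  final List<LLMMessage> requestMessages;

  SystemCallbackResult({
    this.systemMessage,
    required this.tools,
    required this.requestMessages,
  });
}

// Before 1.0.9: a positional record.
(SystemMessage?, List<Tool>, List<LLMMessage>) oldStyle() =>
    (SystemMessage('be brief'), [Tool('search')], [LLMMessage('hi')]);

// From 1.0.9: a class with named properties.
SystemCallbackResult newStyle() => SystemCallbackResult(
      systemMessage: SystemMessage('be brief'),
      tools: [Tool('search')],
      requestMessages: [LLMMessage('hi')],
    );

void main() {
  // Old call sites destructured by position.
  final (system, tools, messages) = oldStyle();
  print('${system?.text} ${tools.length} ${messages.length}');

  // New call sites read named properties.
  final result = newStyle();
  print('${result.systemMessage?.text} ${result.tools.length} '
      '${result.requestMessages.length}');
}
```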
1.0.8 #
- Lower the minimum Dart SDK constraint from `^3.10.4` to `^3.9.2` to support Flutter 3.35 and the HarmonyOS ecosystem.
- Add `ToolParameterMode.object` for tools: receive all arguments as a single `Map<String, dynamic>` instead of positional/named parameter mapping via `Function.apply`.
- Update tool documentation in README, README.zh-CN, and `doc/tools_and_planning.md` with object mode usage and examples.
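The difference between the two argument-passing styles can be sketched in plain Dart. The snippet does not use the package's tool API: `weather` and `weatherFromMap` are hypothetical handlers, and the dispatch code only shows what mapping through `Function.apply` looks like next to passing one map.

```dart
// Handler written for positional/named mapping.
String weather(String city, {String unit = 'c'}) => '$city:$unit';

// Handler written for the single-map ("object") style.
String weatherFromMap(Map<String, dynamic> args) {
  final city = args['city'] as String;
  final unit = (args['unit'] as String?) ?? 'c';
  return '$city:$unit';
}

void main() {
  // Arguments as they might arrive from a decoded tool call.
  final args = <String, dynamic>{'city': 'Oslo', 'unit': 'f'};

  // Mapping style: the caller must know which keys are positional and which
  // are named, then dispatch dynamically through Function.apply.
  final viaApply = Function.apply(
    weather,
    [args['city']],
    {#unit: args['unit']},
  );

  // Object style: the handler receives the whole map and unpacks it itself.
  final viaMap = weatherFromMap(args);

  print(viaApply);
  print(viaMap);
}
```

The map style moves argument validation into the handler, which can be convenient for tools with many optional fields.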
1.0.7 #
- Add `baseUrl`/`region` to request log messages across all LLM clients for easier debugging.
- Add comprehensive provider documentation to the Chinese README (Kimi, Qwen, GLM, Doubao, MiniMax, Ollama, OpenRouter, Claude direct).
- Add OpenAI-compatible and Anthropic-compatible provider sections to `doc/providers.md`.
- Document thinking/reasoning model support (`reasoning_content` handling).
- Update examples list in both READMEs with all provider examples.
- Fix typo "Sendings" → "Sending" in `OpenAIClient` streaming log.
- Fix broken `docs/` links to `doc/` in Chinese README.
1.0.6 #
- Fix `OpenAIClient` not handling `reasoning_content` for thinking/reasoning models (e.g. `kimi-k2-thinking`, `o1`, `deepseek-r1`).
- Parse `reasoning_content` from both non-streaming and streaming responses into `ModelMessage.thought`.
- Re-send `reasoning_content` in assistant messages during multi-turn conversations to satisfy API validation.
- Add `simple_agent_with_kimi_vision_example.dart` for image analysis with Kimi.
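As an illustration of the parsing step, the sketch below pulls `reasoning_content` out of a non-streaming chat completion body. It assumes the field sits next to `content` inside `choices[0].message`, as OpenAI-compatible reasoning providers commonly return it. The function name `parseAssistantMessage` and the sample payload are made up for this example and are not taken from the package.

```dart
import 'dart:convert';

/// Extracts the reasoning text and the final answer from a response body.
/// Either field may be absent, so both are nullable.
({String? thought, String? content}) parseAssistantMessage(String body) {
  final json = jsonDecode(body) as Map<String, dynamic>;
  final choices = json['choices'] as List<dynamic>;
  final first = choices.first as Map<String, dynamic>;
  final message = first['message'] as Map<String, dynamic>;
  return (
    thought: message['reasoning_content'] as String?,
    content: message['content'] as String?,
  );
}

void main() {
  // Hand-written sample payload, not a captured provider response.
  final body = jsonEncode({
    'choices': [
      {
        'message': {
          'role': 'assistant',
          'reasoning_content': 'Compare the two numbers first.',
          'content': '9.11 is smaller than 9.9.',
        },
      },
    ],
  });

  final parsed = parseAssistantMessage(body);
  print('thought: ${parsed.thought}');
  print('content: ${parsed.content}');
}
```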
1.0.5 #
- Add examples for MiniMax, Kimi, Volcengine Seed, Zhipu GLM, and Qwen via OpenAI-compatible API.
- Fix `OpenAIResponseTransformer` not extracting `finish_reason` when provider sends it in the same chunk as `usage` (e.g. GLM).
- Fix double JSON encoding of `FunctionCall.arguments` in `OpenAIClient` and `ResponsesClient` request body.
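For readers unfamiliar with the double-encoding pitfall, here is a generic sketch of how it can arise when tool-call arguments are already a JSON string and get encoded a second time. It shows the general failure mode only and is an assumption about the kind of bug involved, not a reproduction of the package's code. `buildCall` and `decodeArguments` are hypothetical helpers.

```dart
import 'dart:convert';

/// Builds a function-call payload. [arguments] must already be a JSON string.
String buildCall(String arguments) =>
    jsonEncode({'name': 'get_weather', 'arguments': arguments});

/// Decodes the arguments field the way a receiving server would.
Object? decodeArguments(String requestBody) {
  final call = jsonDecode(requestBody) as Map<String, dynamic>;
  return jsonDecode(call['arguments'] as String);
}

void main() {
  final arguments = jsonEncode({'city': 'Oslo'}); // Already a JSON string.

  final correct = buildCall(arguments);
  // Bug pattern: encoding the already-encoded string once more.
  final doubled = buildCall(jsonEncode(arguments));

  print(decodeArguments(correct) is Map); // Decodes to an object.
  print(decodeArguments(doubled) is String); // Decodes to a string instead.
}
```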
1.0.4 #
- Add `DirectorySkill` support: load skills from `SKILL.md` files in a directory tree with automatic discovery and system prompt injection.
- Add `JavaScriptRuntime` and `NodeJavaScriptRuntime` for executing JavaScript scripts with bidirectional Dart↔JS bridge communication.
- Integrate directory skills and JavaScript execution into `StatefulAgent`.
- Add `simple_agent_with_directory_skills_example.dart` example.
- Update README documentation.
1.0.3 #
- Add `maxTurns` protection to `StatefulAgent` to prevent potential infinite loops.
- Add internal retry limit for empty model responses/stop reasons in `runStream`.
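The idea behind a turn limit can be sketched with a generic bounded loop. `runLoop` and its `step` callback are hypothetical and unrelated to the package's internals; whether `StatefulAgent` throws, stops quietly, or reports the limit in some other way is not stated here, so the `StateError` below is an assumption of this sketch.

```dart
/// Calls [step] once per turn until it returns true or [maxTurns] is reached.
/// Returns the number of turns taken and throws if the limit is exhausted.
int runLoop(bool Function(int turn) step, {int maxTurns = 20}) {
  for (var turn = 1; turn <= maxTurns; turn++) {
    if (step(turn)) return turn;
  }
  throw StateError('Stopped after $maxTurns turns without finishing');
}

void main() {
  // A step that finishes on the third turn.
  print(runLoop((turn) => turn == 3));

  // A step that never finishes is cut off by the limit.
  try {
    runLoop((_) => false, maxTurns: 5);
  } on StateError catch (e) {
    print(e.message);
  }
}
```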
1.0.2 #
- Add standard entry-point `example/main.dart` to fix pub.dev example discovery.
- Add comprehensive API documentation comments (`///`) to core library members.
- Fix library-level documentation in `lib/dart_agent_core.dart`.
1.0.1 #
- Add `ClaudeClient` for direct Anthropic Messages API support (no AWS Bedrock required).
- Add examples for Ollama and OpenRouter usage via `OpenAIClient`.
- Add Claude example with `ClaudeClient`.
- Rename `docs/` to `doc/` and `examples/` to `example/` to follow pub.dev conventions.
1.0.0 #
- Initial release.
- Multi-provider LLM support: OpenAI (Chat Completions & Responses API), Google Gemini, AWS Bedrock (Claude).
- StatefulAgent with autonomous tool-calling loop.
- Multimodal message support (text, image, audio, video, document).
- Streaming via `runStream()` with fine-grained StreamingEvents.
- Dynamic Skill system with runtime activation/deactivation.
- Sub-agent delegation with `clone` and named sub-agents.
- Planning via `write_todos` tool with `PlanMode`.
- Context compression with `LLMBasedContextCompressor` and episodic memory.
- Loop detection (tool signature tracking + LLM-based diagnosis).
- AgentController with Pub/Sub and Request/Response lifecycle hooks.
- `systemCallback` for per-call request modification.
- FileStateStorage for JSON-based state persistence.