benchmark_test 0.1.2
A tool to integrate benchmarking into your development and testing workflow.
New to benchmark_test? Read the walkthrough article, From "I think this is slow" to "I know why": a practical Dart benchmark workflow, which covers the full loop (IDE setup, baselines, profiling, compile targets, and CI) with screenshots and examples. This README is the reference; the article is the guided tour.
Features
- Benchmarks that look and run like package:test tests
- Dedicated CLI runner (assert-free by default, multiple compile targets)
- Local baselines with percentage comparison
- CPU profiling with DevTools export and optional postprocessing
- GitHub Action for trend charts and regression alerts
Quick start
Add benchmark_test as a dev dependency:
dev_dependencies:
  benchmark_test: ^0.1.1
Create a benchmark test file and use benchmark() like a test:
import 'package:benchmark_test/benchmark_test.dart';
import 'package:test/test.dart';

void main() {
  group('my benchmarks', () {
    benchmark('parse json', () {
      // code to benchmark
    });

    benchmark(
      'parse json (long run)',
      () {
        // code to benchmark
      },
      minDuration: Duration(seconds: 4),
      minSamples: 30,
    );
  });
}
Run with the package CLI (recommended for stable timings):
dart run benchmark_test test/benchmarks_test.dart
dart test also works and prints benchmark output, but runs with asserts enabled. See the article for why that matters and how to wire VS Code code lenses.
CLI
dart run benchmark_test [options] <test-files...> [-- dart-test-args...]
Common options:
| Option | Description |
|---|---|
| --compile, -c | Compile type(s): jit, aot, js, wasm, or comma-separated (default: jit) |
| --update-baseline | Write results to build/benchmark_test/baselines.json |
| --profile | Capture CPU profiles (JIT only) |
| --output | human (default), benchmarkjs, or jsonl |
| --name | Filter benchmarks by regex |
| --plain-name | Filter benchmarks by plain name |
| --enable-asserts | Run with Dart asserts enabled |
| --run-skipped | Run skipped tests/benchmarks |
Examples:
dart run benchmark_test test/benchmarks_test.dart
dart run benchmark_test --compile jit,aot test/benchmarks_test.dart
dart run benchmark_test --update-baseline test/benchmarks_test.dart
dart run benchmark_test --profile --plain-name "parse json" test/benchmarks_test.dart
dart run benchmark_test --output jsonl test/benchmarks_test.dart
Run dart run benchmark_test --help for the full option list.
Compile types
| Type | Runs as | Notes |
|---|---|---|
| jit | Dart VM (kernel) | Default; required for --profile |
| aot | Native executable (dart compile exe) | Production-like VM/server timing |
| js | JavaScript | Web targets |
| wasm | WebAssembly | Web targets |
Baselines are stored per compile type (for example jit::my benchmarks parse json).
benchmark() API
benchmark() registers a test that repeatedly executes the given function and prints statistics:
Benchmark: my benchmarks parse json
12345.67 ops/sec
±2.34% margin of error
42 runs sampled
0:00:00.000081 average duration
| Field | Meaning |
|---|---|
| ops/sec | Estimated operations per second |
| ±% | Relative margin of error (95% confidence interval) |
| runs sampled | Measured iterations (after warm-up) |
| average duration | Mean time per iteration |
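To make the relationship between these fields concrete, here is a minimal sketch that derives them from a list of per-iteration timings. It is not the package's implementation: summarize and BenchmarkStats are hypothetical names, and the 1.96 critical value assumes a normal approximation for the 95% interval, whereas the package may use a different method (for example a t-distribution).

```dart
import 'dart:math' as math;

/// Hypothetical container for the four reported fields.
class BenchmarkStats {
  const BenchmarkStats(
      this.meanMicros, this.opsPerSec, this.rmePercent, this.samples);

  final double meanMicros; // average duration per iteration
  final double opsPerSec; // estimated operations per second
  final double rmePercent; // relative margin of error, in percent
  final int samples; // number of measured iterations
}

/// Derives summary statistics from per-iteration timings in microseconds.
///
/// Assumption: a normal-approximation critical value (1.96) is used for the
/// 95% confidence interval. The real package may compute this differently.
BenchmarkStats summarize(List<double> durationsMicros) {
  final n = durationsMicros.length;
  if (n < 2) {
    throw ArgumentError('At least two samples are needed.');
  }
  final mean = durationsMicros.reduce((a, b) => a + b) / n;
  // Sample variance, with n - 1 in the denominator.
  final variance = durationsMicros
          .map((d) => (d - mean) * (d - mean))
          .reduce((a, b) => a + b) /
      (n - 1);
  final standardError = math.sqrt(variance / n);
  final marginOfError = 1.96 * standardError;
  return BenchmarkStats(mean, 1e6 / mean, marginOfError / mean * 100, n);
}

void main() {
  final stats = summarize([80.0, 82.0, 81.0, 79.0, 83.0]);
  print('${stats.opsPerSec.toStringAsFixed(2)} ops/sec');
  print('±${stats.rmePercent.toStringAsFixed(2)}% margin of error');
  print('${stats.samples} runs sampled');
  print('${stats.meanMicros.toStringAsFixed(0)} µs average duration');
}
```

With these five timings the mean is 81 microseconds, which corresponds to roughly 12,345.68 ops/sec, in line with the sample output above.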
Parameters
| Parameter | Default | Description |
|---|---|---|
| minDuration | Duration(seconds: 2) | Keep sampling until at least this much measured time has elapsed |
| minSamples | 5 | Keep sampling until at least this many measured iterations have completed |
| warmupMinSamples | 1 | Warm-up iterations before sampling |
| warmupMinDuration | Duration.zero | Minimum warm-up duration |
| targetRme | null | Stop when relative margin of error is at most this value (after minimums) |
| maxSamples | null | Upper cap on measured iterations (use with targetRme) |
| timeout | minDuration * 2 | Fail if sampling exceeds this duration |
Warm-up iterations are excluded from reported statistics.
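The way minDuration and minSamples combine can be easier to see in code. The sketch below is an illustration of the semantics described in the table, not the package's actual loop: sample is a hypothetical function, only three of the parameters are modelled, and targetRme, maxSamples, warmupMinDuration, and timeout are left out.

```dart
/// Illustrative sampling loop (hypothetical; not the package's code).
///
/// Runs [body] for [warmupMinSamples] unrecorded warm-up iterations, then
/// keeps sampling until BOTH minimums are met: at least [minSamples]
/// iterations and at least [minDuration] of accumulated measured time.
List<Duration> sample(
  void Function() body, {
  Duration minDuration = const Duration(seconds: 2),
  int minSamples = 5,
  int warmupMinSamples = 1,
}) {
  // Warm-up iterations run the body but are never recorded.
  for (var i = 0; i < warmupMinSamples; i++) {
    body();
  }

  final samples = <Duration>[];
  var measured = Duration.zero;
  // Continue while either minimum is still unmet.
  while (samples.length < minSamples || measured < minDuration) {
    final watch = Stopwatch()..start();
    body();
    watch.stop();
    samples.add(watch.elapsed);
    measured += watch.elapsed;
  }
  return samples;
}

void main() {
  var sink = 0;
  final samples = sample(
    () {
      // Some work to measure.
      for (var i = 0; i < 100000; i++) {
        sink += i;
      }
    },
    minDuration: const Duration(milliseconds: 50),
    minSamples: 10,
  );
  print('Collected ${samples.length} samples (sink: $sink).');
}
```

Under this reading, a fast benchmark body is bounded by minDuration and collects many samples, while a slow one is bounded by minSamples and may run well past minDuration.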
setUpEach and tearDownEach
setUpEach and tearDownEach callbacks run before and after every measured iteration and are excluded from the timing. The standard setUp, tearDown, setUpAll, and tearDownAll from package:test also apply.
group('with setup', () {
  setUpEach(() {
    // runs before each measured iteration
  });
  tearDownEach(() {
    // runs after each measured iteration
  });
  benchmark('my benchmark', () {
    // ...
  });
});
In nested groups, setUpEach / tearDownEach apply only to benchmarks in that group.
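Baselines
The human output format compares each result against the baseline stored in build/benchmark_test/baselines.json. Record a baseline with --update-baseline, then run again without the flag to see the comparison: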
dart run benchmark_test --update-baseline test/benchmarks_test.dart
dart run benchmark_test test/benchmarks_test.dart
Higher ops/sec is an improvement. Changes of at least 5% are marked with ✅ (improvement) or ⚠️ (regression). Smaller changes show as plain text with (within ±5% threshold).
The file lives under build/ (gitignored by default). See the article for a worked optimization example.
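The threshold rule can be sketched as a small function. This is an illustration under assumptions, not the package's code: describeChange is a hypothetical name, and the exact text the CLI prints may differ from the strings returned here.

```dart
/// Describes how [currentOps] compares with [baselineOps] (both in ops/sec).
///
/// Hypothetical helper: changes of at least [thresholdPercent] in either
/// direction are flagged; smaller changes are reported as within threshold.
String describeChange(
  double baselineOps,
  double currentOps, {
  double thresholdPercent = 5.0,
}) {
  // Positive change means more ops/sec, which is an improvement.
  final change = (currentOps - baselineOps) / baselineOps * 100;
  final sign = change >= 0 ? '+' : '';
  final text = '$sign${change.toStringAsFixed(1)}%';

  if (change >= thresholdPercent) return '✅ $text';
  if (change <= -thresholdPercent) return '⚠️ $text';
  return '$text (within ±${thresholdPercent.toStringAsFixed(0)}% threshold)';
}

void main() {
  print(describeChange(1000.0, 1100.0)); // improvement
  print(describeChange(1000.0, 900.0)); // regression
  print(describeChange(1000.0, 1020.0)); // within threshold
}
```

Because the comparison is relative to the baseline's ops/sec, a baseline recorded on one machine or compile type is only meaningful when compared with runs under the same conditions.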
Output formats
The CLI supports --output:
| Format | Use |
|---|---|
| human | Local development (default) |
| benchmarkjs | github-action-benchmark compatible |
| jsonl | One JSON object per result (ndjson alias accepted) |
JSONL schema:
{"formatVersion":1,"name":"my benchmarks parse json","compiler":"jit","throughput":{"value":12345.67,"unit":"ops/sec"},"statistics":{"relativeMarginOfError":2.34,"samples":42},"latency":{"mean":81,"unit":"microseconds"}}
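For scripting against this output, a record like the one above can be decoded with dart:convert. The following is a minimal sketch: BenchmarkResult, parseResultLine, and parseJsonl are hypothetical names, only the fields shown in the example record are read, and the latency object is skipped for brevity.

```dart
import 'dart:convert';

/// Hypothetical model for a subset of the fields in one JSONL record.
class BenchmarkResult {
  const BenchmarkResult({
    required this.name,
    required this.compiler,
    required this.opsPerSec,
    required this.rmePercent,
    required this.samples,
  });

  final String name;
  final String compiler;
  final double opsPerSec;
  final double rmePercent;
  final int samples;
}

/// Decodes a single JSONL line into a [BenchmarkResult].
BenchmarkResult parseResultLine(String line) {
  final json = jsonDecode(line) as Map<String, dynamic>;
  final throughput = json['throughput'] as Map<String, dynamic>;
  final statistics = json['statistics'] as Map<String, dynamic>;
  return BenchmarkResult(
    name: json['name'] as String,
    compiler: json['compiler'] as String,
    // JSON numbers may decode as int or double, so go through num.
    opsPerSec: (throughput['value'] as num).toDouble(),
    rmePercent: (statistics['relativeMarginOfError'] as num).toDouble(),
    samples: statistics['samples'] as int,
  );
}

/// Decodes every non-empty line of [text] as one result.
Iterable<BenchmarkResult> parseJsonl(String text) => const LineSplitter()
    .convert(text)
    .where((line) => line.trim().isNotEmpty)
    .map(parseResultLine);

void main() {
  const line =
      r'{"formatVersion":1,"name":"my benchmarks parse json","compiler":"jit","throughput":{"value":12345.67,"unit":"ops/sec"},"statistics":{"relativeMarginOfError":2.34,"samples":42},"latency":{"mean":81,"unit":"microseconds"}}';
  for (final result in parseJsonl(line)) {
    print('${result.name} [${result.compiler}]: '
        '${result.opsPerSec} ops/sec over ${result.samples} samples');
  }
}
```

A consumer that needs to stay robust across versions would probably also check formatVersion before reading the other fields.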
VS Code
The default Run code lens uses dart test (asserts on). Add custom code lenses that invoke the benchmark_test CLI instead. Use "for": ["run-test"] only, not debug-test.
Restrict lenses to benchmark files with codeLens.path — filename globs must start with **/ (for example "**/*_benchmark_test.dart").
{
  "configurations": [
    {
      "name": "Run benchmark",
      "request": "launch",
      "type": "dart",
      "codeLens": {
        "for": ["run-test"],
        "path": "**/*_benchmark_test.dart"
      },
      "customTool": "dart",
      "customToolReplacesArgs": 5,
      "toolArgs": ["run", "benchmark_test"]
    },
    {
      "name": "Update baseline",
      "request": "launch",
      "type": "dart",
      "codeLens": { "for": ["run-test"] },
      "customTool": "dart",
      "customToolReplacesArgs": 5,
      "toolArgs": ["run", "benchmark_test", "--update-baseline"]
    },
    {
      "name": "Profile benchmark",
      "request": "launch",
      "type": "dart",
      "codeLens": { "for": ["run-test"] },
      "customTool": "dart",
      "customToolReplacesArgs": 5,
      "toolArgs": ["run", "benchmark_test", "--profile"]
    }
  ]
}
customToolReplacesArgs: 5 removes the default dart test arguments so toolArgs can run dart run benchmark_test. The article shows what these lenses look like in the editor.
Profiling
JIT only:
dart run benchmark_test --profile --plain-name "parse json" test/benchmarks_test.dart
Each profiled benchmark writes the following files under build/benchmark_test/profiles/:
| File | Description |
|---|---|
| *.cpu.json | Raw VM CpuSamples, filtered to measured benchmark-body iterations |
| *.devtools.json | DevTools snapshot (import or drag into CPU Profiler) |
| *.postprocessed.devtools.json | Same format, postprocessed by the package: async wrappers collapsed, setup/warm-up stripped, benchmark body promoted |
The postprocessed file is an extra step the package adds; a normal DevTools export looks like the unprocessed snapshot. The article compares both with flame chart screenshots.
GitHub Action
Add .github/workflows/benchmark.yaml:
name: Benchmark
on:
  push:
    branches: [master]
permissions:
  contents: write
  deployments: write
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: appsup-dart/benchmark_test@action-v1
        with:
          paths: test/benchmarks_test.dart
          compile: jit,aot
          github-token: ${{ secrets.GITHUB_TOKEN }}
          comment-on-alert: true
          fail-on-alert: true
The @action-v1 ref selects the action wrapper; the CLI version comes from your project's benchmark_test dev dependency.
Key inputs: paths, compile (action default jit,aot), github-token, fail-on-alert, comment-on-alert. Further inputs: working-directory, sdk / flutter-channel (Flutter), dart-test-args, benchmark-data-dir-path (default dev/bench), gh-pages-branch, alert-threshold, auto-push. See action.yml for the full list.
Results are published to GitHub Pages with one chart per benchmark; each compile type is a separate series (for example parse json [jit]). Live dashboard example: appsup-dart.github.io/firebase_dart/dev/bench/.
For Flutter packages:
- uses: appsup-dart/benchmark_test@action-v1
  with:
    sdk: flutter
    flutter-channel: stable
    paths: test/benchmarks_test.dart
    compile: jit,aot
    github-token: ${{ secrets.GITHUB_TOKEN }}
Benchmark files must be runnable on the Dart VM. See the article for setup steps and what the dashboard tells you.
Sponsor
If your team depends on this package in production, please consider sponsoring maintenance.
Sponsorship helps fund:
- compatibility and dependency updates
- bug fixes and issue triage
- documentation and migration support