benchmark_test | Dart package

A tool to integrate benchmarking into your development and testing workflow.

New to benchmark_test? Read the walkthrough article From "I think this is slow" to "I know why": a practical Dart benchmark workflow for the full loop — IDE setup, baselines, profiling, compile targets, and CI — with screenshots and examples. This README is the reference; the article is the guided tour.

Features #

Benchmarks that look and run like package:test tests
Dedicated CLI runner (assert-free by default, multiple compile targets)
Local baselines with percentage comparison
CPU profiling with DevTools export and optional postprocessing
GitHub Action for trend charts and regression alerts

Quick start #

Add benchmark_test as a dev dependency:

dev_dependencies:
  benchmark_test: ^0.1.1

Create a benchmark test file and use benchmark() like a test:

import 'package:benchmark_test/benchmark_test.dart';
import 'package:test/test.dart';

void main() {
  group('my benchmarks', () {
    benchmark('parse json', () {
      // code to benchmark
    });

    benchmark(
      'parse json (long run)',
      () {
        // code to benchmark
      },
      minDuration: Duration(seconds: 4),
      minSamples: 30,
    );
  });
}

Run with the package CLI (recommended for stable timings):

dart run benchmark_test test/benchmarks_test.dart

dart test also works and prints benchmark output, but runs with asserts enabled. See the article for why that matters and how to wire VS Code code lenses.

CLI #

dart run benchmark_test [options] <test-files...> [-- dart-test-args...]

Common options:

Option	Description
`--compile`, `-c`	Compile type(s): `jit`, `aot`, `js`, `wasm`, or comma-separated (default: `jit`)
`--update-baseline`	Write results to `build/benchmark_test/baselines.json`
`--profile`	Capture CPU profiles (JIT only)
`--output`	`human` (default), `benchmarkjs`, or `jsonl`
`--name`	Filter benchmarks by regex
`--plain-name`	Filter benchmarks by plain name
`--enable-asserts`	Run with Dart asserts enabled
`--run-skipped`	Run skipped tests/benchmarks

Examples:

dart run benchmark_test test/benchmarks_test.dart
dart run benchmark_test --compile jit,aot test/benchmarks_test.dart
dart run benchmark_test --update-baseline test/benchmarks_test.dart
dart run benchmark_test --profile --plain-name "parse json" test/benchmarks_test.dart
dart run benchmark_test --output jsonl test/benchmarks_test.dart

Run dart run benchmark_test --help for the full option list.

Compile types #

Type	Runs as	Notes
`jit`	Dart VM (kernel)	Default; required for `--profile`
`aot`	Native executable (`dart compile exe`)	Production-like VM/server timing
`js`	JavaScript	Web targets
`wasm`	WebAssembly	Web targets

Baselines are stored per compile type (for example jit::my benchmarks parse json).

`benchmark()` API #

benchmark() registers a test that repeatedly executes the given function and prints statistics:

Benchmark: my benchmarks parse json
  12345.67 ops/sec
  ±2.34% margin of error
  42 runs sampled
  0:00:00.000081 average duration

Field	Meaning
ops/sec	Estimated operations per second
±%	Relative margin of error (95% confidence interval)
runs sampled	Measured iterations (after warm-up)
average duration	Mean time per iteration

Parameters #

Parameter	Default	Description
`minDuration`	`Duration(seconds: 2)`	Keep sampling until at least this much measured time has elapsed
`minSamples`	`5`	Keep sampling until at least this many measured iterations have completed
`warmupMinSamples`	`1`	Warm-up iterations before sampling
`warmupMinDuration`	`Duration.zero`	Minimum warm-up duration
`targetRme`	`null`	Stop when relative margin of error is at most this value (after minimums)
`maxSamples`	`null`	Upper cap on measured iterations (use with `targetRme`)
`timeout`	`minDuration * 2`	Fail if sampling exceeds this duration

Warm-up iterations are excluded from reported statistics.

`setUpEach` and `tearDownEach` #

Run before and after every measured iteration (not timed). Standard setUp, tearDown, setUpAll, and tearDownAll from package:test also apply.

group('with setup', () {
  setUpEach(() {
    // runs before each measured iteration
  });

  tearDownEach(() {
    // runs after each measured iteration
  });

  benchmark('my benchmark', () {
    // ...
  });
});

In nested groups, setUpEach / tearDownEach apply only to benchmarks in that group.

Baselines #

Human output compares against build/benchmark_test/baselines.json:

dart run benchmark_test --update-baseline test/benchmarks_test.dart
dart run benchmark_test test/benchmarks_test.dart

Higher ops/sec is an improvement. Changes of at least 5% are marked with ✅ (improvement) or ⚠️ (regression). Smaller changes show as plain text with (within ±5% threshold).

The file lives under build/ (gitignored by default). See the article for a worked optimization example.

Output formats #

The CLI supports --output:

Format	Use
`human`	Local development (default)
`benchmarkjs`	github-action-benchmark compatible
`jsonl`	One JSON object per result (`ndjson` alias accepted)

JSONL schema:

{"formatVersion":1,"name":"my benchmarks parse json","compiler":"jit","throughput":{"value":12345.67,"unit":"ops/sec"},"statistics":{"relativeMarginOfError":2.34,"samples":42},"latency":{"mean":81,"unit":"microseconds"}}

VS Code #

The default Run code lens uses dart test (asserts on). Add custom code lenses that invoke the benchmark_test CLI instead. Use "for": ["run-test"] only, not debug-test.

Restrict lenses to benchmark files with codeLens.path — filename globs must start with **/ (for example "**/*_benchmark_test.dart").

{
  "configurations": [
    {
      "name": "Run benchmark",
      "request": "launch",
      "type": "dart",
      "codeLens": {
        "for": ["run-test"],
        "path": "**/*_benchmark_test.dart"
      },
      "customTool": "dart",
      "customToolReplacesArgs": 5,
      "toolArgs": ["run", "benchmark_test"]
    },
    {
      "name": "Update baseline",
      "request": "launch",
      "type": "dart",
      "codeLens": { "for": ["run-test"] },
      "customTool": "dart",
      "customToolReplacesArgs": 5,
      "toolArgs": ["run", "benchmark_test", "--update-baseline"]
    },
    {
      "name": "Profile benchmark",
      "request": "launch",
      "type": "dart",
      "codeLens": { "for": ["run-test"] },
      "customTool": "dart",
      "customToolReplacesArgs": 5,
      "toolArgs": ["run", "benchmark_test", "--profile"]
    }
  ]
}

customToolReplacesArgs: 5 removes the default dart test arguments so toolArgs can run dart run benchmark_test. The article shows what these lenses look like in the editor.

Profiling #

JIT only:

dart run benchmark_test --profile --plain-name "parse json" test/benchmarks_test.dart

Writes under build/benchmark_test/profiles/ per benchmark:

File	Description
`*.cpu.json`	Raw VM `CpuSamples`, filtered to measured benchmark-body iterations
`*.devtools.json`	DevTools snapshot (import or drag into CPU Profiler)
`*.postprocessed.devtools.json`	Same format, postprocessed by the package — async wrappers collapsed, setup/warm-up stripped, benchmark body promoted

The postprocessed file is an extra step the package adds; a normal DevTools export looks like the unprocessed snapshot. The article compares both with flame chart screenshots.

GitHub Action #

Add .github/workflows/benchmark.yaml:

name: Benchmark
on:
  push:
    branches: [master]

permissions:
  contents: write
  deployments: write

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: appsup-dart/benchmark_test@action-v1
        with:
          paths: test/benchmarks_test.dart
          compile: jit,aot
          github-token: ${{ secrets.GITHUB_TOKEN }}
          comment-on-alert: true
          fail-on-alert: true

The @action-v1 ref selects the action wrapper; the CLI version comes from your project's benchmark_test dev dependency.

Key inputs: paths, compile (action default jit,aot), github-token, fail-on-alert, comment-on-alert. Also working-directory, sdk / flutter-channel (Flutter), dart-test-args, benchmark-data-dir-path (default dev/bench), gh-pages-branch, alert-threshold, auto-push. See action.yml.

Results are published to GitHub Pages with one chart per benchmark; each compile type is a separate series (for example parse json [jit]). Live dashboard example: appsup-dart.github.io/firebase_dart/dev/bench/.

For Flutter packages:

- uses: appsup-dart/benchmark_test@action-v1
  with:
    sdk: flutter
    flutter-channel: stable
    paths: test/benchmarks_test.dart
    compile: jit,aot
    github-token: ${{ secrets.GITHUB_TOKEN }}

Benchmark files must be runnable on the Dart VM. See the article for setup steps and what the dashboard tells you.

If your team depends on this package in production, please consider sponsoring maintenance.

Sponsorship helps fund:

compatibility and dependency updates
bug fixes and issue triage
documentation and migration support

👉 https://github.com/sponsors/rbellens

benchmark_test 0.1.2
benchmark_test: ^0.1.2 copied to clipboard

Metadata

Features #

Quick start #

CLI #

Compile types #

`benchmark()` API #

Parameters #

`setUpEach` and `tearDownEach` #

Baselines #

Output formats #

VS Code #

Profiling #

GitHub Action #

← Metadata

Documentation

Publisher

Weekly Downloads

Metadata

Funding

License

Dependencies

More

benchmark_test 0.1.2 benchmark_test: ^0.1.2 copied to clipboard

Metadata

Features #

Quick start #

CLI #

Compile types #

benchmark() API #

Parameters #

setUpEach and tearDownEach #

Baselines #

Output formats #

VS Code #

Profiling #

GitHub Action #

Sponsor #

← Metadata

Documentation

Publisher

Weekly Downloads

Metadata

Funding

License

Dependencies

More

benchmark_test 0.1.2
benchmark_test: ^0.1.2 copied to clipboard

`benchmark()` API #

`setUpEach` and `tearDownEach` #