benchmark_harness_plus library

A statistically rigorous benchmarking harness for Dart.

This package provides reliable performance measurements using statistical best practices: median-based comparisons, coefficient of variation for reliability assessment, proper warmup phases, and outlier-resistant analysis.

Quick Start

import 'package:benchmark_harness_plus/benchmark_harness_plus.dart';

void main() {
  final benchmark = Benchmark(
    title: 'String Operations',
    variants: [
      BenchmarkVariant(name: 'concat', run: () => 'a' + 'b' + 'c'),
      BenchmarkVariant(name: 'interpolation', run: () => '${'a'}${'b'}${'c'}'),
    ],
  );

  final results = benchmark.run(log: print);
  printResults(results);
}

Why Use This Package?

Traditional benchmarking often uses mean (average) for comparisons, which is sensitive to outliers from GC pauses, OS scheduling, and CPU throttling. This package uses median as the primary metric, providing stable measurements even with occasional outliers.
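The effect is easy to see in a standalone sketch (independent of this package's own mean and median helpers): one GC-pause outlier drags the mean far from the typical run, while the median is unmoved.

```dart
// Standalone sketch: one outlier skews the mean but not the median.
double mean(List<double> xs) => xs.reduce((a, b) => a + b) / xs.length;

double median(List<double> xs) {
  final sorted = [...xs]..sort();
  final mid = sorted.length ~/ 2;
  return sorted.length.isOdd
      ? sorted[mid]
      : (sorted[mid - 1] + sorted[mid]) / 2;
}

void main() {
  // Five samples in microseconds; the last one hit a GC pause.
  final samples = [10.0, 10.0, 11.0, 10.0, 60.0];
  print(mean(samples));   // 20.2 - pulled far from the typical run
  print(median(samples)); // 10.0 - unaffected by the outlier
}
```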

The coefficient of variation (CV%) tells you how reliable your measurements are:

  • CV < 10%: Highly reliable
  • CV 10-20%: Acceptable
  • CV 20-50%: Directional only
  • CV > 50%: Unreliable (measurement is noise)
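The CV itself is just the sample standard deviation divided by the mean, expressed as a percentage. A standalone sketch of that formula (not the package's own cv implementation):

```dart
import 'dart:math' as math;

// Sketch of the usual CV% formula: sample std dev / mean * 100.
double cvPercent(List<double> xs) {
  final m = xs.reduce((a, b) => a + b) / xs.length;
  final sumSq = xs.fold(0.0, (s, x) => s + (x - m) * (x - m));
  final sd = math.sqrt(sumSq / (xs.length - 1)); // Bessel's correction
  return sd / m * 100;
}

void main() {
  // One large outlier pushes CV far past 50% - an unreliable run.
  print(cvPercent([10.0, 10.0, 11.0, 10.0, 60.0])); // ~110%
}
```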

Configuration

Use predefined configurations or create custom ones:

// Quick feedback during development
Benchmark(..., config: BenchmarkConfig.quick);

// Standard benchmarking (default)
Benchmark(..., config: BenchmarkConfig.standard);

// Important performance decisions
Benchmark(..., config: BenchmarkConfig.thorough);

// Custom configuration
Benchmark(..., config: BenchmarkConfig(
  iterations: 5000,
  samples: 15,
  warmupIterations: 1000,
));

Interpreting Results

  1. Look at CV% first - if > 20%, treat comparisons as directional only
  2. Compare medians - this is your primary metric
  3. Check mean vs median - large difference indicates outliers
  4. Look at the ratio - a ratio of 1.42x means the variant is 42% faster than the baseline
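The checklist above could be automated along these lines. Note this uses a hypothetical result shape - the field names (name, medianUs, meanUs, cvPercent) are assumptions for illustration, not the documented fields of BenchmarkResult:

```dart
// Hypothetical result shape - the real BenchmarkResult may differ.
class Result {
  final String name;
  final double medianUs, meanUs, cvPercent;
  Result(this.name, this.medianUs, this.meanUs, this.cvPercent);
}

void interpret(Result variant, Result baseline) {
  // 1. CV% first: above 20%, treat the comparison as directional only.
  if (variant.cvPercent > 20 || baseline.cvPercent > 20) {
    print('High CV% - treat this comparison as directional only.');
  }
  // 2. Medians are the primary comparison.
  final ratio = baseline.medianUs / variant.medianUs;
  // 3. Mean far from median suggests outliers in the run.
  if ((variant.meanUs - variant.medianUs).abs() / variant.medianUs > 0.2) {
    print('${variant.name}: mean diverges from median - outliers likely.');
  }
  // 4. Report the ratio, e.g. 1.42x = 42% faster than baseline.
  print('${variant.name} vs ${baseline.name}: ${ratio.toStringAsFixed(2)}x');
}
```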

Best Practices

  • Use at least 10 samples (20 for important decisions)
  • Each sample should take at least 10ms (adjust iterations accordingly)
  • Always warm up before measuring
  • Report CV% alongside results
  • Re-run when results seem surprising
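Sizing iterations so each sample clears the 10ms floor can be sketched as below; iterationsFor is an illustrative helper, not part of this package's API:

```dart
// Illustrative helper: pick an iteration count so one sample takes
// at least targetMs, given a rough per-operation time estimate.
int iterationsFor({required double estimatedNsPerOp, double targetMs = 10}) {
  final perOpMs = estimatedNsPerOp / 1e6; // ns -> ms
  return (targetMs / perOpMs).ceil();
}

void main() {
  // An operation estimated at 50 ns needs 200,000 iterations per sample.
  print(iterationsFor(estimatedNsPerOp: 50)); // 200000
}
```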

Classes

Benchmark
Runs benchmarks and collects statistically rigorous results.
BenchmarkComparison
Comparison between two benchmark results.
BenchmarkConfig
Configuration for benchmark runs.
BenchmarkResult
Results from benchmarking a single variant.
BenchmarkVariant
A benchmark variant to measure.

Enums

ReliabilityLevel
Describes the reliability level of a measurement based on its CV%.

Functions

cv(List<double> samples) → double
Calculates the coefficient of variation (CV) as a percentage.
formatComparison(BenchmarkComparison comparison) → String
Formats a comparison between two results.
formatDetailedResult(BenchmarkResult result) → String
Formats a detailed report for a single benchmark result.
formatResults(List<BenchmarkResult> results, {String? baselineName}) → String
Formats benchmark results as a table string.
formatResultsAsCsv(List<BenchmarkResult> results) → String
Formats results as CSV for export or further analysis.
max(List<double> samples) → double
Returns the maximum value in samples.
mean(List<double> samples) → double
Calculates the arithmetic mean (average) of a list of samples.
median(List<double> samples) → double
Calculates the median (middle value) of a list of samples.
min(List<double> samples) → double
Returns the minimum value in samples.
printReliabilityWarning(List<BenchmarkResult> results) → bool
Prints a reliability warning if any result has poor reliability.
printResults(List<BenchmarkResult> results, {String? baselineName}) → void
Prints benchmark results to the console.
reliabilityFromCV(double cvPercent) → ReliabilityLevel
Determines the reliability level based on coefficient of variation.
stdDev(List<double> samples) → double
Calculates the sample standard deviation of a list of samples.

Typedefs

BenchmarkLogger = void Function(String message)
Optional callback for benchmark progress reporting.