Mutation testing


When writing test cases for software, QA engineers often rely on metrics like code coverage to verify that the test cases actually test the program. However, such a simple metric cannot quantify the quality of your tests. It is possible to reach high line and branch coverage while testing only a fraction of the observable behavior of your units. The worst case is a test suite that merely calls every method to reach high line coverage but contains no assertions. Perhaps you forgot to add an assertion statement to a test case, or you removed some assertions in your development branch so that the continuous integration build succeeds. Ideally, this would be caught during code review, but any manual process is error-prone.

So how can we evaluate the quality of our software tests if line coverage is not a good metric? What is a "good" test? In short, a good test should fail whenever the observable behavior of the tested procedures changes. You can evaluate the quality of your tests by modifying a single line of your program and then verifying that your tests are sensitive to that change. This process is called mutation testing. After a certain number of mutations, the fraction of mutations that your tests detected is an indication of their quality. Performing this procedure manually on a whole program is extremely tedious.
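The idea can be illustrated with a minimal Python sketch. Everything in it (the clamp function, its hand-written mutant and the two tests) is a hypothetical example and not part of mutation_test. It shows that a test without assertions reaches full line coverage but cannot tell the original from the mutant.

```python
def clamp(value, low, high):
    """Original unit under test: restrict value to the range [low, high]."""
    return max(low, min(value, high))


def clamp_mutant(value, low, high):
    """Mutant: the inner 'min' was replaced by 'max'."""
    return max(low, max(value, high))


def weak_test(fn):
    # Executes every line of fn but checks nothing.
    fn(5, 0, 10)
    return True


def strong_test(fn):
    # Checks the observable behavior for three representative inputs.
    return fn(5, 0, 10) == 5 and fn(-1, 0, 10) == 0 and fn(11, 0, 10) == 10


def detects(test, original, mutant):
    """A test detects a mutant if it passes on the original and fails on the mutant."""
    return test(original) and not test(mutant)
```

Here detects(strong_test, clamp, clamp_mutant) is true, while the same call with weak_test is false, even though both tests cover all lines of clamp.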

This repository contains a command line program that automates this procedure for code in any programming language. It can be customized to your needs, because all rules for modifying the source code and for running the tests can be defined in XML documents. The program is fully self-contained, so you can just grab the binary and start testing!

Quick start

If you are working on a Dart project, you can run the binary without any arguments at the root of your project. The application will then assume that "dart test" is the test command and that all files ending with ".dart" in the directory lib/ are input files.

# Adds the package 
dart pub add --dev mutation_test
dart run mutation_test

Running this command may take a long time (hours depending on the size of your code). The output will be written to the directory ./mutation-test-report. The default report format is html. A top-level report will be generated listing all input files:

Top level report

From there, you can follow the links to the reports for the individual input files. These reports show all lines of the source files, and undetected mutations are marked as red lines. You can view the undetected changes by clicking on the respective line:

Report for a source file

The application also supports several command line options:

# Prints a summary of all command line options:
dart run mutation_test --help
# Run the tests defined in "example/config.xml":
dart run mutation_test example/config.xml
# Or a fully customized test run with a rules file and 3 input files:
# The rules contained in mutation-rules.xml are always used when testing files.
# inputset1.xml may define special rules for some files that
# are also listed in the same xml document.
# The input files source1.cpp and source2.cpp
# are just tested with the rules from mutation-rules.xml (--rules).
# The output is written to directory output (-o) and the 
# report is generated as markdown file (-f md).
dart run mutation_test -f md -o output --rules mutation-rules.xml inputset1.xml \
    source1.cpp source2.cpp

The first command in the section above would produce the following report. Check also the examples folder, as it contains the inputs to produce this report. The API documentation generated by dart can be found on pub.

Running an incremental analysis (CI)

Performing a mutation test on your whole code base takes a long time, and in most cases it is not needed. Often you only want to check the difference between two commits, e.g. to review a pull request. This is especially helpful when the analysis runs as part of a continuous integration pipeline.

On Linux, you can run an incremental analysis on the changes between the current and the previous commit by using this command:

dart run mutation_test $(echo $(git diff --name-only HEAD HEAD~1 | grep -v "^test" | grep '\.dart$' | tr '\n' ' '))

The command lists all changed files, removes paths starting with test and all files not ending with '.dart', and then runs the mutation test on all remaining files.
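For platforms without grep, the filtering step can be sketched in Python. The function name select_sources is hypothetical. It assumes the list of changed paths has already been obtained, for example from the output of "git diff --name-only".

```python
def select_sources(changed_paths, suffix=".dart", excluded_prefix="test"):
    """Keep paths that end with the suffix and do not start with the excluded prefix."""
    return [
        path
        for path in changed_paths
        if path.endswith(suffix) and not path.startswith(excluded_prefix)
    ]
```

The remaining paths can then be passed as arguments to "dart run mutation_test".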

Similar versions of this command should work on Windows or macOS.

Speeding up the analysis

In order to reduce the time needed for a full project analysis, you can provide coverage data in the lcov format when calling the program. The Dart SDK supports generating the coverage information with a few commands:

dart pub global activate coverage
dart pub global run coverage:test_with_coverage
dart run mutation_test --coverage coverage/

The algorithm that excludes mutants uses a conservative approach: it only excludes mutants on lines that are marked as instrumented and have no hits in the lcov file. Lines or files that are not present in the coverage data are assumed to be covered.
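The conservative rule can be sketched in Python as follows. This is not the implementation used by mutation_test. The function names are hypothetical, and only the "SF:", "DA:" and "end_of_record" records of the lcov tracefile format are read.

```python
def parse_lcov(text):
    """Return {source_file: {line_number: hit_count}} from lcov tracefile text."""
    hits = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            # Start of the record for one source file.
            current = hits.setdefault(line[3:], {})
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<execution count>[,<checksum>]
            fields = line[3:].split(",")
            number, count = int(fields[0]), int(fields[1])
            current[number] = current.get(number, 0) + count
        elif line == "end_of_record":
            current = None
    return hits


def should_skip(hits, path, line_number):
    """Skip a mutant only if its line is instrumented and was never hit."""
    file_hits = hits.get(path)
    if file_hits is None:
        return False  # file missing from the coverage data: assume covered
    count = file_hits.get(line_number)
    return count is not None and count == 0
```

A line that does not appear in any "DA:" record is treated as covered, so a mutant is only skipped when the coverage data positively shows that no test reaches it.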

There is also an experimental option to exclude strings without interpolation as mutation candidates:

dart run mutation_test --exclude-strings


Features

  • Fully configurable mutation rules via XML documents and regular expressions
  • Sections of files can be whitelisted on a per file basis
  • Only mutants whose statements are covered will be executed
  • You can add global exclusion rules for e.g. comments, loop conditions via regular expressions
  • Different report formats are supported: html, xunit/junit, markdown and XML

A brief description of the program

mutation_test is a program that mutates your source code and verifies that the test commands specified in the input xml files are sensitive to those changes. Mutations are done as simple text replacements with regular expressions, so any text file can be mutated. Once one of the files has been mutated, all provided test commands are run as a separate process. The exit code of these commands is used to verify that the mutation was detected. If all tests return the expected return value, then the mutation was undetected and is added to the results. After all mutations were done, the results will be written to the terminal and a report file is generated. mutation_test is free software, as in "free beer" and "free speech".
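The procedure described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the code of mutation_test: it handles one file, one rule and one test command, and the names run_mutation_test, demo and calc.py are hypothetical.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path


def run_mutation_test(path, pattern, replacement, command, expected_return=0):
    """Apply one replacement per match, run the command, collect undetected mutants."""
    original = path.read_text()
    undetected = []
    total = 0
    try:
        for match in re.finditer(pattern, original):
            total += 1
            mutated = (
                original[: match.start()]
                + match.expand(replacement)
                + original[match.end():]
            )
            path.write_text(mutated)
            result = subprocess.run(command, cwd=path.parent, capture_output=True)
            # The expected exit code means the tests did not notice the change.
            if result.returncode == expected_return:
                undetected.append(match.start())
    finally:
        path.write_text(original)  # always restore the unmodified source
    return total, undetected


def demo():
    """Mutate '+' into '-' in a tiny module and run a strong and a weak test."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "calc.py"
        source.write_text("def add(a, b):\n    return a + b\n")
        # -B keeps Python from writing bytecode caches that could go stale.
        strong = [sys.executable, "-B", "-c", "import calc; assert calc.add(2, 3) == 5"]
        weak = [sys.executable, "-B", "-c", "import calc; calc.add(2, 3)"]
        rule = re.escape("+")
        return (
            run_mutation_test(source, rule, "-", strong),
            run_mutation_test(source, rule, "-", weak),
        )
```

In the demo, the command with the assertion exits with a non-zero code on the mutated file, so the mutation counts as detected. The command without an assertion still exits with 0, so the mutation is reported as undetected.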

mutation_test contains a set of builtin rules that allow you to start testing right away. However, all rules defining the behavior of this program can be customized. They are defined in XML documents, and you can change:

  • input files and whitelist lines for mutations
  • compile/test commands, expected return codes and timeouts
  • exclusion zones via regular expressions
  • mutation rules as simple text replacement or via regular expressions including capture groups
  • the quality gate and quality ratings

You can view a complete example with every possible XML element parsed by this program by invoking "mutation_test -s". This will print an XML document to the standard output. The displayed document also contains comments explaining the syntax of the XML file.

You can provide multiple input documents for a single program start. The inputs are split into three categories:

  • xml rules documents: The mutation rules for all other files are parsed from these documents and added globally. Rules are specified via "--rules".
  • xml documents: These files will be parsed like the rules documents, but anything defined in them applies only inside this document.
  • all other input files

If a rules file is provided via the command line flag "--rules", then the builtin rules are disabled, unless you specifically add them by passing "-b". You can provide as many rule sets as you like, and all of them will be added globally. The rest of the input files is processed individually. If the file extension is ".xml", then the file will be parsed like an additional rules file. However, this document must have a

The rules documents and the input xml files use the same syntax, so both files may define mutation rules, inputs, exclusions or test commands. However, a quality threshold may only be defined once.


After an input file is processed, a report is generated. You can choose between multiple output formats for the reports. By default, an HTML file is generated, but you can also choose xunit/junit, markdown or XML. You can see examples of the outputs in the example folder.

Input XML documents

This chapter explains the structure of the input XML documents. They must conform to the following schema:

<?xml version="1.0" encoding="UTF-8"?>
<mutations version="1.1">
    <!-- files, directories, exclude, commands and rules elements go here -->
    <threshold failure="80"/>
</mutations>

You can see an example for an input document in the example folder, or the application can generate one by running one of these commands:

# Shows an XML document with the complete syntax:
mutation_test -s
# Shows the builtin mutation rules and exclusions:
mutation_test -g

The generated documents also contain some helpful comments on how to create your own rules. You should usually provide two different documents: one with the mutation rules given as argument to "-r" and another one with the input files. The reason why mutation_test always loads two files (unless you disable the builtin rule set via "--no-builtin" and don't provide your own rules file) is that you can reuse the same set of rules for many different input files.


The children of "files" elements are individual files:

      <!-- lines can be whitelisted  -->
      <!-- if there is no whitelist, the whole file is used  -->
      <!-- line index starts at 1  -->
      <lines begin="13" end="24"/>
      <lines begin="29" end="35"/>

The application performs the mutation tests in sequence on the listed files. Every mutation that lies inside a whitelisted area and outside any exclusion is applied.
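The whitelist check can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that each lines element is read as a closed interval [begin, end] with line numbers starting at 1, as stated in the comments above.

```python
def line_of(text, offset):
    """Return the 1-based line number of a character offset in text."""
    return text.count("\n", 0, offset) + 1


def is_whitelisted(line_number, intervals):
    """Without any intervals the whole file is used, otherwise a line must fall into one."""
    if not intervals:
        return True
    return any(begin <= line_number <= end for begin, end in intervals)
```

With the intervals [(13, 24), (29, 35)] from the example, a match on line 24 is a mutation candidate, while a match on line 25 is not.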


The children of "directories" elements are directories where files are searched:

    <!-- adds all files in the directory  -->
    <!-- adds files matching one of the patterns.  -->
      <!-- matching tokens need the attribute pattern, which holds a regular expression  -->
      <matching pattern="\.cpp$"/>
      <matching pattern="\.cxx$"/>
      <matching pattern="\.c$"/>

The application will perform the mutation tests on all files found in the directories.
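A directory search of this kind can be sketched in Python. The function find_files and its recursive parameter are hypothetical. The sketch assumes that each pattern is searched in the file name only and that a directory without patterns contributes all of its files.

```python
import re
from pathlib import Path


def find_files(directory, patterns=(), recursive=False):
    """Return the sorted paths of files whose name matches one of the patterns."""
    compiled = [re.compile(pattern) for pattern in patterns]
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        str(path)
        for path in candidates
        if path.is_file()
        and (not compiled or any(rx.search(path.name) for rx in compiled))
    )
```

Called with the three patterns from the example, the function returns the C and C++ source files and ignores headers.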


The commands block lets you specify the command line programs that verify whether a mutation is detected. The commands are run in document order, and each must be a single command line call.

<!-- Specify the test commands here with the command element -->
<!-- The text of the command element will be executed as shell process -->
<!-- The return value of the command will be used to check for success -->
<!-- If all commands execute successfully, a mutation counts as undetected -->
  <!-- All attributes here are optional -->
  <!-- group: is used to show statistics for the commands -->
  <!-- expected-return: this value is compared to the return value of the
       command. Must be an integer -->
  <!-- working-directory: Where the program is executed. Defaults to . -->
  <!-- timeout: Timeout in seconds. Must be an integer. If not present, 
       the commands will run until they are finished. -->
  <command group="compile" expected-return="0" 
    working-directory=".">make -j8</command>
  <command group="test" expected-return="0" working-directory="."
    timeout="10">ctest -j8</command>
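How the attributes interact can be sketched in Python. The function mutation_detected and the dictionary keys are hypothetical. The sketch passes argument lists instead of shell strings to stay portable, and it assumes that a command running into its timeout counts as a detected mutation, which the text above does not state.

```python
import subprocess
import sys


def mutation_detected(commands):
    """Run the commands in order and report whether any of them notices the mutation.

    Each command is a dict with the key 'args' and the optional keys
    'expected_return' (default 0), 'cwd' (default '.') and 'timeout' in seconds.
    """
    for command in commands:
        try:
            result = subprocess.run(
                command["args"],
                cwd=command.get("cwd", "."),
                timeout=command.get("timeout"),
                capture_output=True,
            )
        except subprocess.TimeoutExpired:
            return True  # assumption: a timeout counts as detected
        if result.returncode != command.get("expected_return", 0):
            return True
    # Every command returned its expected value: the mutation went unnoticed.
    return False
```

Stopping at the first deviating command mirrors the compile and test example: if the mutated code no longer compiles, there is no need to run the tests.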


You can create rules to exclude portions of source files or the full file from the mutation testing:

  <!-- excludes anything between two tokens  -->
  <token begin="//" end="\n"/>
  <token begin="#" end="\n"/>
  <!-- excludes anything that matches a pattern  -->
  <regex pattern="/[*].*?[*]/" dotAll="true"/>
  <!-- excludes loops from mutations to prevent tests from running forever -->
  <regex pattern="[\s]for[\s]*\(.*?\)[\s]*{" dotAll="true"/>
  <regex pattern="[\s]while[\s]*\(.*?\)[\s]*{.*?}" dotAll="true"/>
  <!-- lines can also be globally excluded  -->
  <!-- line index starts at 1 -->
  <lines begin="1" end="2"/>
  <!-- It is possible to exclude files using the file element. -->

Explicit exclusions have precedence over inclusions.
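The effect of token and regex exclusions can be sketched in Python. The function names are hypothetical. The sketch assumes that an unterminated token pair excludes everything up to the end of the file and that every regex is applied with dotAll enabled.

```python
import re


def excluded_spans(text, tokens=(), patterns=()):
    """Collect (start, end) character ranges that must not be mutated."""
    spans = []
    for begin, end in tokens:
        start = text.find(begin)
        while start != -1:
            stop = text.find(end, start + len(begin))
            # Assumption: without an end token the span runs to the end of the text.
            stop = len(text) if stop == -1 else stop + len(end)
            spans.append((start, stop))
            start = text.find(begin, stop)
    for pattern in patterns:
        spans.extend(m.span() for m in re.finditer(pattern, text, re.DOTALL))
    return spans


def is_excluded(spans, start, end):
    """True if the range [start, end) overlaps any excluded span."""
    return any(start < high and low < end for low, high in spans)
```

A mutation candidate is dropped as soon as its match overlaps one of the spans, which is one way to let exclusions take precedence over whitelisted lines.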


The rules element is the most important part of the document. It defines what is mutated, and how it is changed.

<!-- The rules element describes all mutations done during a mutation test -->
<!-- The following children are parsed: literal and regex -->
<!-- A literal element matches the literal text -->
<!-- A regex element mutates source code if the regular expression matches -->
<!-- Each of them must have at least one mutation child -->
  <!-- A literal element matches the literal text and replaces it with the 
       list of mutations. This will replace any "+" with "-" or "*".
       The "id" attribute is optional and will be used when creating the reports. -->
  <literal text="+" id="add">
    <mutation text="-"/>
    <mutation text="*"/>
  </literal>
  <!-- It is also possible to match a regular expression with capture groups. -->
  <!-- If the optional attribute dotAll is set to true, 
       then the . will also match newlines.  -->
  <!-- If not present, the default value for dotAll is false.  -->
  <!-- Here, we capture everything inside of the braces of "if ()" -->
  <regex pattern="[\s]if[\s]*\((.*?)\)[\s]*{" dotAll="true" id="if">
    <!-- You can access groups via $1. -->
    <!-- If your string contains a $ followed by a number that should not be
         replaced, escape the dollar \$ -->
    <!-- If your string contains a \$ followed by a number that should not be
         replaced, escape the slash \\$ -->
    <!-- Tabs and newlines should also be escaped. -->
    <mutation text=" if (!($1)) {"/>
  </regex>
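The capture group substitution can be sketched in Python. The functions expand and mutate are hypothetical and only approximate the behavior described in the comments: "$1" is replaced by the first capture group and "\$" yields a literal dollar sign. Other escapes such as "\\$" are not handled here.

```python
import re

# Matches either an escaped dollar (\$) or a group reference ($ plus digits).
_TOKEN = re.compile(r"\\\$|\$(\d+)")


def expand(template, match):
    """Replace $N in the template with capture group N of the match."""
    def substitute(token):
        if token.group(1) is None:
            return "$"  # escaped dollar
        return match.group(int(token.group(1))) or ""
    return _TOKEN.sub(substitute, template)


def mutate(text, pattern, template, dot_all=False):
    """Apply the template to the first match of the pattern, or return None."""
    flags = re.MULTILINE | (re.DOTALL if dot_all else 0)
    match = re.search(pattern, text, flags)
    if match is None:
        return None
    return text[: match.start()] + expand(template, match) + text[match.end():]
```

Applied to the "if" rule above, the source text " if (x > 0) {" becomes " if (!(x > 0)) {", which inverts the condition without touching the body of the statement.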


The threshold element allows you to configure the limit for a successful analysis and the quality ratings. Below is the built-in configuration:

  <!-- Configures the reporting thresholds as percentage of detected mutations -->
  <!-- Attribute failure is required and must be a floating point number. -->
  <!-- Note: There can only be one threshold element in all input files! -->
  <!-- If no threshold element is found, these values will be used. -->
  <threshold failure="80">
    <!-- Provides reliability rating levels. Attributes are required. -->
    <rating over="100" name="A"/>
    <rating over="80" name="B"/>
    <rating over="60" name="C"/>
    <rating over="40" name="D"/>
    <rating over="20" name="E"/>
    <rating over="0" name="F"/>
  </threshold>

When setting a failure limit, remember that some mutations may be impossible to detect (e.g. converting "0" to "-0").
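The evaluation of the threshold can be sketched in Python. The function evaluate is hypothetical, and the comparisons are assumptions: a rating applies when the detected percentage is at least its "over" value, the run passes when the percentage is at least the failure limit, and a run without any mutations counts as 100 percent.

```python
BUILTIN_RATINGS = [(100, "A"), (80, "B"), (60, "C"), (40, "D"), (20, "E"), (0, "F")]


def evaluate(detected, total, failure=80.0, ratings=BUILTIN_RATINGS):
    """Return (percentage of detected mutations, rating name, passed)."""
    percent = 100.0 * detected / total if total else 100.0
    rating = next(
        (name for over, name in sorted(ratings, reverse=True) if percent >= over),
        None,
    )
    return percent, rating, percent >= failure
```

With the built-in values, 9 detected mutations out of 10 give 90 percent, rating B and a passing run, while 7 out of 10 give rating C and a failed run.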

Table of XML elements

Here is a table of all XML elements that are parsed by this program:

Element | Children | Attributes | Description
mutations | files, rules, exclude, commands | version | Top level element
files | file | | Holds the list of files to mutate
directories | directory | recursive | Holds the list of directories to search for files
exclude | token, regex, lines | | Holds the list of exclusions from mutations.
commands | command | | Holds the list of commands to run
rules | literal, regex | | Holds the list of mutation rules
file | lines | | Contains the path to the file as text. If there are lines children present, only the given lines are mutated.
lines | | begin, end | Specifies an interval of lines [begin,end] in the source file.
matching | | pattern | Specifies the pattern for the file names in the directory.
command | | name, group, expected-return, timeout | Contains the command to execute as text. All attributes are optional.
token | | begin, end | A range in the source file delimited by the begin and end tokens.
literal | mutation | id, text | Matches the string in attribute text and replaces it with its children.
regex | mutation | id, pattern, dotAll | A pattern for a regular expression. The expression is always multiline and processes the complete file. You can use "." to match newlines if the optional attribute dotAll is set to true.
mutation | | text | A replacement for a match. If this element is a child of a regex node, then capture groups can be used in the text via $i.
threshold | rating | failure | Configures the limit for a failed analysis and the quality ratings
rating | | over, name | A quality rating. Attribute over is the lowest percentage for this rating.

Command line arguments

mutation_test <options> <input xml files...>

The program accepts the following command line arguments:

Short | Long | Description
-h | --help | Displays the help message
 | --version | Prints the version
 | --about | Prints information about the application
-b | --(no-)builtin | Adds or removes the builtin rule set
-s | --show-example | Prints a XML file to the console with every possible option
-g | --generate-rules | Prints the builtin rule set as XML string
-v | --verbose | Verbose output
-q | --quiet | Disable output
-d | --dry | Dry run - loads the configuration and counts the possible mutations in all files, but runs no tests
-o | --output= | Sets the output directory (defaults to ".")
-f | --format | Sets the report file format [html (default), junit, xunit, md, xml, all, none]
-r | --rules= | Overrides the builtin rule set with the rules in the given XML Document
 | --exclude-strings | Adds experimental string exclusion

The remaining arguments are expected to be paths to input XML configuration files.


All code is licensed under the BSD-3-Clause license, see the file "LICENSE".

Issue tracker

You can view the issues or request features at the issue tracker.


This library provides functionality to test the quality of your automated tests via mutation testing.