Ian Huet - Development Journal

The Mystery 8%

Beware the gap between code & test coverage

TLDR

A test coverage threshold can be a useful tool for driving improved test coverage, yet it is only part of the process. It is important to stay vigilant for large increases, because code coverage reports are calculated on the amount of code executed, not on the amount of code tested. A sudden jump in code coverage can be an indicator of poor practice; left unchecked, it is counterproductive.


Amongst the many challenges in quantifying code quality, code coverage stands out as one of the most easily calculated metrics. As a result, coverage is commonly treated as an important indicator of the quality and effectiveness of software testing. However, there is a subtle but important difference between code coverage and test coverage that can result in misleading metrics and misplaced confidence.

Ahead of fully automated test quality tooling being installed, I have been completing periodic test quality reviews. During one of these reviews I spotted an 8% jump in coverage. While great progress had been made in expanding our test coverage, this was exceptional, which raised a flag: it was just too good to be true! And it raised the question: where did this bump come from?

After tracking back through the commit history, the bump was isolated to a single, small PR. The next question was the exact source. In this instance it was an implementation detail: the PR used a high-level component import that was not the best option. The result was a considerable increase in the amount of code being imported and executed, but none of this code was being actively tested: code coverage versus test coverage.
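To make the distinction concrete, here is a minimal arithmetic sketch. All of the numbers and names are hypothetical, chosen only to reproduce an 8-point jump; they are not taken from the project in question. It assumes the coverage tool counts a line as covered as soon as it runs, whether or not any assertion checks its behaviour.

```python
def percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    return 100.0 * part / total


# Hypothetical figures, for illustration only.
TOTAL_LINES = 1000        # lines the coverage tool knows about
ASSERTED_LINES = 600      # lines run by tests that also check the result
IMPORT_ONLY_LINES = 80    # lines that run merely because a module was imported

# What the report shows: every executed line counts.
reported = percent(ASSERTED_LINES + IMPORT_ONLY_LINES, TOTAL_LINES)

# What is actually being tested: only lines backed by assertions.
meaningful = percent(ASSERTED_LINES, TOTAL_LINES)

print(f"reported code coverage: {reported:.1f}%")
print(f"lines actually tested:  {meaningful:.1f}%")
print(f"gap: {reported - meaningful:.1f} points")
```

In this sketch the report rises from 60% to 68% without a single new assertion being written, which is the kind of gap described above.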

Thankfully there was an easily implemented alternative that delivered the same functionality without skewing test coverage. With this update applied, the coverage returned to the expected level. Though lower, the figure is a more accurate representation of the actual test coverage.

Takeaways:

Had the coverage change been downward, it would likely have thrown an error. Yet an 8% increase raised no alarms at all.
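One way to close that blind spot is a two-sided check: fail on coverage below the threshold, and flag unusually large increases for a manual look. The sketch below is an assumption about how such a check might be written, not the tooling used here; the function name `review_coverage`, the default `threshold` and the default `max_jump` are all hypothetical.

```python
def review_coverage(previous: float, current: float,
                    threshold: float = 55.0, max_jump: float = 3.0) -> str:
    """Classify a coverage change between two runs (values in percent).

    Hypothetical helper: the defaults are placeholders, not recommendations.
    """
    # The usual one-sided gate: coverage below the threshold fails the build.
    if current < threshold:
        return "fail: below threshold"
    # The extra check: a large increase is suspicious, so ask for a review.
    if current - previous > max_jump:
        return "review: unusually large increase"
    return "ok"


print(review_coverage(60.0, 68.0))  # the 8-point jump is flagged for review
print(review_coverage(60.0, 61.5))  # a modest increase passes
print(review_coverage(60.0, 50.0))  # a drop below the threshold fails
```

A flagged increase is not necessarily a problem; it is only a prompt to check whether the newly covered code is exercised by real assertions.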

References: