Code coverage: what it actually tells you (and what it doesn't)

A 90% coverage badge doesn't mean your code is well-tested. Here's what code coverage actually measures, where it misleads you, and how to use it without fooling yourself.

BuildPulse Team

May 30, 2026

The number everyone wants, and what it actually means

At some point in every engineering org's life, someone puts a coverage badge in the README and declares victory. Ninety percent covered. Ship it.

Then a bug slips through in the exact code that was "covered."

Code coverage is one of the most misread metrics in software engineering — not because it's a bad metric, but because people treat it as a proxy for test quality when it's really just a proxy for test execution. Those are very different things.

Let's talk about what coverage actually measures, where it genuinely helps, and where it will actively mislead you if you're not careful.

What code coverage actually measures

Code coverage is a measurement of which lines (or branches, or paths) in your source code are executed when your test suite runs. Nothing more.

The most common types you'll encounter:

  • Line coverage — was this line executed at least once?
  • Branch coverage — did the test exercise both the true and false path of every conditional?
  • Function coverage — was this function called at all?
  • Statement coverage — similar to line coverage, but counts individual statements rather than physical lines

Here's a simple Python example to make this concrete:

def calculate_discount(price, is_member):
    if is_member:
        price = price * 0.9
    return price

If your test only calls calculate_discount(100, True), you get 100% line coverage: every line executes. But you never tested the non-member path, where the if body is skipped. Branch coverage would catch that gap; line coverage won't.

This is why branch coverage is generally more useful than line coverage, and why most mature teams report both.
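To see mechanically what "executed at least once" means, here is a toy line tracer built on Python's sys.settrace. It is a sketch of the idea only, not how any particular coverage tool is implemented, and executed_lines is a hypothetical helper. calculate_discount is restated with a single return so the block is self-contained.

```python
import sys


def calculate_discount(price, is_member):
    if is_member:
        price = price * 0.9
    return price


def executed_lines(func, *args):
    """Run func(*args) and return the executed line offsets.

    Offsets are relative to the function's `def` line, so offset 1 is
    the first line of the body.
    """
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        # Only follow frames that belong to the function under test.
        if frame.f_code is not code:
            return None
        if event == "line":
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        # Always restore whatever tracer was active before.
        sys.settrace(previous)
    return hits


member_hits = executed_lines(calculate_discount, 100, True)
guest_hits = executed_lines(calculate_discount, 100, False)
print("member call:", sorted(member_hits))
print("non-member call:", sorted(guest_hits))
```

The member call reaches the discount line (offset 2), so a line-coverage report built from that call alone looks complete. The non-member call skips it, and that skipped path is exactly what branch coverage would flag as untested.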

How coverage is collected

Coverage tools instrument your code — either at compile time, at runtime, or via source transformation — and track which lines execute. In practice, this looks different by language:

JavaScript / TypeScript (Jest + V8 or Istanbul):

{
  "jest": {
    "collectCoverage": true,
    "coverageProvider": "v8",
    "coverageReporters": ["text", "lcov"],
    "coverageThreshold": {
      "global": {
        "lines": 80,
        "branches": 70
      }
    }
  }
}

Go (built-in):

go test ./... -coverprofile=coverage.out
go tool cover -html=coverage.out -o coverage.html

Python (pytest-cov):

pytest --cov=src --cov-report=xml --cov-report=term-missing

Ruby (SimpleCov):

# spec/spec_helper.rb
require 'simplecov'
SimpleCov.start do
  add_filter '/spec/'
  minimum_coverage 85
end

All of these produce some variant of a coverage report — often an LCOV or Cobertura XML file — that can be uploaded to a coverage tracking service or parsed in CI.
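If you want to parse a report yourself rather than upload it, the LCOV format is simple enough to read by hand. The sketch below assumes the tracefile carries the usual per-file summary lines (LF/LH for lines found and hit, BRF/BRH for branches found and hit). summarize_lcov and percent are hypothetical helpers, and the sample report is made up for illustration.

```python
def summarize_lcov(text):
    """Return {source_file: {"LF", "LH", "BRF", "BRH"}} from LCOV text."""
    totals = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            # SF: starts a new per-file record.
            current = totals.setdefault(
                line[3:], {"LF": 0, "LH": 0, "BRF": 0, "BRH": 0}
            )
        elif line == "end_of_record":
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            # Only the summary counters are needed; DA:, BRDA:, etc. are skipped.
            if key in current:
                current[key] += int(value)
    return totals


def percent(hit, found):
    """Coverage percentage, or None when there was nothing to cover."""
    return 100.0 * hit / found if found else None


# Made-up report for a single file.
SAMPLE = """\
TN:
SF:src/billing/invoices.ts
DA:1,1
DA:2,0
DA:3,4
DA:4,4
LF:4
LH:3
BRF:2
BRH:1
end_of_record
"""

for path, counts in summarize_lcov(SAMPLE).items():
    print(path,
          "lines:", percent(counts["LH"], counts["LF"]),
          "branches:", percent(counts["BRH"], counts["BRF"]))
```

Returning None for files with no branches keeps "nothing to measure" distinct from "0% covered", which matters once you start alerting on these numbers.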

Wiring it into your CI pipeline

Collecting coverage locally is one thing. Enforcing it in CI and tracking it over time is where it actually becomes useful. A minimal GitHub Actions setup looks like this:

name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Run tests with coverage
        run: npm test -- --coverage

      - name: Upload coverage report
        uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/lcov.info

From here, you can upload that lcov.info to a tool like BuildPulse, Codecov, or Coveralls to get per-PR coverage diffs, historical trend charts, and alerts when coverage drops.

Per-PR diffs are what make coverage actionable day-to-day. Seeing that a PR drops branch coverage from 83% to 71% tells the author something specific — and a whole lot more than a static badge ever could.

The coverage number that's lying to you

Here's where I want to be direct: high coverage does not mean well-tested code.

Consider this test:

it('runs the function', () => {
  processOrder(mockOrder);
});

No assertions. The function runs, every line executes, coverage goes up. The test is useless. It would pass even if processOrder silently swallowed every error and returned nothing. This is sometimes called "coverage theater" — the appearance of thoroughness with none of the substance.

What the coverage metric can't tell you:

  • Whether your assertions are correct
  • Whether the test exercises meaningful input variations
  • Whether edge cases (empty arrays, null inputs, concurrent access) are actually tested
  • Whether the behavior being tested is actually the behavior that matters

Mutation testing, where a tool deliberately introduces small bugs into your code and checks whether your tests catch them, is a much stronger signal of test quality. Tools like Stryker (JS/TS), Pitest (Java), and mutmut (Python) automate this. The tradeoff is runtime: mutation testing is slow. It doesn't replace coverage; it complements it on critical paths.
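The idea can be shown by hand, without any of those tools. In the sketch below the mutant is written manually (the condition is negated), and smoke_only, asserting_check, and passes are hypothetical names. A real mutation testing tool generates and runs many such mutants automatically.

```python
import math


def calculate_discount(price, is_member):
    if is_member:
        price = price * 0.9
    return price


def mutant_calculate_discount(price, is_member):
    # Hand-written mutant: the membership condition is negated.
    if not is_member:
        price = price * 0.9
    return price


def smoke_only(fn):
    # Executes both paths but asserts nothing.
    fn(100, True)
    fn(100, False)


def asserting_check(fn):
    # Executes the same paths and verifies the results.
    assert math.isclose(fn(100, True), 90.0)
    assert fn(100, False) == 100


def passes(check, fn):
    """Return True if `check` runs against `fn` without an assertion failure."""
    try:
        check(fn)
    except AssertionError:
        return False
    return True


print("smoke_only passes against the mutant:",
      passes(smoke_only, mutant_calculate_discount))
print("asserting_check passes against the mutant:",
      passes(asserting_check, mutant_calculate_discount))
```

Both checks execute the same lines, so a coverage report cannot tell them apart. Only the asserting check fails against the mutant, which is the difference mutation testing is designed to expose.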

Where coverage thresholds go wrong

Setting a hard coverage threshold (fail if < 80%) sounds like good engineering hygiene. It often isn't.

The problems:

  1. It creates perverse incentives. Engineers write tests to hit the number, not to verify behavior. You end up with high coverage and low confidence.

  2. Aggregate thresholds hide rot. An 80% global threshold can hide a critical payment module sitting at 40% if other modules are at 95%. Per-directory or per-module thresholds are more honest.

  3. The threshold never goes up. Teams fight to stay above the line, but the line itself never rises meaningfully. It becomes a floor to defend, not a target to improve on.
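The second problem above is plain arithmetic, and it is worth seeing with numbers. The module names and line counts below are invented for illustration, and global_coverage and modules_below are hypothetical helpers.

```python
# (covered_lines, total_lines) per module; all figures are made up.
MODULES = {
    "payments": (200, 500),     # 40%
    "catalog": (1900, 2000),    # 95%
    "accounts": (2375, 2500),   # 95%
}


def global_coverage(modules):
    """Aggregate line coverage across all modules, as a percentage."""
    covered = sum(c for c, _ in modules.values())
    total = sum(t for _, t in modules.values())
    return 100.0 * covered / total


def modules_below(modules, threshold):
    """Names of modules whose own coverage is under `threshold` percent."""
    return sorted(
        name
        for name, (covered, total) in modules.items()
        if 100.0 * covered / total < threshold
    )


print("global:", global_coverage(MODULES))          # 4475 of 5000 lines
print("below 80% on their own:", modules_below(MODULES, 80))
```

The global figure clears an 80% gate comfortably, while the per-module check singles out payments. The larger and better-covered the rest of the codebase is, the more room a weak module has to hide.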

A more useful enforcement pattern: fail on coverage regressions, not on absolute thresholds. If a PR drops coverage on the files it touches, that's a signal worth blocking on. If it maintains or improves coverage, let it through.

This is the framing BuildPulse uses when surfacing coverage data on PRs — delta on the changed files matters more than the global number.
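A regression gate along those lines can be quite small. The sketch below is one possible shape, not BuildPulse's implementation: it assumes you already have per-file coverage percentages for the base and head commits plus the set of files the PR touches, and coverage_regressions is a hypothetical helper.

```python
def coverage_regressions(base, head, changed_files, tolerance=0.0):
    """Return (path, base_pct, head_pct) for changed files that lost coverage.

    `base` and `head` map file paths to coverage percentages. A drop only
    counts if it exceeds `tolerance` percentage points.
    """
    regressions = []
    for path in sorted(changed_files):
        before = base.get(path)
        after = head.get(path)
        if before is None or after is None:
            # New or deleted file: there is no baseline to regress from.
            continue
        if before - after > tolerance:
            regressions.append((path, before, after))
    return regressions


# Illustrative data only.
base = {"src/billing/invoices.ts": 88.0, "src/cart.ts": 70.0}
head = {"src/billing/invoices.ts": 61.0, "src/cart.ts": 70.0, "src/new.ts": 10.0}
changed = {"src/billing/invoices.ts", "src/new.ts"}

for path, before, after in coverage_regressions(base, head, changed):
    print(f"{path}: {before}% -> {after}%")
```

New files are skipped here because they have no baseline. Whether to hold them to a separate minimum is a policy choice this sketch leaves open. A small nonzero tolerance can help avoid blocking PRs over rounding noise.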

What good coverage analysis looks like in practice

Here's what I'd actually recommend, in rough order of priority:

1. Collect it everywhere, first. You can't improve what you don't measure. Get coverage running in CI before you worry about thresholds or tooling.

2. Track trends over time, not just snapshots. A coverage report on a single commit is trivia. A chart showing coverage drifting from 85% to 71% over six months is a conversation you need to have.

3. Surface per-PR diffs. This is the most actionable signal. When a PR author sees "you dropped branch coverage on src/billing/invoices.ts from 88% to 61%", they can do something about it right now.

4. Focus on branch coverage, not just line coverage. Branch coverage catches untested conditionals that line coverage misses entirely.

5. Use coverage to find untested critical paths, not to prove your code is correct. Open your coverage report, sort by lowest coverage, and ask: "is anything here important?" The answer is often yes, and that's where your next tests should go.

6. Don't confuse 100% with done. Teams that chase 100% coverage tend to test implementation details instead of behavior. Your tests end up brittle and your coverage number is still lying to you.
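The triage in item 5 can be scripted once you have per-file numbers. This is a rough sketch: triage is a hypothetical helper, the file paths and percentages are invented, and the list of "critical" path prefixes is something each team would have to define for itself.

```python
def triage(coverage_by_file, critical_prefixes, limit=5):
    """Lowest-covered files that live under a critical path prefix."""
    prefixes = tuple(critical_prefixes)
    # Sort ascending so the least-covered files come first.
    ranked = sorted(coverage_by_file.items(), key=lambda item: item[1])
    critical = [(path, pct) for path, pct in ranked if path.startswith(prefixes)]
    return critical[:limit]


# Illustrative data only.
coverage_by_file = {
    "src/billing/invoices.ts": 61.0,
    "src/ui/banner.ts": 12.0,
    "src/auth/session.ts": 45.0,
    "src/billing/tax.ts": 93.0,
}

for path, pct in triage(coverage_by_file, ["src/billing/", "src/auth/"], limit=2):
    print(f"{pct:5.1f}%  {path}")
```

Filtering by importance before ranking keeps a barely covered banner component from crowding out a half-tested session handler.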

The mental model that makes coverage useful

Think of code coverage as a smoke detector, not a fire suppression system.

A smoke detector going off tells you something might be wrong. It doesn't put the fire out. It doesn't even confirm there's a fire — sometimes it's toast. But having no smoke detector is clearly worse than having one.

Code coverage tells you which parts of your codebase your tests didn't reach. That's genuinely useful information. When you have a bug in a section with 20% coverage, coverage data tells you why your tests didn't catch it. When you have a bug in a section with 95% coverage, coverage data tells you that coverage alone wasn't enough.

Use it for discovery and trend-tracking. Don't use it as a quality certification. And for the love of everything, don't put it in the README and call it a day.