Minimizing flaky test impact

Flaky Tests

Oct 25, 2023

Flaky tests, those that produce inconsistent results for the same code, can be a significant thorn in the side of any software development process. They introduce non-determinism into your test suite, which leads to unreliable test results and time-consuming debugging sessions. So how do we mitigate the impact of these failures and prevent their recurrence? This tutorial outlines strategies to minimize their impact and improve the reliability of your test suite.

Minimizing the Impact

Test Retry Mechanisms

One crucial approach is the implementation of test retry mechanisms, an aspect of automated testing that involves rerunning failed tests. The rerun can help determine if the test failure was an isolated occurrence or a consistent issue. This tactic buffers the impact of flaky tests, affording more opportunities for the tests to deliver consistent results, reducing unnecessary debugging, and preventing premature code changes.

Test Environment Isolation

Test environment isolation is also crucial. A common cause of test flakiness is when test cases depend on shared resources or results from previous tests. By creating a self-contained environment for each test, you can eliminate these dependencies, minimizing the likelihood of flaky tests and providing precise feedback on the functionality being tested and the root cause if a test fails.
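One way to picture this, assuming a test that needs a file on disk, is to give every test its own temporary directory. The sketch below uses Python's standard `unittest` and `tempfile` modules; the class name `CounterFileTest` and the counter file are hypothetical stand-ins for whatever resource your tests would otherwise share.

```python
import tempfile
import unittest
from pathlib import Path


class CounterFileTest(unittest.TestCase):
    def setUp(self):
        # Fresh directory per test, so no test sees files left by another.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.counter_path = Path(self._tmp.name) / "counter.txt"
        self.counter_path.write_text("0")

    def increment(self):
        value = int(self.counter_path.read_text()) + 1
        self.counter_path.write_text(str(value))
        return value

    def test_first_increment(self):
        self.assertEqual(self.increment(), 1)

    def test_two_increments(self):
        self.increment()
        self.assertEqual(self.increment(), 2)


def run_isolated_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CounterFileTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because each test starts from its own counter file, both tests pass regardless of the order in which they run. With a single shared file, the second test's expected value would depend on what ran before it.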

Timeout Handling

Timeout handling can prevent false failures caused by slow or unresponsive dependencies, another common cause of flakiness. It's especially vital when tests rely on APIs or other external systems that may not always respond predictably. Adjusting timeouts based on each test's specific requirements can reduce flakiness.
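A minimal sketch of a per-call timeout, using only Python's standard library, might look like the following. The helper `call_with_timeout` and the two dependency functions are hypothetical; in practice your HTTP client or test framework probably has its own timeout setting, which is usually the better place to configure this.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread; raise TimeoutError if it exceeds timeout_s."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"{fn.__name__} exceeded {timeout_s}s") from None
    finally:
        # Do not wait for the worker here; it finishes in the background.
        executor.shutdown(wait=False)


def fast_dependency():
    return "ok"


def slow_dependency():
    time.sleep(0.3)  # hypothetical stand-in for a slow external API
    return "ok"
```

Note that this only bounds how long the test waits; the worker thread itself is not cancelled. Choosing the timeout per test, rather than one global value, lets slow-but-legitimate calls finish while still failing fast on dependencies that hang.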

Test Frameworks and Automation

The choice of testing framework and the way you automate your tests can substantially influence your test results. A robust testing framework, suited to your application's specifics (be it JavaScript, HTML, frontend, or end-to-end tests), can reduce the occurrence of flaky tests. Running tests across different configurations, including different execution orders, helps surface issues caused by order dependencies between tests. Furthermore, test automation makes test runs more consistent, eliminating human error and reducing the chance of introducing flakiness.
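To illustrate how varying the execution order exposes an order dependency, here is a small self-contained sketch. The two check functions, `shared_state`, and the runner helpers are all hypothetical; many frameworks have plugins that randomize test order for you.

```python
import random

shared_state = {"user": None}  # module-level state shared by both checks


def check_create_user():
    shared_state["user"] = "alice"
    assert shared_state["user"] == "alice"


def check_read_user():
    # Hidden order dependency: passes only if check_create_user ran first.
    assert shared_state["user"] == "alice"


def run_in_order(tests):
    """Run tests in the given order from a clean state; return failed names."""
    shared_state["user"] = None
    failed = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failed.append(test.__name__)
    return failed


def run_shuffled(tests, seed):
    """Shuffle with a recorded seed so a failing order can be reproduced."""
    order = list(tests)
    random.Random(seed).shuffle(order)
    return [t.__name__ for t in order], run_in_order(order)


print(run_in_order([check_create_user, check_read_user]))  # []
print(run_in_order([check_read_user, check_create_user]))  # ['check_read_user']
```

Recording the seed matters: when a shuffled run fails, rerunning with the same seed reproduces the same order, which turns an intermittent failure into a repeatable one.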

Comprehensive Test Reports and Metrics

Comprehensive test reports and metrics tracking provide valuable insights into test flakiness. Detailed reports enable you to understand each test's behavior, pinpointing scenarios under which a test fails. By tracking metrics over time, patterns and trends related to test flakiness become apparent, enabling a focused approach to the most problematic areas.
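As a rough sketch of what such a metric could look like, the function below computes a per-test failure rate from a run history. The record format `(test_name, passed)` and the sample data are assumptions made for illustration; the rate is only a flakiness signal if the runs were made against unchanged code.

```python
from collections import defaultdict


def failure_rates(runs):
    """Return {test_name: share of failing runs} from (test_name, passed) records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for name, passed in runs:
        totals[name] += 1
        if not passed:
            failures[name] += 1
    return {name: failures[name] / totals[name] for name in totals}


# Hypothetical history of runs against the same code.
history = [
    ("test_login", True),
    ("test_login", False),
    ("test_login", True),
    ("test_login", True),
    ("test_checkout", True),
    ("test_checkout", True),
]

rates = failure_rates(history)
worst_first = sorted(rates.items(), key=lambda item: item[1], reverse=True)
print(worst_first)  # [('test_login', 0.25), ('test_checkout', 0.0)]
```

Sorting by rate gives a simple priority list, so the most problematic tests get attention first.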

Case Study: Slack’s Approach to Flaky Tests

In 2019, Slack undertook a significant effort to resolve its flaky test issues. After spending some time identifying flaky tests, they established flakiness thresholds - if a test's flakiness exceeded the threshold, the test would be automatically disabled. This automation had a tremendous impact on product stability, engineering velocity, and test confidence.

They saw a dramatic improvement in their main branch stability, increasing from 19.82% in July 2020 to 96% in February 2021. Similarly, the rate of test job failures plummeted from 56.76% to 3.85% over the same period. These enhancements boosted developer sentiment and confidence considerably.

Moreover, the automation of test triaging saved substantial developer time. Specifically, it created 693 PRs for Android and 492 PRs for iOS, resulting in an overall saving of 553 hours (equivalent to 23 full days of developer time).

Feedback from developers further attested to the project's success. One developer stated, "I feel iOS CI is much more stable and fast than before. Thank you for all the hard work! It improves our productivity by far. Really appreciated!" Moreover, survey results showed that 74% of developers felt the project had positively impacted main branch stability, and 64% reported reduced reruns on PRs.

Read more about the study here.

Test Detection and Reporting: Tools and Collaboration

The use of sophisticated tools and fostering collaboration within development and DevOps teams can significantly enhance the process of preventing, identifying, and reducing flaky tests.

BuildPulse offers a comprehensive platform to find flaky tests, measure their impact on the engineering team, and alert stakeholders to any issues while they’re still manageable. We go beyond reporting and give you tools to take action, such as automated test quarantining, to handle the problem end-to-end.

FAQ

What is the difference between a flaky test and a false positive?

A false positive is a test failure that reports a problem even though the code under test is correct, typically because of a mismatch between what the test expects and what the code is meant to do.

A flaky test is one that produces conflicting results for the same code. For example, if you see a test both pass and fail across runs while the code hasn’t changed, then it’s a flaky test. There are many causes of flakiness.
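This definition translates directly into a check over test results: group the results by commit and test, and flag any test that has both outcomes on the same commit. The sketch below assumes records of the form `(commit_sha, test_name, passed)`; the format and the sample data are hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Flag tests that both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for commit, test, passed in results:
        outcomes[(commit, test)].add(passed)
    flaky = {test for (_commit, test), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


# Hypothetical results from several CI runs.
results = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),    # same commit, different outcome
    ("abc123", "test_checkout", True),
    ("def456", "test_checkout", False),  # outcome changed with the code
]

print(find_flaky_tests(results))  # ['test_login']
```

Here `test_checkout` is not flagged: its outcome changed between commits, so the code change is a plausible cause of the failure.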

What is an example of a flaky test?

A common example appears in growing test suites - a pull request build fails in a test that has nothing to do with the changes you made. Put differently, you see a test pass and fail without any code change. These failed tests are flaky tests.

What are common causes of flakiness?

Broken assumptions in test automation and the development process can introduce flaky tests - for example, if test data is shared between tests, whether they run asynchronously, with high concurrency, or sequentially, the results of one test can affect another.

Poorly written test code can also be a factor: improper polling, race conditions, improper event dependency handling, shared test data, or poor timeout handling for network requests or page loads. Any of these can lead to flaky test failures.
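Improper polling often means a fixed sleep followed by a single check, which fails whenever the system is slower than the sleep. A more robust pattern, sketched below with a hypothetical `wait_until` helper, polls a condition repeatedly up to a deadline. The background timer simulates an asynchronous job and is only there for illustration.

```python
import threading
import time


def wait_until(condition, timeout_s=5.0, interval_s=0.05):
    """Poll condition() until it returns a truthy value or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)


# Hypothetical asynchronous job that completes after about 0.1 seconds.
job = {"done": False}
threading.Timer(0.1, lambda: job.update(done=True)).start()

# Waits only as long as needed, instead of sleeping a fixed amount and hoping.
wait_until(lambda: job["done"], timeout_s=2.0)
```

The test proceeds as soon as the condition holds and fails with a clear error when it never does, which removes the guesswork of tuning a fixed sleep.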

End-to-end tests that rely on internal API uptime can cause test flakiness and test failures.

What's the impact of flaky tests?

Flaky tests can wreak havoc on the development process - from developer time wasted on test retries, to bugs, product instability, and missed releases, time-consuming flaky tests can grind your development process to a halt.

What is the best way to resolve or fix flaky tests?

DevOps, software engineering, and software development teams will often need to compare code changes, logs, and other context across test environments from before and after the test instability started. Adding retries or reruns can also help with debugging, and test detection and test execution tooling can help automate this process.

BuildPulse enables you to find, assess impact metrics, quarantine, and fix flaky tests.

What are some strategies for preventing flaky tests?

Paying attention to flaky tests and prioritizing them as they come up can be a good way to prevent them from becoming an issue. This is where a testing culture is important - if a flaky test case is spotted by an engineer, it should be logged right away. This, however, takes a certain level of hygiene - BuildPulse can provide monitoring so flaky tests are caught right away.

What types of tests can be flaky?

Flaky tests can be seen across the testing process - unit tests, integration tests, end-to-end tests, UI tests, and acceptance tests.

What if I don't have that many flaky tests?

Flaky tests can be stealthy - engineers often ignore them and simply retry test runs, so they build up until they can’t be ignored anymore. These tests slow down developer productivity, impact functionality, and reduce confidence in test results and test suites. It’s better to get ahead of the problem while it’s easy and invest in test management.

It’s also important to catch flakiness early, while it’s manageable, and to prevent regressions.

What languages and continuous integration providers does BuildPulse work with?

BuildPulse integrates with all continuous integration providers (including GitHub Actions, Bitbucket Pipelines, and more), test frameworks, and workflows.

Combat non-determinism, drive test confidence, and provide the best experience you can to your developers!

How long does implementation/integration with BuildPulse take?

Implementation/integration takes 5 minutes!

Ready for Takeoff?
