Minimizing flaky test impact
Oct 25, 2023
Flaky tests, or those producing inconsistent results, can be a significant thorn in the side of any software development process. They introduce non-determinism into your test suite, often leading to unreliable test results and time-consuming debugging sessions. So how do we mitigate the impact of flaky tests and prevent their recurrence? This tutorial outlines strategies to minimize their impact and strengthen the reliability of your test suite.
Minimizing the Impact
Test Retry Mechanisms
One crucial approach is the implementation of test retry mechanisms, an aspect of automated testing that involves rerunning failed tests. The rerun can help determine if the test failure was an isolated occurrence or a consistent issue. This tactic buffers the impact of flaky tests, affording more opportunities for the tests to deliver consistent results, reducing unnecessary debugging, and preventing premature code changes.
Test Environment Isolation
Test environment isolation is also crucial. A common cause of test flakiness is when test cases depend on shared resources or results from previous tests. By creating a self-contained environment for each test, you can eliminate these dependencies, minimizing the likelihood of flaky tests and providing precise feedback on the functionality being tested and the root cause if a test fails.
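One common way to get this isolation, sketched here with Python's standard unittest and tempfile modules, is to give every test its own scratch directory in setUp and destroy it in tearDown, so no test can see another test's leftovers:

```python
import os
import tempfile
import unittest

class IsolatedFileTest(unittest.TestCase):
    """Each test gets its own scratch directory, so tests cannot
    observe files left behind by earlier tests."""

    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.workdir = self._tmp.name

    def tearDown(self):
        self._tmp.cleanup()  # remove the directory even if the test failed

    def test_writes_report(self):
        path = os.path.join(self.workdir, "report.txt")
        with open(path, "w") as f:
            f.write("ok")
        with open(path) as f:
            self.assertEqual(f.read(), "ok")

    def test_starts_empty(self):
        # Would flake if it shared a directory with test_writes_report.
        self.assertEqual(os.listdir(self.workdir), [])

suite = unittest.TestLoader().loadTestsFromTestCase(IsolatedFileTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because each test builds its world from scratch, the two tests pass in any execution order.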
Timeout Handling
Timeout handling can prevent false failures caused by slow or unresponsive dependencies, another common cause of flakiness. It's especially vital when tests rely on APIs or other external systems that may not respond predictably. Adjusting timeouts to each test's specific requirements can reduce flakiness.
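A simple way to enforce a per-test timeout, sketched here with Python's standard concurrent.futures module, is to run the slow call in a worker thread and bound how long the test waits for it (the two dependency functions are hypothetical stand-ins for external services):

```python
import concurrent.futures
import time

def run_with_timeout(fn, timeout, *args, **kwargs):
    """Run `fn` in a worker thread; raise TimeoutError if it takes too long."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args, **kwargs)
        return future.result(timeout=timeout)

def fast_dependency():
    return "ok"

def slow_dependency():
    time.sleep(0.5)  # stands in for an unresponsive external service
    return "too late"

assert run_with_timeout(fast_dependency, timeout=1.0) == "ok"

try:
    run_with_timeout(slow_dependency, timeout=0.1)
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True  # the test fails fast instead of hanging
```

A timeout turns an open-ended hang into a quick, diagnosable failure, and the budget can be tuned per test rather than suite-wide.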
Test Frameworks and Automation
Comprehensive Test Reports and Metrics
Comprehensive test reports and metrics tracking provide valuable insights into test flakiness. Detailed reports enable you to understand each test's behavior, pinpointing scenarios under which a test fails. By tracking metrics over time, patterns and trends related to test flakiness become apparent, enabling a focused approach to the most problematic areas.
Case Study: Slack’s Approach to Flaky Tests
In 2019, Slack undertook a significant effort to resolve its flaky test problem. After spending time identifying flaky tests, they established a flakiness threshold: if a test's flakiness exceeded it, the test was automatically disabled. This automation had a tremendous impact on product stability, engineering velocity, and test confidence.
They saw a dramatic improvement in main branch stability, which increased from 19.82% in July 2020 to 96% in February 2021. Similarly, the test job failure rate plummeted from 56.76% to 3.85% over the same period. These improvements considerably boosted developer sentiment and confidence.
Moreover, automating test triage saved substantial developer time: it created 693 PRs for Android and 492 PRs for iOS, resulting in an overall saving of 553 hours (equivalent to 23 full days of developer time).
Feedback from developers further attested to the project's success. One developer stated, "I feel iOS CI is much more stable and fast than before. Thank you for all the hard work! It improves our productivity by far. Really appreciated!" Survey results showed that 74% of developers felt the project had positively impacted main branch stability, and 64% reported fewer reruns on PRs.
Read more about the study here.
Test Detection and Reporting: Tools and Collaboration
The use of sophisticated tools and fostering collaboration within development and DevOps teams can significantly enhance the process of preventing, identifying, and reducing flaky tests.
BuildPulse offers a comprehensive platform to detect flaky tests, measure their impact on the product and the engineering team, and alert stakeholders of any issues while they're still manageable. We go beyond reporting and give you tools to take action, such as automated test quarantining, to handle the problem end-to-end.
What is the difference between a flaky test and a false positive?
A false positive is a test failure that reproduces consistently: either there is an actual error in the code being executed, or there is a mismatch between what the test expects and what the code actually does. Unlike a flaky test, the result doesn't change between runs of the same code.
A flaky test is one that produces conflicting results for the same code. For example, if a test both passes and fails across runs even though the code hasn't changed, it's a flaky test. There are many causes of flakiness.
What is an example of a flaky test?
An example can be seen in growing CI/CD pipelines: pull request builds that fail for changes you haven't made.
What causes tests to be flaky?
Broken assumptions in test automation can introduce flaky tests. For example, if test data is shared between different tests, whether they run asynchronously or sequentially, the results of one test can affect another. Poorly written test code can also be a factor, such as improper polling, event handling, or timeout handling for network requests or page loads. Any of these can lead to flaky test failures.
What is the best way to resolve or fix flaky tests?
DevOps, software engineering, and software development teams will often need to compare code changes, logs, and other context from before and after the test instability started. Adding retries can also help. Test detection and test execution tooling can automate this process as well: BuildPulse enables you to find flaky tests, assess their impact, quarantine them, and fix them.
What are some strategies for preventing flaky tests?
Paying attention to flaky tests and prioritizing them as they come up is a good way to prevent them from becoming a larger issue. This is where a testing culture is important: if an engineer spots a flaky test, it should be logged immediately. That takes a certain level of hygiene, and BuildPulse can provide monitoring so flaky tests are caught right away.