What are flaky tests?

Flaky Tests

Aug 25, 2023

Flaky tests refer to test cases that both pass and fail nondeterministically. Despite having unaltered code and testing conditions, these tests yield varying results. This characteristic of flaky tests—producing inconsistent outcomes—leads to debugging challenges and can considerably undermine the confidence that developers place in their test infrastructure. Consequently, timely identification and resolution of flaky tests becomes an integral aspect of maintaining the reliability and efficiency of your testing process.

Significant Impact and the Subsequent Challenges

Flaky tests are capable of introducing substantial challenges in the development workflow. Initially, they contribute to unnecessary expenditure of time and effort as developers scramble to debug failed tests, only to realize these failures are not indicative of any substantial issues with the functionality. Developers, realizing that their code changes are not causing the failures, often resort to re-running the tests until they yield a passing result.

However, this cycle of constantly re-running failed tests, coupled with the non-deterministic nature of flaky tests, complicates the process of root cause analysis. This scenario can also introduce unnecessary failure points in Continuous Integration (CI) pipelines, due to the intermittent flakiness, thereby affecting test effectiveness and the overall development process. Automated tests form a crucial component of CI workflows. They ensure that code changes are meticulously validated. However, the introduction of flaky tests can lead to false positives and negatives, resulting in unreliable outcomes. These issues can create bottlenecks in the engineering process, reduce product stability, and affect release readiness. As a result, it is paramount to swiftly address any flakiness to prevent failed test runs and delayed deployments.

Common Causes Behind Flaky Tests

Flaky tests can be attributed to a myriad of factors and dependencies within the testing environment. Some of the most commonly observed causes include:

  • Poorly written tests: Tests that are based on insufficient assumptions or lack adequate test enforcement can lead to flakiness. If the test cases do not accurately depict the expected functionality, it can result in unpredictable outcomes.

  • Asynchronous operations: Test flakiness can arise when the tests involve asynchronous operations that rely heavily on timing. For instance, waiting for API responses or handling front-end interactions involving asynchronous JavaScript code can introduce elements of non-determinism.

  • Test order dependency: Ideally, tests should be independent and should not rely on the order of execution or shared resources. If a test case presupposes a specific execution order or alters shared data, it can lead to inconsistent test outcomes.

  • Concurrency issues: Tests that involve concurrency can be prone to flakiness. If the assumptions about the order of operations performed by different threads are incorrect, it could make the test outcomes non-deterministic.

Impact on DOM and End-to-End Testing

Flaky tests can cause particular issues when conducting end-to-end testing that involves the Document Object Model (DOM). Interactions with HTML and JavaScript can introduce elements of non-determinism, thereby causing flakiness in test outcomes. It is therefore imperative to address this challenge by thoughtfully designing test cases, incorporating suitable wait times, and utilizing specialized testing frameworks specifically designed for end-to-end scenarios.

Role in Unit Tests and Regression Testing

Flaky tests can significantly undermine the effectiveness of unit tests, which play a crucial role in identifying issues at the code level. Flakiness in unit tests can lead to false positives or missed regressions. Therefore, prioritizing regression testing and ensuring the reliability of unit tests is essential to maintain the stability and quality of your codebase.

Ensuring this never becomes a problem is critical - BuildPulse can help you find, report impact, quarantine, and ultimately help fix flaky tests.


FAQ

What is the difference between a flaky test and a false positive?

A false positive is a test failure in your test suite due to an actual error in the code being executed, or a mismatch in what the test expects from the code.

A flaky test is when you have conflicting test results for the same code. For example, while running tests if you see that a test fails and passes, but the code hasn’t changed, then it’s a flaky test. There’s many causes of flakiness.

What is an example of a flaky test?

An example can be seen in growing test suites - when pull request builds fail for changes you haven’t made. Put differently, when you see a test pass and fail without any code change. These failed tests are flaky tests.

What are common causes of flakiness?

Broken assumptions in test automation and development process can introduce flaky tests - for example, if test data is shared between different tests whether asynchronous, high concurrency, or sequential, the results of one test can affect another. 

Poorly written test code can also be a factor. Improper polling, race conditions, improper event dependency handling, shared test data, or timeout handling for network requests or page loads. Any of these can lead to flaky test failures and test flakiness.

End-to-end tests that rely on internal API uptime can cause test flakiness and test failures.

What's the impact of flaky tests?

Flaky tests can wreck havoc on the development process - from wasted developer time from test retries, to creating bugs and product instability and missed releases, time-consuming flaky tests can grind your development process to a halt.

What is the best way to resolve or fix flaky tests?

Devops, software engineering, and software development teams will often need to compare code changes, logs, and other context across test environments from before the test instability started, and after - adding retries or reruns can also help with debugging. Test detection and test execution tooling can help automate this process as well. 

BuildPulse enables you to find, assess impact metrics, quarantine, and fix flaky tests.

What are some strategies for preventing flaky tests?

Paying attention and prioritizing flaky tests as they come up can be a good way to prevent them from becoming an issue. This is where a testing culture is important - if a flaky test case is spotted by an engineer, it should be logged right away. This, however, takes a certain level of hygiene - BuildPulse can provide monitoring so flaky tests are caught right away.

What type of tests have flaky tests?

Flaky tests can be seen across the testing process - unit tests, integration tests, end-to-end tests, UI tests, acceptance tests.

What if I don't have that many flaky tests?

Flaky tests can be stealthy - often ignored by engineers and test runs are retried, they build up until they can’t be ignored anymore. These automated tests slow down developer productivity, impact functionality, and reduce confidence in test results and test suites. Better to get ahead while it’s easy and invest in test management.

It’s also important to prevent regressions to catch flakiness early while it’s manageable.

What languages and continuous integration providers does BuildPulse work with?

BuildPulse integrates with all continuous integration providers (including GitHub Actions, BitBucket Pipelines, and more), test frameworks, and workflows.

Combat non-determinism, drive test confidence, and provide the best experience you can to your developers!

How long does implementation/integration with BuildPulse take?

Implementation/integration takes 5 minutes!

Ready for Takeoff?

Ready for Takeoff?

Ready for Takeoff?