How to fix flaky tests


Aug 30, 2023

Flaky tests can be a real roadblock in software development. Because they produce inconsistent results, they cause time-consuming reruns and debugging, and the spurious failures make it hard for a software engineer to maintain an efficient workflow. In this guide, we'll discuss how to find and fix flaky tests in your test suite.

The Problem with Flaky Tests

Flaky tests are tests that exhibit non-deterministic behavior: they yield different outcomes even when nothing has changed in the code or the test environment. They are problematic because they introduce uncertainty into the test results, which can interfere with continuous integration (CI) and continuous delivery (CD) pipelines. They cause unnecessary test failures, leading to time-consuming debugging and rerunning of tests. Flakiness can be introduced by external dependencies and by factors such as concurrency, asynchronous code, and order dependency, among others.

Step 1: Commit to Fixing the Problem Right Away!

The first step to deal with flaky tests is to commit to fixing the issue promptly. When a test fails, it might be tempting to simply rerun it until it passes. However, this approach doesn't address the root cause of the flakiness and can lead to more failed tests down the line. It is essential to prioritize identifying and fixing flaky tests to maintain the reliability of your test suite and to ensure the accuracy of test results.

Step 2: Find the Flaky Tests in Your Suite

The next step is identifying the flaky tests in your test suite. BuildPulse offers a comprehensive platform to find flaky tests, measure their impact on the product and the engineering team, and alert stakeholders of any issues while they’re still manageable.

You can also consider some other techniques:

Saving Debugging Information

Every time a test fails, make sure to save all relevant debugging information. This includes logs, screenshots, and any other data that could help diagnose the issue. This data can be invaluable when trying to find the root cause of the flakiness.

SSH Debugging

Some testing platforms allow for SSH debugging, which can provide a live look into the test environment when a test fails. This can help uncover hidden causes of flakiness.

A Picture Says More Than a Thousand Words

Whenever possible, use visual aids like screenshots and screen recordings. These can help illustrate what's happening at the frontend when a test fails.

Configuring Reports in the Pipeline

Configure your CI pipeline to generate detailed test reports whenever a test run completes. These reports can help identify patterns and trends related to test flakiness. Integration with platforms like GitHub can further streamline this process.
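One pattern such reports can reveal is a test that both passes and fails for the same commit. The sketch below assumes the pipeline emits JUnit-style XML (a format many test runners can produce) and that several reports for the same code are available; the function names and the sample reports are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def outcomes_from_junit(xml_text):
    """Return {test_id: "pass" | "fail"} for one JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    results = {}
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue  # skipped tests tell us nothing about flakiness
        test_id = f"{case.get('classname')}::{case.get('name')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        results[test_id] = "fail" if failed else "pass"
    return results


def find_flaky(reports):
    """Return tests that both passed and failed across reports of the same code."""
    seen = defaultdict(set)
    for xml_text in reports:
        for test_id, outcome in outcomes_from_junit(xml_text).items():
            seen[test_id].add(outcome)
    return sorted(t for t, outcomes in seen.items() if outcomes == {"pass", "fail"})


# Two hypothetical reports from runs against the same commit.
RUN_1 = """<testsuite name="suite" tests="2">
  <testcase classname="tests.test_cart" name="test_checkout"/>
  <testcase classname="tests.test_cart" name="test_add_item"/>
</testsuite>"""

RUN_2 = """<testsuite name="suite" tests="2">
  <testcase classname="tests.test_cart" name="test_checkout">
    <failure message="timeout waiting for payment stub"/>
  </testcase>
  <testcase classname="tests.test_cart" name="test_add_item"/>
</testsuite>"""

print(find_flaky([RUN_1, RUN_2]))  # ['tests.test_cart::test_checkout']
```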

Step 3: Documenting Flaky Tests

Keeping a detailed record of flaky tests is critical. Document every instance of a flaky test, including when it occurred, the test environment, any error messages, and any attempted fixes. This documentation can be helpful for future reference and can assist in identifying common causes of flakiness.
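A record like this does not need special tooling. As a minimal sketch, the fields listed above can be kept in a small data class and appended to a JSON Lines file; the class name, field names, and file format are assumptions chosen for this example, and an issue tracker or spreadsheet would serve equally well.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FlakyTestRecord:
    """One observed occurrence of a flaky test (hypothetical schema)."""
    test_id: str
    occurred_at: str      # ISO 8601 timestamp of the failing run
    environment: str      # e.g. CI runner image or local machine
    error_message: str
    attempted_fixes: list = field(default_factory=list)


def append_record(path, record):
    """Append one record to a JSON Lines log file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def load_records(path):
    """Read every record back from the log file."""
    with open(path, encoding="utf-8") as fh:
        return [FlakyTestRecord(**json.loads(line)) for line in fh if line.strip()]
```

Grouping the loaded records by `test_id` or `environment` is then a quick way to spot the common causes mentioned above.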

Step 4: Determining the Cause and Fixing the Test

Once you've identified and documented the flaky tests, the next step is to determine the cause and fix the issue. Several common causes can lead to test flakiness:

Environmental Differences

Flakiness can often be attributed to differences in test environments. Make sure that the test environment closely mirrors the production environment to reduce the risk of environmental flakiness.

Non-deterministic Code

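If your code relies on random factors or specific timing, it can lead to flaky tests. Try to avoid non-deterministic code whenever possible. Where you can't avoid it, control it in tests, for example by seeding random number generators and passing in the current time instead of reading the system clock.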
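One way to keep randomness and time under control in tests is to pass them in as arguments rather than reading global state. The sketch below is an illustration under that assumption; `pick_discount` and `is_expired` are hypothetical functions invented for the example.

```python
import random
from datetime import datetime, timezone


def pick_discount(rng):
    """Choose a discount using the supplied random generator."""
    return rng.choice([5, 10, 15])


def is_expired(expires_at, now):
    """Compare against a caller-supplied 'now' instead of the system clock."""
    return now >= expires_at


def test_pick_discount_is_repeatable():
    # Two generators with the same seed produce the same sequence.
    first = pick_discount(random.Random(1234))
    second = pick_discount(random.Random(1234))
    assert first == second


def test_is_expired_uses_fixed_clock():
    expires = datetime(2023, 8, 30, 12, 0, tzinfo=timezone.utc)
    just_before = datetime(2023, 8, 30, 11, 59, tzinfo=timezone.utc)
    assert not is_expired(expires, now=just_before)
    assert is_expired(expires, now=expires)
```

Production code can still pass `random.Random()` and `datetime.now(timezone.utc)`; only the tests pin the values.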

Asynchronous Wait

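Flaky tests can occur if your test cases do not properly handle asynchronous API calls. Implementing appropriate waits or callbacks can help alleviate this issue; waiting on an explicit condition with a timeout is generally more reliable than sleeping for a fixed interval.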
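A common way to handle asynchronous work in a test is to poll for the expected condition up to a deadline. The helper below is a minimal sketch of that idea; `wait_until` is a name chosen for this example, and many test frameworks ship their own equivalent.

```python
import threading
import time


def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll condition() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one last check at the deadline


def test_background_job_finishes():
    done = threading.Event()
    # Simulated asynchronous work that completes after 50 ms.
    worker = threading.Timer(0.05, done.set)
    worker.start()
    try:
        # Passes as soon as the job is done, without guessing a sleep duration.
        assert wait_until(done.is_set, timeout=2.0)
    finally:
        worker.cancel()
        worker.join()
```

The timeout can be generous because the test returns as soon as the condition holds; only a genuine failure pays the full wait.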

Concurrency

Concurrency issues can also cause flakiness. If your tests are not properly synchronized, they can interfere with each other and lead to flaky outcomes.
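As a small illustration of synchronization, the sketch below guards a shared counter with a lock so that concurrent increments cannot interleave. The `Counter` class and the worker counts are hypothetical; the same idea applies to any state that several threads or parallel tests touch.

```python
import threading


class Counter:
    """A counter whose read-modify-write step is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def run_workers(counter, workers=8, per_worker=1000):
    """Increment the counter from several threads and wait for all of them."""
    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def test_counter_is_consistent_under_concurrency():
    counter = Counter()
    run_workers(counter, workers=8, per_worker=1000)
    assert counter.value == 8000
```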

Order Dependency

Tests should be independent and not rely on the order in which they are run. If test order is causing flakiness, consider refactoring your tests to eliminate these dependencies.
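A typical way to remove order dependencies is to build fresh state for every test instead of sharing it. The sketch below uses `setUp` from Python's standard `unittest` module; `ShoppingCart` and the `run_in_order` helper are hypothetical and exist only to show that either order passes.

```python
import unittest


class ShoppingCart:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)


class CartTests(unittest.TestCase):
    def setUp(self):
        # A new cart for every test, so no test sees another test's items.
        self.cart = ShoppingCart()

    def test_add_one_item(self):
        self.cart.add("book")
        self.assertEqual(self.cart.items, ["book"])

    def test_starts_empty(self):
        self.assertEqual(self.cart.items, [])


def run_in_order(names):
    """Run the named CartTests methods in the given order; True if all pass."""
    suite = unittest.TestSuite(CartTests(name) for name in names)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

With a single module-level cart shared by both tests, `test_starts_empty` would pass or fail depending on whether it ran before or after `test_add_one_item`.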

Improper Assumptions

Sometimes, tests make improper assumptions about the state of the system, leading to flaky outcomes. Review your test cases to ensure they accurately represent the expected functionality.
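One concrete instance of an improper assumption is a test that expects a particular ordering from code that never promised one. In the sketch below, `unique_tags` is a hypothetical function that returns a Python set, whose iteration order for strings can differ between interpreter runs; the test compares a sorted copy instead of relying on that order.

```python
def unique_tags(posts):
    """Return the distinct tags across posts; a set, so no order is promised."""
    return {tag for post in posts for tag in post["tags"]}


def test_unique_tags_ignores_order():
    posts = [
        {"tags": ["ci", "python"]},
        {"tags": ["python", "testing"]},
    ]
    # Fragile version: list(unique_tags(posts)) == ["ci", "python", "testing"]
    # Robust version: assert only what the function actually guarantees.
    assert sorted(unique_tags(posts)) == ["ci", "python", "testing"]
```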

Fixing the Test

Once the root cause of the flakiness is determined, the final step is to fix the test. This can involve refactoring the test code, adjusting timeouts, changing the order of test execution, and so on. Utilize your testing framework and implement best practices for writing reliable test code.

Conclusion

Flaky tests can pose a significant challenge in software development, leading to inefficient workflows and unreliable test results. However, by committing to addressing the issue, identifying and documenting flaky tests, determining the cause, and implementing necessary changes, flaky tests can be effectively managed. Leveraging tools for test detection and reporting, like Slack notifications for test failures, can help maintain a reliable test suite, ensuring the quality of your end-to-end testing, unit tests, and regression testing. Remember that maintaining test reliability is a continuous process and plays a crucial role in successful software development.

FAQ

What is the difference between a flaky test and a false positive?

A false positive is a test failure that does not point to a real defect in the code being executed. It usually comes from a mismatch in what the test expects from the code, and it fails the same way on every run until the test is corrected.

A flaky test is one that produces conflicting results for the same code. For example, if a test both fails and passes across runs while the code hasn’t changed, it’s a flaky test. There are many causes of flakiness.

What is an example of a flaky test?

A common example shows up in growing test suites: a pull request build fails in a test that your changes didn’t touch. Put differently, you see a test pass and fail without any code change. Tests that behave this way are flaky tests.

What are common causes of flakiness?

Broken assumptions in test automation and the development process can introduce flaky tests. For example, if test data is shared between tests, whether they run asynchronously, with high concurrency, or sequentially, the results of one test can affect another.

Poorly written test code can also be a factor: improper polling, race conditions, improper event dependency handling, shared test data, or poor timeout handling for network requests or page loads. Any of these can lead to flaky test failures.

End-to-end tests that rely on internal API uptime can cause test flakiness and test failures.

What's the impact of flaky tests?

Flaky tests can wreak havoc on the development process. From developer time wasted on test retries to bugs, product instability, and missed releases, time-consuming flaky tests can grind your development process to a halt.

What is the best way to resolve or fix flaky tests?

DevOps, software engineering, and software development teams will often need to compare code changes, logs, and other context across test environments from before and after the test instability started. Adding retries or reruns can also help with debugging. Test detection and test execution tooling can help automate this process as well.

BuildPulse enables you to find, assess impact metrics, quarantine, and fix flaky tests.

What are some strategies for preventing flaky tests?

Paying attention to flaky tests and prioritizing them as they come up is a good way to prevent them from becoming an issue. This is where a testing culture is important: if an engineer spots a flaky test case, it should be logged right away. This, however, takes a certain level of hygiene. BuildPulse can provide monitoring so flaky tests are caught right away.

What types of tests can be flaky?

Flaky tests can be seen across the testing process: unit tests, integration tests, end-to-end tests, UI tests, and acceptance tests.

What if I don't have that many flaky tests?

Flaky tests can be stealthy. Engineers often ignore them and simply retry the test run, so they build up until they can’t be ignored anymore. They slow down developer productivity, impact functionality, and reduce confidence in test results and test suites. It’s better to get ahead of them while it’s easy and invest in test management.

It’s also important to catch flakiness early, while it’s manageable, and to prevent regressions.

What languages and continuous integration providers does BuildPulse work with?

BuildPulse integrates with all continuous integration providers (including GitHub Actions, Bitbucket Pipelines, and more), test frameworks, and workflows.

Combat non-determinism, drive test confidence, and provide the best experience you can to your developers!

How long does implementation/integration with BuildPulse take?

Implementation/integration takes 5 minutes!

