Cracking the Case on Flaky Tests: Tips for Build Confidence and Seamless Upgrades

How many times have you or someone on your team brushed off a failing build with a casual, ‘It’s fine, it’s just a flaky spec—ignore it’?

If you’re nodding in agreement, you’re not alone. It’s a scenario familiar to many of us, especially when dealing with sprawling monolithic projects and untouched code sections.

However, this attitude towards flaky tests can become a serious problem during upgrades. An upgrade relies on the test suite to reveal what broke; if we can’t trust the tests, uncertainty lingers.

Are they genuinely flaky, or have we unintentionally introduced issues during the upgrade? Has upgrading somehow made the tests flakier? We have run into both of these issues on past upgrades.

In this article we’ll give some tips on how to untangle the mystery behind flaky specs, with the aim of guiding you towards consistently passing builds. Eliminating flakiness from your tests makes every upgrade more likely to succeed.

The Unpredictable Nature of Flaky Tests

While flaky tests are often associated with integration tests that run JavaScript through tools like Capybara-Webkit or Selenium, they can have many other causes. Often a test passes locally every time but fails in CI over and over.

Common Causes of Flaky Tests

Across more than 100 upgrade projects, we’ve come across flaky specs in all kinds of codebases. This section describes some of the most common causes.

Race Conditions

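Flaky tests may arise from race conditions, where the timing of execution influences the test outcome: for example, a test that checks for a result before a background thread, job, or JavaScript request has finished.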
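A common fix is to wait for the condition you actually care about instead of sleeping for a fixed time. The sketch below uses a hypothetical `wait_until` helper and a background thread standing in for asynchronous work. In browser tests, Capybara’s matchers such as `have_content` already retry like this until `Capybara.default_max_wait_time` runs out, so prefer those where they apply.

```ruby
# Hypothetical helper: polls the block until it returns a truthy value,
# raising if the timeout passes first. Unlike a fixed `sleep`, it waits only
# as long as needed and tolerates a slow CI machine up to the timeout.
def wait_until(timeout: 2.0, interval: 0.05)
  clock = Process::CLOCK_MONOTONIC
  deadline = Process.clock_gettime(clock) + timeout
  loop do
    result = yield
    return result if result
    raise "condition not met within #{timeout}s" if Process.clock_gettime(clock) >= deadline

    sleep(interval)
  end
end

# Simulated asynchronous work, e.g. a background job or a page update.
# The Queue hands the result between threads safely.
results = Queue.new
worker = Thread.new do
  sleep(0.1) # the work takes an unpredictable amount of time
  results << :done
end

# Racy version: `sleep(0.05)` followed by a check fails whenever the worker is slower.
# Stable version: poll until the result shows up (single consumer, so no race on pop).
status = wait_until { results.pop unless results.empty? }
worker.join
puts status # prints done
```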

Leaked State

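State leaked between tests leads to unpredictable failures: one test changes a global variable, class-level setting, or database record, and a later test silently depends on that change. Prevent shared state, or reset it between tests, so that each test passes or fails on its own.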
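Here is a minimal sketch of leaked state and one way to contain it, using a hypothetical `FeatureFlags` class that stores its data at class level, so it survives from one test to the next. In RSpec or minitest you would call the reset from `before`/`after` or `setup`/`teardown` hooks; a block helper keeps this example free of any test framework.

```ruby
# Hypothetical global configuration. Because the hash lives on the class,
# anything one test enables is still enabled when the next test runs.
class FeatureFlags
  @enabled = {}

  class << self
    def enabled?(name)
      @enabled.fetch(name, false)
    end

    def enable(name)
      @enabled[name] = true
    end

    def reset!
      @enabled = {}
    end
  end
end

# Gives the block a clean slate and resets afterwards, even if it raises,
# so changes made inside cannot leak into whatever runs next.
def with_clean_flags
  FeatureFlags.reset!
  yield
ensure
  FeatureFlags.reset!
end

# A test that enables a flag and forgets to clean up leaks it...
FeatureFlags.enable(:beta)
puts FeatureFlags.enabled?(:beta) # prints true

# ...but a test wrapped in the helper always starts and ends clean.
with_clean_flags do
  puts FeatureFlags.enabled?(:beta) # prints false
  FeatureFlags.enable(:beta)
end
puts FeatureFlags.enabled?(:beta) # prints false
```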

Network/Third-Party Dependency

External dependencies, such as network calls or third-party services, introduce an element of unpredictability: the service may be slow, down, or rate-limiting your requests. Use mocking and stubbing to create a controlled test environment.
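One way to keep a third-party service out of your tests is to inject the client and hand the code a fake during tests. The `ExchangeRateService` and `FakeRatesClient` classes below are hypothetical; in a real Rails app you might reach for a gem such as WebMock or VCR, or RSpec’s `instance_double`, rather than a hand-written fake.

```ruby
require "json"

# Hypothetical service. The HTTP client is passed in, so tests can swap in
# a fake instead of calling the real API.
class ExchangeRateService
  def initialize(client)
    @client = client
  end

  def rate(from, to)
    body = @client.get("/rates?from=#{from}&to=#{to}")
    JSON.parse(body).fetch("rate")
  end
end

# Test double: returns canned JSON and records every path requested,
# so tests can also check what the service asked for.
class FakeRatesClient
  attr_reader :requests

  def initialize(response_body)
    @response_body = response_body
    @requests = []
  end

  def get(path)
    @requests << path
    @response_body
  end
end

client = FakeRatesClient.new('{"rate": 0.92}')
service = ExchangeRateService.new(client)
puts service.rate("USD", "EUR") # prints 0.92, with no network involved
```

The fake always answers instantly with the same data, so the test’s outcome no longer depends on the network or on the provider’s uptime.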

Randomness

Ironically, randomness itself can be a cause of flaky tests. It is usually more reliable to choose specific values when testing instead of grabbing random ones. If you do use random values, control them: for example, if you need a person’s age, constrain it to a valid range such as 0 to 99, and make sure the test’s expectations hold for every value in that range.
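To make the age example concrete, here is a sketch with a hypothetical `Person` struct. A test asserting `adult?` on a random age between 0 and 99 fails whenever the draw is under 18. Fixed values on either side of the boundary cover both outcomes on every run, and a seeded `Random` keeps any remaining random data within range and replayable.

```ruby
# Hypothetical model with an age boundary.
Person = Struct.new(:name, :age) do
  def adult?
    age >= 18
  end
end

# Flaky: passes only when the random age happens to be 18 or more.
#   raise unless Person.new("Cy", rand(0..99)).adult?

# Stable: specific values on each side of the boundary.
minor = Person.new("Ana", 17)
adult = Person.new("Ben", 18)
puts minor.adult? # prints false
puts adult.adult? # prints true

# If random data is still useful, keep it within a valid range and draw it
# from a seeded generator, so the same seed always yields the same ages.
def random_ages(seed, count)
  rng = Random.new(seed)
  Array.new(count) { rng.rand(0..99) }
end

p random_ages(1234, 3) == random_ages(1234, 3) # prints true
```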

Fixed Time Dependency

Tests relying on the current date or time, or on hardcoded time values, can be sensitive to the time of day they run, to time zones, or to differences between system clocks.
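A minimal sketch of removing a time dependency: the hypothetical `greeting` method accepts the current time as an argument (defaulting to `Time.now`), so each test can pin it. In a Rails app, ActiveSupport’s `travel_to` test helper is another way to freeze time for the duration of a test.

```ruby
# Hypothetical method whose result depends on when it runs. Taking `now`
# as a keyword argument keeps production calls unchanged while letting
# tests supply a fixed moment.
def greeting(now: Time.now)
  now.hour < 12 ? "Good morning" : "Good afternoon"
end

# Flaky: passes only when the suite happens to run before noon.
#   raise unless greeting == "Good morning"

# Stable: each branch is checked at a fixed, explicit time.
morning = Time.new(2024, 1, 15, 9, 0, 0)
afternoon = Time.new(2024, 1, 15, 15, 0, 0)
puts greeting(now: morning)   # prints Good morning
puts greeting(now: afternoon) # prints Good afternoon
```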

Analyzing Flaky Tests: Asking the Right Questions

Unraveling the mystery behind flaky tests often involves asking the right questions to pinpoint potential causes. By examining patterns and considering different aspects of your test environment, you can narrow down the root of the issue.

Here are some key questions to get you started on your investigation:

Timing Patterns:

Is there any pattern to tests failing at specific times of day?

This could indicate a fixed time dependency issue, where the outcome of the test is influenced by the time it runs. Identifying such patterns helps uncover time-related vulnerabilities in your test suite.

One way to check this is to write a script that runs the tests every few hours and logs the output, which can reveal whether the flaky failures follow a time pattern. Our post about debugging non-deterministic specs goes into more detail on this topic.
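A starting point for such a script might look like the sketch below. It assumes an RSpec suite run with `bundle exec rspec` in random order (so RSpec prints its seed); swap in `bin/rails test` for minitest. The log file name and the three-hour interval are our own choices. Each run records the time, the result, and the seed, so failures can be grouped by time of day and replayed. A cron entry or a scheduled CI job works as well as the `loop` here.

```ruby
require "open3"
require "time"

# Assumed command; replace with ["bin/rails", "test"] for minitest.
DEFAULT_COMMAND = %w[bundle exec rspec].freeze

# Runs the suite once and appends "<timestamp> PASS|FAIL seed=<n>" to the log.
def run_suite_once(command: DEFAULT_COMMAND, log_path: "flaky_runs.log")
  output, status = Open3.capture2e(*command)
  # RSpec prints "Randomized with seed N"; minitest prints "--seed N".
  seed = output[/seed (\d+)/, 1] || "unknown"
  result = status.success? ? "PASS" : "FAIL"
  line = "#{Time.now.iso8601} #{result} seed=#{seed}"
  File.open(log_path, "a") { |file| file.puts(line) }
  line
end

# Runs the suite every few hours until interrupted (Ctrl-C).
def watch_suite(every_hours: 3, **options)
  loop do
    puts run_suite_once(**options)
    sleep(every_hours * 3600)
  end
end

# Usage: watch_suite(every_hours: 3)
```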

Test Order Sensitivity:

Do certain tests consistently fail when executed in a specific order? Test order sensitivity may reveal issues with dependencies between tests. Investigating this can provide insights into shared state problems or race conditions affecting the stability of your test suite.

A good way to investigate this is to run the tests with a specific seed. Grab the seed number from one test run, then pass it to the next run to execute the tests in the same order again.

This is also useful when something fails in CI: run the tests locally with the CI build’s seed to see whether the failure reproduces there.

For minitest: rails test --seed 1234

or, through the Rake test task: rake test TESTOPTS="--seed 1234"

For RSpec: rspec --seed 1234
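To see why a seed makes the order reproducible, here is a simplified illustration: shuffling with a generator built from the seed always yields the same order for the same seed. The test names are hypothetical, and this shows only the idea, not how RSpec or minitest implement ordering internally.

```ruby
# Hypothetical test names. Suppose test_logout only passes when
# test_login has already run and leaked a logged-in session.
TESTS = %w[test_signup test_login test_logout test_reset_password].freeze

# Shuffles the tests with a generator seeded by `seed`, so a given seed
# always produces the same order.
def order_for(seed)
  TESTS.shuffle(random: Random.new(seed))
end

# The order-dependent failure appears only for some seeds.
def fails_with_seed?(seed)
  order = order_for(seed)
  order.index("test_logout") < order.index("test_login")
end

# Scan a range of seeds to find one that reproduces the failure.
failing_seed = (1..100).find { |seed| fails_with_seed?(seed) }
puts "Reproduce with --seed #{failing_seed}"
```

Once you have a failing seed, every rerun with it hits the same order, which turns an intermittent failure into a reliable one you can debug.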

External Dependencies and Network Connections:

Are your tests interacting with external services or APIs?

Do flaky tests coincide with periods of network instability?

External dependencies often introduce variability in test outcomes. Understanding the impact of external factors on your tests is crucial for creating a more controlled and reliable testing environment. It’s important to use mocking and stubbing instead of real API calls.
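To make sure no test quietly depends on a real API, you can make real HTTP connections fail loudly during the test run. The sketch below prepends a hypothetical `BlockRealHTTP` module to Ruby’s `Net::HTTP`. Gems such as WebMock (`WebMock.disable_net_connect!`) provide a complete, well-tested version of this idea, including an `allow_localhost: true` option that browser-driven tests need, and are the better choice in a real suite.

```ruby
require "net/http"
require "uri"

# Raised when a test tries to reach the network.
class RealNetworkCallError < StandardError; end

# Hypothetical guard: Net::HTTP requests go through the instance method
# `start` before opening a connection, so overriding it turns any real
# call into an error instead of a slow or flaky network round trip.
module BlockRealHTTP
  def start(*)
    raise RealNetworkCallError, "Real HTTP call to #{address} blocked in tests; stub it instead"
  end
end

Net::HTTP.prepend(BlockRealHTTP)

begin
  Net::HTTP.get(URI("http://api.example.com/rates"))
rescue RealNetworkCallError => e
  puts e.message
end
```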

Randomness and Seeds:

Have you ensured proper seeding for tests involving randomness?

Flaky behavior can arise if randomness is not adequately controlled. Confirming the use of seeds for random processes ensures test reproducibility and minimizes unexpected variations.
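If your tests call Ruby’s global `rand`, seeding it from the test run’s seed means the seed you pass on the command line replays the random data as well as the order. RSpec’s generated `spec_helper.rb` includes `Kernel.srand config.seed` for exactly this purpose. The framework-free sketch below shows the underlying behavior, with a hypothetical `sample_ages` helper standing in for test data.

```ruby
# In an RSpec suite, spec_helper.rb ties the global generator to the run's seed:
#
#   RSpec.configure do |config|
#     config.order = :random
#     Kernel.srand config.seed
#   end

# Hypothetical test-data helper that uses the global generator.
def sample_ages(count)
  Array.new(count) { rand(0..99) }
end

# Seeding with the same value replays the same "random" data.
Kernel.srand(1234)
first_run = sample_ages(5)
Kernel.srand(1234)
second_run = sample_ages(5)
puts first_run == second_run # prints true
```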

Code Changes:

Have recent code changes introduced new dependencies or altered test behavior?

Regularly reviewing code changes, especially those close to failing tests, helps identify potential causes. Try to figure out when the test started failing: look back through build logs to see whether a specific change introduced the flakiness.
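When build logs do not point to a culprit, `git bisect run` can search the history for you, provided you give it a command that fails on bad commits. Because a flaky test does not fail every time, a useful command runs the test several times and fails if any run fails. The script below is a sketch; its file name, the `RUNS` environment variable, and the default of 10 runs are our own assumptions.

```ruby
# repeat_until_failure.rb (hypothetical name)
# Usage: git bisect run ruby repeat_until_failure.rb bundle exec rspec spec/models/user_spec.rb
#
# Runs the command up to `runs` times and reports whether any run failed.
# Several runs are needed because a flaky test can pass on a bad commit by chance.
def failed_in_any_run?(command, runs: 10)
  runs.times.any? { !system(*command) }
end

if $PROGRAM_NAME == __FILE__ && ARGV.any?
  runs = Integer(ENV.fetch("RUNS", "10"))
  # git bisect run treats exit status 0 as "good" and 1 as "bad".
  exit(failed_in_any_run?(ARGV, runs: runs) ? 1 : 0)
end
```

Mark a known bad and a known good commit first (`git bisect start`, `git bisect bad`, `git bisect good <commit>`), then hand the script to `git bisect run`. More runs per commit make a false "good" less likely, at the cost of a slower search.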

By systematically addressing these questions, you can gain valuable insights into the nature of flaky tests and take targeted actions to enhance the stability of your test suite.

Each question can serve as an investigative tool, bringing you one step closer to consistently green builds, which in turn make upgrades easier.

Some More Tips for Keeping Everything up to Date

Asking the right questions about your codebase is not enough on its own. Team collaboration, monitoring, and staying on top of best practices all play a crucial role in keeping flaky specs at bay.

Test Maintenance Strategies:

Ongoing test maintenance prevents flakiness from creeping in over time. Encourage your team to regularly review and update tests, especially after significant code changes or upgrades.

Monitoring and Alerting:

Consider implementing monitoring and alerting that notifies the team when a test becomes flaky. Early detection allows for timely investigation and resolution, minimizing the impact on build confidence.
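One simple signal such a system can watch for is the same test both passing and failing on the same commit, since the code did not change between those runs. The sketch below assumes you already collect per-test results from CI; the `RunResult` struct and its field names are hypothetical, and alerting (chat, email, an issue tracker) would hang off its output.

```ruby
# One record per test per CI run, e.g. parsed from JUnit XML reports.
# The field names here are assumptions.
RunResult = Struct.new(:test, :commit, :passed, keyword_init: true)

# A test is flagged as flaky when it has both passed and failed on the same
# commit: nothing in the code changed, yet the outcome did. A test that fails
# on one commit and passes on a later one is not flagged, since that may be a fix.
def flaky_tests(results)
  results
    .group_by { |r| [r.test, r.commit] }
    .select { |_key, runs| runs.map(&:passed).uniq.size > 1 }
    .map { |(test, _commit), _runs| test }
    .uniq
    .sort
end

results = [
  RunResult.new(test: "UserTest#test_signup", commit: "a1b2c3", passed: true),
  RunResult.new(test: "UserTest#test_signup", commit: "a1b2c3", passed: false),
  RunResult.new(test: "CartTest#test_total", commit: "a1b2c3", passed: false),
  RunResult.new(test: "CartTest#test_total", commit: "d4e5f6", passed: true),
]

p flaky_tests(results) # prints ["UserTest#test_signup"]
```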

Documentation and Best Practices:

Well-documented tests matter. Clear documentation helps developers understand the purpose of each test, making it easier to identify potential issues and troubleshoot failures.

Don’t make tests DRY simply for the sake of making them DRY. Before drying up tests, consider whether the extra abstraction will make it harder for your team to figure out where problems are arising in the test suite.

Collaboration and Communication:

Collaboration between developers and QA teams is essential. Encourage open communication channels so the team can promptly address flaky tests, share insights, and collectively maintain a reliable test suite.

Conclusion

In software development, the reliability of your test suite matters. Flaky tests, though common, should not be dismissed.

By implementing the strategies outlined in this article—addressing common causes, analyzing test patterns, and fostering a proactive testing culture—you can pave the way to consistently green builds and seamless upgrades.

The journey to build confidence is an ongoing process that requires collaboration and a commitment to the quality of your codebase.

Take these tips, integrate them into your development workflow, and let your test suite become a rock-solid foundation for your software projects. Happy testing!

If your team is planning an upgrade soon, a reliable test suite is key to its success. We can help you get ready for an upgrade with our Roadmap to Upgrade Rails.
