Replit Case Study
Flaky Tests
Jun 18, 2024
Replit’s mission is to bring the next billion software creators online. Their vision is that widespread code literacy will make the world a better place on many axes – education, wealth equality, and power distribution.
Pain Points
Previously, it was normal to run CI tests on a commit or pull request two, three, or more times. This eroded testing culture: engineers began to interpret CI test failures as flakes by default, and it took 20 to 30 minutes of CI time to tell a flake from a real failure. For large stacks of pull requests, engineers had to babysit the stack and shepherd it into production, because each pull request amplified the flake problem, considerably slowing development velocity.
Solution
BuildPulse’s impact was night and day, both in CI reliability and in the developer experience of working with CI. Replit can now triage flakes over time, distinguish them from true failures, and delegate them to the correct owner, which has led to the number of flaky tests dropping week over week. Another outcome was better hygiene throughout their tests and test harness, which improved confidence in the testing infrastructure.
Key Points
Replit’s engineering team faced a substantial decrease in velocity from test flakiness in pull requests.
With BuildPulse, flakiness became trackable and addressable, increasing build stability. Replit is achieving cleaner results every night without additional noise and can distinguish flakiness from true failures.
BuildPulse helps spread awareness of flakiness and makes it easier to delegate flaky tests across the team.
Replit reduced the developer and CI time spent on flaky tests by integrating with BuildPulse.
Flaky Tests: Build vs. Buy
CI for the main web repository used to be extremely unreliable and frustrating: engineers used to have to re-run CI on most branches before they could confidently determine whether they'd broken something with their changes.
Replit leveraged multiple languages across multiple repos, and the team had limited capacity to implement an internal solution.
The key success criteria were:
Automatically identify and centrally catalog flaky tests
Metrics on flakiness and time consumed for prioritization
Surface flakiness to the broader team, and distinguish it from true failures
Speed of implementation
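As a rough illustration of the first two criteria, the sketch below treats a test as flaky when it both passes and fails on the same commit, then ranks flaky tests by the CI time their failed runs consumed. The `RunRecord` type, its fields, and both function names are hypothetical and chosen for illustration; this is not BuildPulse's API or Replit's implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class RunRecord(NamedTuple):
    """One execution of one test (hypothetical record shape)."""
    test_name: str
    commit: str
    passed: bool
    duration_s: float


def find_flaky_tests(records):
    """Return {test_name: seconds spent on failed runs} for flaky tests.

    A (test, commit) pair counts as flaky when the same code produced
    both a pass and a failure. A test that always fails on a commit is
    treated as a true failure and is not reported.
    """
    outcomes = defaultdict(set)
    for r in records:
        outcomes[(r.test_name, r.commit)].add(r.passed)

    flaky_pairs = {key for key, seen in outcomes.items() if seen == {True, False}}

    wasted = defaultdict(float)
    for r in records:
        if (r.test_name, r.commit) in flaky_pairs and not r.passed:
            wasted[r.test_name] += r.duration_s
    return dict(wasted)


def prioritize(wasted):
    """Order flaky tests from most to least CI time consumed."""
    return sorted(wasted.items(), key=lambda item: item[1], reverse=True)


records = [
    RunRecord("test_login", "abc", False, 30.0),
    RunRecord("test_login", "abc", True, 28.0),
    RunRecord("test_search", "abc", False, 10.0),   # fails consistently:
    RunRecord("test_search", "abc", False, 10.0),   # a true failure
    RunRecord("test_upload", "def", False, 60.0),
    RunRecord("test_upload", "def", True, 55.0),
    RunRecord("test_upload", "def", False, 60.0),
]
ranking = prioritize(find_flaky_tests(records))
```

Ranking by time consumed is only one possible proxy for "disruptive potential"; failure frequency or the number of blocked pull requests would be equally reasonable choices.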
“Thanks to BuildPulse, we were able to methodically enumerate flaky tests, prioritize them in terms of disruptive potential, drive them to 0, and keep them at 0 thanks to BuildPulse's actionable daily reports.”
Implementation
BuildPulse was up and running seamlessly, and the first set of flaky tests was captured immediately. The process consisted of:
Monitoring and sending test results to BuildPulse
On-call aggressively triaging every new source of flakiness, creating a ticket, and assigning deflake responsibilities based on ownership
Parsing the backlog to triage historical flakiness over time
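The triage step above can be sketched as a small routing function: each newly detected flaky test gets one ticket, assigned to the owner of the most specific matching path. The ownership map, team names, default owner, and ticket shape are all assumptions made for illustration; the source does not describe how Replit records ownership or files tickets.

```python
# Hypothetical ownership map: path prefix -> owning team.
OWNERS = {
    "tests/web/": "web-team",
    "tests/web/editor/": "editor-team",
    "tests/api/": "api-team",
}
DEFAULT_OWNER = "on-call"  # fallback when no prefix matches (assumption)


def owner_for(test_path, owners=OWNERS):
    """Pick the owner whose path prefix is the longest match."""
    matches = [prefix for prefix in owners if test_path.startswith(prefix)]
    if not matches:
        return DEFAULT_OWNER
    return owners[max(matches, key=len)]


def triage(new_flakes, known_flakes, owners=OWNERS):
    """Create one ticket per flaky test that has not been seen before.

    `known_flakes` is updated in place so a later run does not file
    duplicate tickets for the same test.
    """
    tickets = []
    for path in new_flakes:
        if path in known_flakes:
            continue
        tickets.append({"test": path, "owner": owner_for(path, owners)})
        known_flakes.add(path)
    return tickets


known = {"tests/api/test_auth.py"}
tickets = triage(
    [
        "tests/web/editor/test_cursor.py",
        "tests/api/test_auth.py",        # already ticketed, skipped
        "tests/infra/test_deploy.py",    # no owner prefix, falls back
    ],
    known,
)
```

Tracking already-ticketed tests keeps the daily triage focused on new sources of flakiness, while the existing backlog is worked through separately.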
Outcome
BuildPulse has helped Replit minimize the time spent collecting and triaging flaky tests, surface critical information for identifying root causes, spread awareness among the team, and improve confidence in the testing infrastructure. BuildPulse will be a key piece in tracking these efforts and measuring success.