
Replit Case Study

Flaky Tests: How Replit crushes flakiness with BuildPulse

BuildPulse Team

April 30, 2026

Replit Case Study | BuildPulse Blog

Pain Points

Before BuildPulse, it was normal to run CI on a commit or pull request two, three, or more times. This hurt testing culture: engineers began to treat CI failures as flakes by default, because telling a flake apart from a real failure took 20 to 30 minutes of CI time. For large stacks of pull requests, engineers had to babysit the stack and shepherd it into production, since each pull request amplified the flake problem, considerably slowing development velocity.

Solution

BuildPulse made a night-and-day difference to both CI reliability and the developer experience of working with CI. Replit can now track flakes over time, distinguish them from true failures, and route each one to the correct owner, and the number of flaky tests has dropped week over week. Another outcome was better hygiene throughout Replit's tests and test harness, which improved confidence in the testing infrastructure.

Key Points

  • Replit’s engineering team faced a substantial decrease in velocity due to test flakiness in pull requests.

  • With BuildPulse, flakiness became trackable and addressable, increasing build stability. Replit now gets cleaner nightly results without extra noise and can distinguish flakiness from true failures.

  • BuildPulse streamlines flaky-test triage and spreads awareness of flakiness, making it easier to delegate flaky tests across the team.

Replit reduced the developer and CI time spent on flaky tests by integrating with BuildPulse.

Flaky Tests: Build vs. Buy

CI for the main web repository used to be extremely unreliable and frustrating: engineers had to re-run CI on most branches before they could confidently determine whether their changes had broken something.

Replit's code spans multiple languages across multiple repositories, and the team had limited capacity to build an internal solution.

The key success criteria were:

  • Automatically identify and centrally catalog flaky tests

  • Metrics on flakiness and CI time consumed, to guide prioritization

  • Surface flakiness to the broader team and distinguish it from true failures

  • Speed of implementation

“Thanks to BuildPulse, we were able to methodically enumerate flaky tests, prioritize them in terms of disruptive potential, drive them to 0, and keep them at 0 thanks to BuildPulse's actionable daily reports.”

Implementation

BuildPulse was up and running seamlessly, and the first set of flaky tests was captured immediately. The process consisted of:

  • Monitoring test results and sending them to BuildPulse

  • The on-call engineer aggressively triaging every new source of flakiness, creating a ticket, and assigning deflaking responsibility based on ownership

  • Working through the backlog to triage historical flakiness over time
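To make the idea of separating flakes from true failures concrete, here is a minimal sketch of one common heuristic, not BuildPulse's actual implementation: a test that both passes and fails on the same commit is flagged as flaky, since the code did not change between those runs. The `RunResult` record shape and the `find_flaky_tests` helper are hypothetical, and ranking by failure count is only an assumed stand-in for prioritizing by disruptive potential.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One test outcome from one CI run (hypothetical record shape)."""
    commit: str
    test: str
    passed: bool


def find_flaky_tests(runs: list[RunResult]) -> list[tuple[str, int]]:
    """Flag tests that both passed and failed on the same commit.

    Returns (test name, failure count) pairs, most failures first, as a
    rough proxy for how disruptive each flaky test is.
    """
    # Collect the set of outcomes seen for each (commit, test) pair.
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.commit, run.test)].add(run.passed)

    # A test is flaky if any single commit produced both a pass and a fail.
    flaky = {test for (_, test), seen in outcomes.items() if seen == {True, False}}

    # Count every failure of a flaky test (across all commits) for ranking.
    failures: dict[str, int] = defaultdict(int)
    for run in runs:
        if run.test in flaky and not run.passed:
            failures[run.test] += 1

    # Sort by failure count (descending), then by name for stable output.
    return sorted(failures.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    runs = [
        RunResult("abc123", "test_login", True),
        RunResult("abc123", "test_login", False),    # same commit, different result
        RunResult("abc123", "test_checkout", False),
        RunResult("def456", "test_checkout", False),  # fails consistently: likely a real failure
        RunResult("def456", "test_login", False),
        RunResult("def456", "test_login", True),
    ]
    print(find_flaky_tests(runs))  # [('test_login', 2)]
```

Here `test_checkout` is not reported because it fails every time it runs, which looks like a true failure rather than a flake; only `test_login`, which flips between pass and fail on an unchanged commit, is flagged.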

Outcome

BuildPulse has helped Replit minimize the time spent collecting and triaging flaky tests, surface critical information for identifying root causes, spread awareness across the team, and improve confidence in its testing infrastructure. BuildPulse will remain a key piece of tracking these efforts and measuring success.