Updated March 2026

How to fix flaky tests in CI

Google found that 84% of pass-to-fail test transitions in its CI came from flaky tests, not real bugs. Here’s how to identify, classify, and fix flaky tests without re-running and hoping.

Why this is hard to fix

• Flaky tests have multiple root causes: timing issues, data non-determinism, environment drift, and third-party dependency changes
• Re-running hides the problem: teams lose trust in CI and start ignoring failures
• Quarantining without governance hides real bugs alongside flaky ones
• At scale, flake investigation can consume 20–40% of engineering time in QA-heavy organizations

Approach 1: Manual flake management

1. Track flake rate per test over 30 days, using CI analytics (Buildkite Test Analytics, CircleCI Insights)
2. Classify root causes: timing (add explicit waits), data (make fixtures deterministic), environment (check staging parity), third-party dependencies (stub or pin external services)
3. Quarantine high-flake tests: remove them from the blocking gate and add them to a “repair queue” with assigned owners
4. Set an SLA: quarantined tests must be fixed or deleted within 2 sprints
5. Monitor flake rate as a team metric, with a target of under 5% of total test transitions
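
Steps 1 and 3 can be sketched in a few lines once run results are exported from CI. The sketch below is only an illustration: the `RunRecord` fields, the 20% quarantine threshold, and the rule "a commit that both passed and failed counts as flaky" are assumptions, not something a particular CI platform defines.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RunRecord:
    """One exported CI result (hypothetical shape)."""
    test: str              # test identifier
    commit: str            # commit SHA the run executed against
    passed: bool
    finished_at: datetime


def flake_rates(runs, now, window_days=30):
    """Return {test: flake_rate} over the trailing window.

    A (test, commit) pair counts as flaky when the same commit produced
    both a pass and a fail, i.e. the outcome changed without a code change.
    """
    cutoff = now - timedelta(days=window_days)
    outcomes = defaultdict(set)  # (test, commit) -> set of observed outcomes
    for run in runs:
        if run.finished_at >= cutoff:
            outcomes[(run.test, run.commit)].add(run.passed)

    totals = defaultdict(int)
    flaky = defaultdict(int)
    for (test, _commit), seen in outcomes.items():
        totals[test] += 1
        if len(seen) == 2:  # both True and False were observed
            flaky[test] += 1
    return {test: flaky[test] / totals[test] for test in totals}


def quarantine_candidates(rates, threshold=0.2):
    """Tests whose flake rate exceeds the (assumed) threshold, sorted by name."""
    return sorted(test for test, rate in rates.items() if rate > threshold)
```

Grouping by commit keeps real regressions out of the flake count: a test that fails consistently on a new commit is never marked flaky here.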

Approach 2: Zerocheck run evidence

1. Approved tests run on GitHub PRs and record failure details, screenshots, recordings, and step traces
2. PR comments show which approved failures are blocking and which are non-blocking
3. Run history and trend data help engineers spot noisy tests without guessing from red/green alone
4. Suggested regression tests can be reviewed and approved after repeated failures or incidents

Common pitfalls

— Don’t just re-run: every re-run that passes is a hidden flaky test you’re ignoring
— Don’t quarantine without governance: quarantined tests need owners and deadlines
— Don’t add sleep() as a fix: use explicit wait conditions (waitForSelector, waitForResponse)
— Don’t blame the framework: most flakiness comes from test design (shared state, non-deterministic data), not the tool
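
To make the sleep() pitfall concrete, here is a framework-free sketch of an explicit wait. Browser tools ship their own versions of this (waitForSelector and waitForResponse are examples), so the `wait_until` helper and its parameters are hypothetical and only show the underlying idea: poll for a condition with a deadline instead of pausing for a fixed time.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


# Flaky:    time.sleep(2); assert job.done
# Sturdier: wait_until(lambda: job.done, timeout=10)
```

A fixed sleep is either too short (the test fails on a slow runner) or too long (every run pays the full delay). Polling returns as soon as the condition holds and fails with a clear timeout when it never does.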

FAQ

What causes flaky tests?

The top causes are: timing/race conditions (tests interact before the page is ready), non-deterministic test data (random IDs, ordering), environment drift (staging differs from production), and third-party dependency changes (Stripe, OAuth providers).
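
For the test-data cause, one common remedy is to inject a seeded random generator into fixtures and to impose an explicit order before asserting on lists. The sketch below assumes hypothetical `make_user` and `make_users` helpers and a dictionary-shaped user record; adapt the idea to whatever fixture factory you already have.

```python
import random
import uuid


def make_user(rng):
    """Build a test user whose ID comes from the injected RNG, not global randomness."""
    user_id = uuid.UUID(int=rng.getrandbits(128), version=4)
    return {"id": str(user_id), "name": f"user-{rng.randrange(10_000)}"}


def make_users(seed, count):
    """Return `count` users; the same seed always yields the same users."""
    rng = random.Random(seed)
    return [make_user(rng) for _ in range(count)]


def sorted_by_id(users):
    """Sort explicitly instead of relying on the order a database or API returned."""
    return sorted(users, key=lambda user: user["id"])
```

Because the seed is fixed, a failing run can be reproduced locally with identical data, and comparing `sorted_by_id(...)` results removes ordering as a source of intermittent failures.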

How do I measure flake rate?

Track the percentage of pass-to-fail test transitions that revert to passing on re-run without a code change. Google found that 84% of such transitions involved a flaky test. CI platforms such as Buildkite and CircleCI expose the pass/fail history needed to compute this.
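
As a rough sketch of that metric, assume each test has a chronological list of results, where every result records the first attempt and, if one happened, the re-run on the same commit. The `Result` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    passed: bool                         # outcome of the first attempt
    rerun_passed: Optional[bool] = None  # outcome of a re-run on the same commit, if any


def flaky_transition_share(history):
    """Share of pass-to-fail transitions that passed again on re-run.

    Returns None when the history contains no pass-to-fail transition.
    """
    transitions = 0
    reverted = 0
    for previous, current in zip(history, history[1:]):
        if previous.passed and not current.passed:
            transitions += 1
            if current.rerun_passed:
                reverted += 1
    return reverted / transitions if transitions else None
```

Failures that were never re-run count as not reverted here, which makes the number a lower bound on flakiness rather than an exact figure.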

Should I delete flaky tests?

If a test has been quarantined for 2+ sprints with no fix, delete it. A permanently quarantined test provides zero value and clutters your suite. Better to have no test than a test everyone ignores.
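
The deadline is easy to enforce mechanically. This sketch assumes a 14-day sprint and a hypothetical mapping from test name to the date it was quarantined; it lists the tests that have reached the two-sprint limit, oldest first.

```python
from datetime import date, timedelta


def overdue_quarantines(quarantined, today, sprint_days=14, max_sprints=2):
    """Return names of tests quarantined for at least `max_sprints` sprints.

    `quarantined` maps test name -> date the test entered quarantine.
    """
    limit = timedelta(days=sprint_days * max_sprints)
    overdue = [
        (since, test)
        for test, since in quarantined.items()
        if today - since >= limit
    ]
    return [test for _since, test in sorted(overdue)]
```

Running a check like this on a schedule turns the SLA into a visible list of fix-or-delete decisions instead of a rule people have to remember.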

How to fix flaky tests in CI

Start with a URL, review the suggested tests, and run the approved suite in a hosted browser.
