Coveralls experienced a minor incident on August 20, 2025 affecting Coveralls.io Web and Coveralls.io API, lasting 6h 33m. The incident has been resolved; the full update timeline is below.
Affected components
- Coveralls.io Web
- Coveralls.io API
Update timeline
- investigating Aug 20, 2025, 03:48 PM UTC
We are currently investigating this issue.
- investigating Aug 20, 2025, 05:11 PM UTC
We are continuing to investigate this issue.
- identified Aug 20, 2025, 05:11 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Aug 20, 2025, 07:08 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Aug 20, 2025, 10:21 PM UTC
We have implemented an interim solution and believe this issue has been resolved for now. Fully resolving the root cause will require a longer-term solution, which is currently in design. (Please read the postmortem for more information.) In the meantime, we will monitor for traffic spikes from outlier repos and respond manually wherever the interim solution does not mitigate them sufficiently.
- postmortem Aug 20, 2025, 10:22 PM UTC
We believe this issue has been resolved for now. The underlying cause still appears to be large spikes in incoming Web traffic from other outlier repositories that we have not yet identified or not yet paused.
**Interim Solution**: To reduce the risk of recurrence, we have applied _temporary load balancer adjustments_ that change how requests are distributed, which should lower, if not eliminate, the frequency of **503 “This website is under heavy load”** errors.
**Permanent Solution**: We are also designing a permanent solution to _rate-limit abnormal request patterns_. This will require coordination at the policy/SLA level before it can be fully implemented. In the meantime, we will continue to closely monitor traffic and use targeted load balancer and web server configurations to mitigate the impact of outlier traffic spikes.
**More details**: For a more detailed assessment / RCA of this incident and its recent, related incidents, see [this postmortem](https://status.coveralls.io/incidents/wqbsxnzv0jsf).
**Update (Thu, Aug 21)**: We have identified a **different permanent solution**, which does not entail changes to SLA-level details for “outlier repos.” We may still implement such a solution, but our alternative solution should avoid further 503 errors and be implemented in the next 48-72 hours.
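For readers unfamiliar with the idea of rate-limiting abnormal request patterns mentioned in the postmortem, the following is a minimal, generic sketch of a per-repository token bucket. It is not Coveralls' design, which has not been published; the class name `PerRepoRateLimiter`, the `repo_id` key, and the capacity and refill values are all hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float        # tokens currently available to this repo
    last_refill: float   # clock reading at the last refill


class PerRepoRateLimiter:
    """Hypothetical per-repository token bucket (illustration only)."""

    def __init__(
        self,
        capacity: float = 5.0,          # assumed burst size per repo
        refill_per_sec: float = 1.0,    # assumed sustained rate per repo
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, repo_id: str) -> bool:
        """Return True if a request for repo_id may proceed right now."""
        now = self._clock()
        bucket = self._buckets.get(repo_id)
        if bucket is None:
            # A repo seen for the first time starts with a full bucket.
            bucket = _Bucket(tokens=self.capacity, last_refill=now)
            self._buckets[repo_id] = bucket

        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - bucket.last_refill
        bucket.tokens = min(
            self.capacity, bucket.tokens + elapsed * self.refill_per_sec
        )
        bucket.last_refill = now

        # Spend one token per request; deny when the bucket is empty.
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

Because each repository has its own bucket, a burst from one outlier repo exhausts only that repo's tokens and leaves other repos unaffected. In a setup like this, a denied request would typically receive an HTTP 429 (Too Many Requests) response for that repo alone rather than a site-wide 503; whether Coveralls intends anything similar is not stated in the incident report.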