Coveralls incident

Reports of "Website under heavy load" errors

Minor · Resolved

Coveralls experienced a minor incident on August 19, 2025 affecting Coveralls.io Web and Coveralls.io API, lasting 2h 4m. The incident has been resolved; the full update timeline is below.

Started
Aug 19, 2025, 03:21 PM UTC
Resolved
Aug 19, 2025, 05:26 PM UTC
Duration
2h 4m
Detected by Pingoru
Aug 19, 2025, 03:21 PM UTC

Affected components

Coveralls.io Web, Coveralls.io API

Update timeline

  1. investigating Aug 19, 2025, 03:21 PM UTC

    We are continuing to receive reports of customers encountering "This website is under heavy load" errors from our HTTP servers, even though traffic is normal. We implemented a fix last night that resolved the issue for 6–8 hours, until we received a new report. We are investigating the issue to identify a permanent fix. In the meantime, if you receive this error, please retry your request.

  2. monitoring Aug 19, 2025, 04:24 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Aug 19, 2025, 05:26 PM UTC

    We believe we have resolved all intermittent instances of the "This website is under heavy load" 503 error from our HTTP servers. If you happen to receive that error going forward, please let us know at: [email protected].

  4. postmortem Aug 19, 2025, 06:28 PM UTC

    ### Postmortem: Reports of “Website under heavy load” errors

    We experienced multiple intermittent errors over the past several days before we were able to identify the true root cause and resolve the issue.

    **Root Cause**

    The errors were caused by a single outlier repository generating extremely high-volume requests (750–1,800+ coverage report uploads per build). Combined with the default “sticky request” behavior in Passenger Enterprise (which routes repeat requests from the same IP to the same HTTP server), this overwhelmed individual servers. Once a server’s request queue was exhausted, subsequent requests returned a `503` error with the message: _“This website is under heavy load.”_

    Although each server was able to process individual requests within normal timeframes, the concentrated traffic volume from a single repo and source IP could not be evenly distributed across servers. This led to repeated saturation of request queues and customer-visible errors.

    **Solutions Implemented**

    1. We are testing new settings to override Passenger’s default “sticky request” behavior to allow requests to be distributed more evenly across servers.
    2. We have paused processing for the outlier repository while we validate that the configuration changes are sufficient to prevent future incidents.

    **Next Steps**

    * Continue monitoring system performance to confirm stability.
    * Reintroduce the paused repository once we are confident the mitigations are effective.

    **Closing**

    We appreciate your patience as we worked through this issue. These changes are intended to permanently guard against similar incidents going forward. If you encounter unexpected errors, please contact us at [[email protected]](mailto:[email protected]).

    **Related incidents**

    1. **Aug 13**: [Intermittent request rejections](https://status.coveralls.io/incidents/1n7plxrj8j44)
    2. **Aug 14**: [Service unavailable with HTML error page or 500 errors](https://status.coveralls.io/incidents/v5mcbrsbhgt4)
    3. **Aug 18**: [Reports of "Website under heavy load" errors](https://status.coveralls.io/incidents/fr6sp5kyn128)
    4. **Aug 19 (Today)**: [Reports of "Website under heavy load" errors](https://status.coveralls.io/incidents/wqbsxnzv0jsf)
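For readers curious what overriding sticky routing can look like in practice, here is a minimal sketch for a Passenger-under-nginx setup. This is an illustration only, not Coveralls' actual configuration: the server name is hypothetical, and directive names, defaults, and load-balancing behavior vary by Passenger version and edition, so consult the Passenger documentation for your deployment before applying anything like it.

```nginx
# Hypothetical sketch: stop pinning repeat requests from the same client
# to the same application process, so load spreads across servers.
# Assumes Passenger integrated with nginx.
http {
    server {
        listen 80;
        server_name app.example.internal;  # hypothetical host name

        passenger_enabled on;

        # With sticky sessions off, Passenger's load balancer is free to
        # route each request to the least-busy process instead of the
        # one that served this client before.
        passenger_sticky_sessions off;
    }
}
```

At the nginx `upstream` layer, the analogous choice is plain round-robin balancing rather than `ip_hash`, which would otherwise re-introduce per-IP affinity.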