Uploadcare incident

Increased REST API error rates

Notice Resolved

Uploadcare experienced a notice-level incident on February 25, 2018. The incident has been resolved; the full update timeline is below.

Started
Feb 25, 2018, 12:00 PM UTC
Resolved
Feb 25, 2018, 12:00 PM UTC
Duration
Detected by Pingoru
Feb 25, 2018, 12:00 PM UTC

Update timeline

  1. resolved Feb 28, 2018, 02:19 PM UTC

    We encountered increased error rates on our REST API endpoints, which reduced our reported uptime. Although uptime did suffer, it was not as bad as reported.

    What happened:
    - From February 25 22:00 UTC to February 26 05:50 UTC, error rates on REST API endpoints were elevated.

    Why it happened:
    - One of the machines in the REST API fleet ran out of memory.
    - Because of the OOM condition, the machine was unable to handle any incoming requests.
    - A misconfigured health check prevented the load balancer from removing the failing machine.
    - A portion of all requests, including those from Pingdom (which reports our uptime), was sent to the failing machine.

    What we've done:
    - Tracked down and terminated the failing machine.
    - Fixed the health check configuration.
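
    The role of the health check in this incident can be illustrated with a minimal sketch. This is not Uploadcare's actual load balancer or configuration, which the report does not describe. The names `Backend`, `probe`, `run_health_checks`, and `routable`, the machine names, and the threshold of three consecutive failures are all hypothetical. The sketch only shows the general idea: a working health check takes an unresponsive machine out of rotation, so that no further requests reach it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    responding: bool = True        # can the machine actually serve requests?
    consecutive_failures: int = 0  # failed probes in a row
    in_rotation: bool = True       # does the load balancer still route to it?


def probe(backend: Backend) -> bool:
    """Hypothetical probe: succeeds only if the machine can serve a request."""
    return backend.responding


def run_health_checks(backends: List[Backend], unhealthy_threshold: int = 3) -> None:
    """Run one round of probes; eject backends that fail too many times in a row."""
    for b in backends:
        if probe(b):
            b.consecutive_failures = 0
            b.in_rotation = True
        else:
            b.consecutive_failures += 1
            if b.consecutive_failures >= unhealthy_threshold:
                b.in_rotation = False


def routable(backends: List[Backend]) -> List[str]:
    """Names of the backends that still receive traffic."""
    return [b.name for b in backends if b.in_rotation]


# Assumed three-machine fleet; the real fleet size is not stated in the report.
fleet = [Backend("api-1"), Backend("api-2"), Backend("api-3")]
fleet[1].responding = False  # simulate the machine that ran out of memory

for _ in range(3):
    run_health_checks(fleet)

print(routable(fleet))  # ['api-1', 'api-3']
```

    If the probe is misconfigured so that it keeps succeeding for a machine that cannot serve real traffic, `in_rotation` never flips to `False`, and a share of all requests (including those from an external uptime monitor) continues to land on the failing machine.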