Spreedly incident

Server Maintenance Issues Causing 500 Errors

Notice Resolved

Spreedly experienced a notice-level incident on December 11, 2024. The incident has been resolved; the full update timeline is below.

Started
Dec 11, 2024, 08:50 PM UTC
Resolved
Dec 11, 2024, 07:30 PM UTC
Duration
Detected by Pingoru
Dec 11, 2024, 08:50 PM UTC

Update timeline

  1. resolved Dec 11, 2024, 08:50 PM UTC

    While maintaining a backend service to increase capacity, we saw intermittent errors between 2:28 PM and 2:30 PM EST, and again between 2:39 PM and 2:41 PM EST. Our internal monitors picked up this issue, and we resolved it quickly. We have since seen the system return to normal, and we do not expect any further customer impact. This incident is considered resolved. We apologize for any inconvenience this may have caused.

  2. postmortem Dec 16, 2024, 07:16 PM UTC

    # December 11th, 2024 - Internal System Errors

    Server maintenance performed during Spreedly’s regular change window to increase the capacity of an internal service caused two brief periods of service interruption to APIs that access decrypted information.

    ## What Happened

    At approximately 19:15 UTC, Spreedly Engineering initiated a capacity expansion of an internal service used for data decryption. From 19:28 to 19:30 UTC, a first wave of failed API requests occurred during the maintenance window. A second wave of failed API requests occurred from 19:39 to 19:41 UTC, while request traffic was being rebalanced to the capacity-expanded internal service. Service was restored at 19:41 UTC, and the system was fully operational. API calls that required decrypted data were impacted during the outage timeframes.

    ## Next Steps

    Spreedly Engineering is improving internal observability, implementing automated monitors, and investigating the use of automation for scaling capacity in the future to prevent this issue from recurring.
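
The resolved update reports the error windows in EST, while the postmortem reports them in UTC. As a minimal sketch (not part of Spreedly's report), the Python below checks that the two sets of times agree. It assumes a fixed UTC-5 offset for EST in December and treats each window as fully impacted, which is an upper bound because the errors were described as intermittent. The names `EST`, `est_to_utc`, and `windows_est` are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EST is a fixed UTC-5 offset (no daylight saving in December).
EST = timezone(timedelta(hours=-5), "EST")


def est_to_utc(hour: int, minute: int) -> datetime:
    """Convert a wall-clock time on Dec 11, 2024 (EST) to UTC."""
    local = datetime(2024, 12, 11, hour, minute, tzinfo=EST)
    return local.astimezone(timezone.utc)


# Error windows as stated in the resolved update (EST, 24-hour clock).
windows_est = [((14, 28), (14, 30)), ((14, 39), (14, 41))]

# The same windows expressed in UTC, for comparison with the postmortem.
windows_utc = [(est_to_utc(*start), est_to_utc(*end)) for start, end in windows_est]

# Upper bound on total impact, assuming each window was impacted throughout.
total = sum((end - start for start, end in windows_utc), timedelta())

for start, end in windows_utc:
    print(f"{start:%H:%M} to {end:%H:%M} UTC")
print(f"Upper bound on impacted time: {total}")
```

Under these assumptions the converted windows match the postmortem's 19:28 to 19:30 and 19:39 to 19:41 UTC.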