GraphCDN experienced a major incident on July 22, 2024 affecting the Purging API, lasting 2h 27m. The incident has been resolved; the full update timeline is below.
Affected components
- Purging API
Update timeline
- investigating Jul 22, 2024, 05:40 AM UTC
We are currently looking into an issue with the Purging API.
- identified Jul 22, 2024, 07:32 AM UTC
The team has identified the issue and is currently implementing a fix.
- monitoring Jul 22, 2024, 07:38 AM UTC
A fix has been implemented and the Purging API is working as expected again. We are monitoring all systems to make sure they are working as expected.
- monitoring Jul 22, 2024, 07:45 AM UTC
We are continuing to monitor for any further issues.
- resolved Jul 22, 2024, 08:07 AM UTC
This incident has been resolved.
- postmortem Jul 23, 2024, 06:36 PM UTC
During a routine employee offboarding, we revoked the departing engineer’s access to Fastly. Revoking their access also revoked every access token that engineer had created. Unfortunately, this included the central API token that all our systems use to communicate with the Fastly API.

This had two immediate impacts:
1. Purging started failing silently: Stellate’s purging API kept returning successful responses even though data was not evicted from the cache.
2. Service configuration updates started failing silently: updates appeared to persist even though they were not applied in the CDN.

As part of the incident response, we switched the central Fastly API token to a new token owned by a shared engineering account. Going forward, we will work on better visibility and alerting for failure conditions in the purging API, and we will audit all tokens used by our services to ensure that none are owned by individual engineers.
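
The silent-failure mode described in this postmortem can be illustrated with a minimal sketch. It is not our production code: `UpstreamResponse`, `PurgeError`, `purge`, and the `send` transport are hypothetical names chosen for illustration. The sketch only shows the general idea of surfacing a rejected upstream call (for example, one made with a revoked token) as an error instead of reporting success.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UpstreamResponse:
    """Hypothetical stand-in for an HTTP response from the CDN provider's API."""
    status: int
    body: str = ""


class PurgeError(Exception):
    """Raised when the upstream CDN did not confirm the purge."""


def purge(keys: List[str], send: Callable[[List[str]], UpstreamResponse]) -> int:
    """Forward a purge request and fail loudly on any non-2xx upstream status.

    `send` is an assumed transport function; in a real service it would
    perform the authenticated HTTP call to the CDN provider.
    Returns the number of keys whose purge the upstream accepted.
    """
    response = send(keys)
    if response.status in (401, 403):
        # A revoked or invalid token ends up here instead of being swallowed.
        raise PurgeError(f"upstream rejected credentials (HTTP {response.status})")
    if not 200 <= response.status < 300:
        raise PurgeError(f"upstream purge failed (HTTP {response.status})")
    return len(keys)
```

A caller would translate `PurgeError` into a failed API response and an alert, so that a credential problem becomes visible on the first failed purge rather than after stale data is noticed.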