GraphCDN incident

Stellate Services unavailable because of Cloudflare Worker KV outage

Critical · Resolved

GraphCDN experienced a critical incident on June 7, 2023, affecting GraphQL Edge Caching, GraphQL Metrics, and other components, lasting 44m. The incident has been resolved; the full update timeline is below.

Started
Jun 07, 2023, 07:01 PM UTC
Resolved
Jun 07, 2023, 07:46 PM UTC
Duration
44m
Detected by Pingoru
Jun 07, 2023, 07:01 PM UTC

Affected components

GraphQL Edge Caching, GraphQL Metrics, GraphQL Rate Limiting, GraphQL Developer Portals, User API, Admin API

Update timeline

  1. investigating Jun 07, 2023, 07:01 PM UTC

    We are looking into an issue with Stellate right now. We will update this incident as we have more data available.

  2. monitoring Jun 07, 2023, 07:07 PM UTC

    As far as we can tell, the Cloudflare Workers KV service, which we depend on, had an outage of about 5 to 10 minutes. It appears to be back up and running. We are monitoring the situation and will update our status page as needed.

  3. monitoring Jun 07, 2023, 07:18 PM UTC

    All services are back up and running again. We are monitoring the status of our services as well as the Cloudflare Workers KV store.

  4. resolved Jun 07, 2023, 07:46 PM UTC

    Cloudflare posted an update on their status page and marked the underlying incident as resolved. See https://www.cloudflarestatus.com/incidents/1mj9jch1tqf9 for their update.

  5. postmortem Jun 08, 2023, 10:48 AM UTC

    * Stellate currently relies on Cloudflare services for parts of our offerings.
    * Cloudflare had a global outage of their KV store for ~10 minutes on June 7th, from 6:51 pm to 7:01 pm. They provide a summary of this incident on their own status page at https://www.cloudflarestatus.com/incidents/1mj9jch1tqf9.
    * Any traffic that resulted in cache misses or cache passes received an HTTP 500 error page during that time frame. Traffic served directly from the edge cache (i.e., cache hits) was not affected.
    * ~30% of traffic resulted in cache hits and was served correctly.
    * ~70% of traffic resulted in cache misses or passes; these requests returned an HTTP 500 error.
    * We are currently working on a larger infrastructure improvement that will remove the dependency on Cloudflare Workers KV.
    * Additionally, we will review all possible failure points that could make Stellate's core services inaccessible in the event of a third-party outage, and investigate options for additional redundancy for those services.
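The postmortem's split between unaffected cache hits and failing misses and passes can be illustrated with a toy model. This is a minimal sketch, not Stellate's implementation. It assumes that the miss/pass path has to read per-service configuration from a KV store before forwarding to origin, while hits are answered from the edge cache alone; the postmortem does not say what the KV lookup is for.

All names (`EdgeWorker`, `KVUnavailable`, `make_kv`, `use_fallback`) are hypothetical. The optional last-known-good fallback is one generic redundancy pattern. It is shown only to illustrate the kind of option the postmortem says will be investigated, not the fix Stellate chose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class KVUnavailable(Exception):
    """Raised by the hypothetical KV client when the store cannot be reached."""


@dataclass
class EdgeWorker:
    """Toy edge worker (hypothetical; not Stellate's or Cloudflare's code).

    Assumption: cache hits are served from the edge cache without touching KV,
    while cache misses and passes must first read config from the KV store.
    """
    kv_get: Callable[[str], str]
    cache: Dict[str, str] = field(default_factory=dict)
    # Last config value successfully read per service (used only if use_fallback).
    last_good: Dict[str, str] = field(default_factory=dict)
    use_fallback: bool = False

    def load_config(self, service: str) -> Optional[str]:
        try:
            value = self.kv_get(service)
        except KVUnavailable:
            # Without a fallback, a KV outage leaves the request with no config.
            return self.last_good.get(service) if self.use_fallback else None
        self.last_good[service] = value
        return value

    def handle(self, service: str, query: str, bypass_cache: bool = False) -> int:
        """Return the HTTP status code for one request."""
        key = f"{service}:{query}"
        if not bypass_cache and key in self.cache:
            return 200  # cache hit: answered at the edge, KV never consulted
        config = self.load_config(service)  # cache miss or pass: KV required
        if config is None:
            return 500  # mirrors the HTTP 500 responses during the outage
        # Forwarding to origin is omitted; store a placeholder for future hits.
        if not bypass_cache:
            self.cache[key] = "response"
        return 200


def make_kv(store: Dict[str, str], health: Dict[str, bool]) -> Callable[[str], str]:
    """Build a fake KV getter whose availability is toggled via health["up"]."""
    def get(key: str) -> str:
        if not health["up"]:
            raise KVUnavailable(key)
        return store[key]
    return get


if __name__ == "__main__":
    health = {"up": True}
    worker = EdgeWorker(kv_get=make_kv({"shop": "cfg"}, health))
    worker.handle("shop", "q1")  # warm the cache while KV is healthy
    health["up"] = False         # simulate the KV outage
    print("hit:", worker.handle("shop", "q1"))
    print("miss:", worker.handle("shop", "q2"))
    print("pass:", worker.handle("shop", "q1", bypass_cache=True))
```

With `use_fallback=True`, a miss during the simulated outage still succeeds as long as that worker has read the service's config at least once before. A real system would also have to decide how stale that value may become.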