Heron Data incident

Async task processing outage

Major · Resolved

Heron Data experienced a major incident on July 16, 2025 affecting Root and Transactions, lasting 17m. The incident has been resolved; the full update timeline is below.

Started
Jul 16, 2025, 06:38 PM UTC
Resolved
Jul 16, 2025, 06:55 PM UTC
Duration
17m
Detected by Pingoru
Jul 16, 2025, 06:38 PM UTC

Affected components

Root, Transactions

Update timeline

  1. investigating Jul 16, 2025, 06:38 PM UTC

    We are having issues with GCP provisioning compute instances for async processing. Async processing is impacted.

  2. identified Jul 16, 2025, 06:41 PM UTC

    GCP is denying "spot instances" for our async processing Kubernetes cluster. This means that no async tasks are being processed. We are creating a new node pool that does not rely on spot instances to get us back up and running.

  3. monitoring Jul 16, 2025, 06:44 PM UTC

    New node pools are up and running, and tasks are running again. We are recovering.

  4. resolved Jul 16, 2025, 06:55 PM UTC

    This incident has been resolved. Our async processing uses GCP "spot" instances, and GCP unexpectedly started denying the creation of these spot instances in our Kubernetes cluster. We have now created non-spot instances that will allow us to serve async tasks as normal, and we have both spot and non-spot instances in place for the future.
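
The mitigation described above, keeping a non-spot node pool available alongside the spot pool, can be illustrated with a small fallback routine. This is only a sketch under stated assumptions, not Heron Data's implementation: `NodePool`, `ProvisioningDenied`, `provision_with_fallback`, `fake_provision`, and the pool names are all hypothetical, and no real GCP or Kubernetes API is called.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProvisioningDenied(Exception):
    """Raised by the (hypothetical) provisioner when capacity is refused."""


@dataclass(frozen=True)
class NodePool:
    name: str   # hypothetical pool name, for illustration only
    spot: bool  # True if the pool relies on spot instances


def provision_with_fallback(
    pools: Sequence[NodePool],
    provision: Callable[[NodePool], None],
) -> NodePool:
    """Try each pool in priority order and return the first that succeeds."""
    errors = []
    for pool in pools:
        try:
            provision(pool)
            return pool
        except ProvisioningDenied as exc:
            # Record the refusal and move on to the next pool.
            errors.append(f"{pool.name}: {exc}")
    # Every pool was refused; surface all reasons together.
    raise ProvisioningDenied("; ".join(errors))


def fake_provision(pool: NodePool) -> None:
    """Stand-in provisioner that simulates spot capacity being denied."""
    if pool.spot:
        raise ProvisioningDenied("spot capacity denied")


# Spot pool is preferred; the non-spot pool is the fallback.
pools = [NodePool("async-spot", spot=True), NodePool("async-standard", spot=False)]
chosen = provision_with_fallback(pools, fake_provision)
print(chosen.name)
```

In this simulation the spot pool is refused, so the routine falls through to the non-spot pool. If only spot pools were configured, as was the case before the incident, the same call would raise `ProvisioningDenied` and no capacity would be available.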