Neo4j Aura incident

Aura Professional on GCP in europe-west1 experiencing availability issues

Major Resolved

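Neo4j Aura experienced a major incident on January 4, 2024, lasting 3h 6m. The status page lists the affected component as AuraDB Professional on AWS (*.databases.neo4j.io), although the vendor's updates describe the impact as databases in GCP region europe-west1. The incident has been resolved; the full update timeline is below.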

Started: Jan 04, 2024, 10:46 AM UTC
Resolved: Jan 04, 2024, 01:53 PM UTC
Duration: 3h 6m
Detected by Pingoru: Jan 04, 2024, 10:46 AM UTC

Affected components

AuraDB Professional on AWS (*.databases.neo4j.io)

Update timeline

  1. investigating Jan 04, 2024, 10:46 AM UTC

    We are currently investigating an issue with some databases in GCP region europe-west1 becoming intermittently unavailable.

  2. investigating Jan 04, 2024, 11:32 AM UTC

    We are continuing to investigate this issue. Affected customers can download a dump of their database, launch a new instance in a different region (for example europe-west2 or europe-west3), and load the dump into it. We currently have no ETA for a fix but will update regularly.

  3. investigating Jan 04, 2024, 12:17 PM UTC

    We are continuing to investigate this issue. Affected customers can download a dump of their database, launch a new instance in a different region (for example europe-west2 or europe-west3), and load the dump into it. We currently have no ETA for a fix but will update regularly.

  4. investigating Jan 04, 2024, 01:08 PM UTC

    We are continuing to investigate this issue and are working with the cloud provider. Affected customers can download a dump of their database, launch a new instance in a different region (for example europe-west2 or europe-west3), and load the dump into it. We currently have no ETA for a fix but will update regularly.

  5. monitoring Jan 04, 2024, 01:30 PM UTC

    We have issued a fix on our end and the issue is now addressed. We will keep monitoring for a few minutes to confirm.

  6. resolved Jan 04, 2024, 01:53 PM UTC

    The incident has now been fully resolved, and all databases have recovered full functionality.

  7. postmortem Jan 15, 2024, 12:13 PM UTC

    ### **What happened**

    Half of the Aura Professional environment on GCP in the ‘europe-west1’ region experienced availability issues for its instances.

    As a result of an Aura component rollout, the Aura database ingress layer for two Professional tier environments, ‘europe-west1’ (GCP) and ‘eastus’ (Azure), was not automatically updating to reflect Neo4j cluster topology changes. We initially called out only the affected ‘europe-west1’ (GCP) environment (and provided a mitigation: use of an unaffected environment in that same region) but missed ‘eastus’ (Azure).

    The recovery was to re-establish the connection at the database ingress level by restarting the database ingress pods in order to refresh the Neo4j cluster topology.

    ### **How the service was affected**

    Some Aura Professional tier customers with database instances in the affected environments were impacted. They would have seen intermittent unavailability for the duration of the rollout.

    ### **What we are doing now**

    We conducted a root cause analysis and identified a known issue with the underlying third-party component used to implement the database ingress.

    Actions we are taking based on this incident:

    * Correction - we have implemented and rolled out the recommended fix for the known issue in the third-party component.
    * Detection - we are improving existing logs so that they indicate when similar events are occurring, and adding detection and alerting based on them.
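
The postmortem does not say how the ingress tracks cluster topology or how the planned detection will work. As an illustration only, the sketch below shows one way a check for this failure mode might look: compare the cluster members a database reports with the backends its ingress currently routes to, and alert only when the mismatch persists across several checks. Every name here (`TopologyCheck`, `check_topology`, `should_alert`, the member identifiers, and the threshold of 3) is hypothetical and not taken from Neo4j or from the third-party ingress component.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class TopologyCheck:
    """Result of comparing one database's cluster view with its ingress view."""
    database_id: str
    missing: frozenset  # cluster members the ingress does not know about
    stale: frozenset    # backends the ingress still routes to but the cluster no longer has

    @property
    def drifted(self) -> bool:
        # Any difference in either direction counts as drift.
        return bool(self.missing or self.stale)


def check_topology(
    database_id: str,
    cluster_members: Iterable[str],
    ingress_backends: Iterable[str],
) -> TopologyCheck:
    """Compare the two views as sets; ordering of members is irrelevant."""
    cluster = frozenset(cluster_members)
    ingress = frozenset(ingress_backends)
    return TopologyCheck(database_id, cluster - ingress, ingress - cluster)


def should_alert(history: Sequence[TopologyCheck], threshold: int = 3) -> bool:
    """Alert only if the most recent `threshold` checks all show drift.

    A single mismatch can be normal while a topology change propagates,
    so the (assumed) threshold filters out short-lived differences.
    """
    recent = history[-threshold:]
    return len(recent) == threshold and all(c.drifted for c in recent)
```

In use, a monitor would call `check_topology` on a schedule for each database, append the result to that database's history, and raise an alert when `should_alert` returns true. Requiring consecutive failures is a design choice that trades slower detection for fewer false alarms during ordinary rollouts.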