Neo4j Aura incident

AuraDB/DS: Some instance operations failing

Minor · Resolved

On May 14, 2024, Neo4j Aura experienced a minor incident affecting AuraDB Virtual Dedicated Cloud on GCP (*.databases.neo4j.io), AuraDB Professional on GCP (*.databases.neo4j.io), and three other components. The incident lasted about 9 days and has been resolved; the full update timeline is below.

Started
May 14, 2024, 04:03 PM UTC
Resolved
May 23, 2024, 04:39 PM UTC
Duration
9 days, 36 minutes
Detected by Pingoru
May 14, 2024, 04:03 PM UTC

Affected components

  * AuraDB Virtual Dedicated Cloud on GCP (*.databases.neo4j.io)
  * AuraDB Professional on GCP (*.databases.neo4j.io)
  * AuraDS Enterprise on GCP (*.databases.neo4j.io)
  * AuraDS on GCP (*.databases.neo4j.io)
  * AuraDB Free (*.databases.neo4j.io)

Update timeline

  1. identified May 14, 2024, 04:03 PM UTC

    We identified an issue with resuming or loading data on some instances. We are looking into the root cause.

  2. identified May 14, 2024, 07:49 PM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  3. identified May 14, 2024, 10:06 PM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  4. identified May 15, 2024, 01:09 AM UTC

    We are currently working on a fix for the issue. We will continue to update you as we work on this problem.

  5. identified May 15, 2024, 04:00 AM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  6. identified May 15, 2024, 07:44 AM UTC

    We are currently working on a fix for the issue. We will continue to update you as we work on this problem.

  7. identified May 15, 2024, 10:19 AM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  8. identified May 15, 2024, 12:37 PM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  9. identified May 15, 2024, 03:51 PM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  10. identified May 15, 2024, 06:45 PM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  11. identified May 15, 2024, 10:20 PM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  12. identified May 16, 2024, 01:41 AM UTC

    We are currently working on a fix for the issue. We will continue to update you as we work on this problem.

  13. identified May 16, 2024, 07:31 AM UTC

    We have identified the issue and are working on a fix. We will continue to update you as we work on this problem.

  14. identified May 16, 2024, 10:20 AM UTC

    We are currently working on a fix for the issue. We will continue to update you as we work on this problem.

  15. identified May 16, 2024, 12:56 PM UTC

    We are currently working on a fix for the issue. We will continue to update you as we work on this problem.

  16. identified May 16, 2024, 04:35 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances may fail intermittently. We are working closely with our cloud provider on a solution.

  17. identified May 16, 2024, 07:28 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  18. identified May 16, 2024, 10:35 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  19. identified May 17, 2024, 01:02 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  20. identified May 17, 2024, 04:14 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  21. identified May 17, 2024, 07:02 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  22. identified May 17, 2024, 10:02 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  23. identified May 17, 2024, 03:04 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  24. identified May 17, 2024, 06:30 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not be completed successfully. We are working closely with our cloud provider on a solution.

  25. identified May 17, 2024, 09:36 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not be completed successfully. We are working closely with our cloud provider on a solution.

  26. identified May 17, 2024, 11:42 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not be completed successfully. We are working closely with our cloud provider on a solution.

  27. identified May 18, 2024, 02:58 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not be completed successfully. We are working closely with our cloud provider on a solution.

  28. identified May 18, 2024, 06:03 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not be completed successfully. We are working closely with our cloud provider on a solution.

  29. identified May 18, 2024, 09:04 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not be completed successfully. We are working closely with our cloud provider on a solution.

  30. identified May 18, 2024, 12:13 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not be completed successfully. We are working closely with our cloud provider on a solution.

  31. identified May 18, 2024, 03:01 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  32. identified May 18, 2024, 06:02 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  33. identified May 18, 2024, 09:00 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  34. identified May 19, 2024, 12:31 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  35. identified May 19, 2024, 03:31 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  36. identified May 19, 2024, 06:31 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  37. identified May 19, 2024, 09:31 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  38. identified May 19, 2024, 12:31 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  39. identified May 19, 2024, 03:30 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  40. identified May 19, 2024, 06:30 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  41. identified May 19, 2024, 09:30 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  42. identified May 20, 2024, 01:10 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  43. identified May 20, 2024, 04:01 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  44. identified May 20, 2024, 07:03 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  45. identified May 20, 2024, 10:00 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  46. identified May 20, 2024, 01:09 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  47. identified May 20, 2024, 04:40 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  48. identified May 20, 2024, 08:21 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  49. identified May 20, 2024, 11:08 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  50. identified May 21, 2024, 02:11 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  51. identified May 21, 2024, 05:00 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  52. identified May 21, 2024, 08:25 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  53. identified May 21, 2024, 11:25 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  54. identified May 21, 2024, 02:32 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  55. identified May 22, 2024, 01:01 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  56. identified May 22, 2024, 04:20 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  57. identified May 22, 2024, 07:01 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  58. identified May 22, 2024, 10:00 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  59. identified May 22, 2024, 01:00 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  60. identified May 22, 2024, 04:00 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  61. identified May 22, 2024, 06:53 PM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  62. identified May 23, 2024, 12:25 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  63. identified May 23, 2024, 03:00 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  64. identified May 23, 2024, 06:04 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  65. identified May 23, 2024, 09:00 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  66. identified May 23, 2024, 11:49 AM UTC

    Some operations (Resume, Restore from backup, Loading, and Clone to new) on Aura instances are impacted and may not complete successfully. We are working closely with our cloud provider on a solution.

  67. resolved May 23, 2024, 04:39 PM UTC

    The issue is now fixed. Our cloud provider rolled out a change that addresses the underlying root cause and has confirmed the resolution.

  68. postmortem Jun 26, 2024, 12:31 PM UTC

    ### **What happened**

    Our cloud provider rolled out a new GKE feature that manages the resources related to Kubernetes snapshots. A new configuration associated with this feature introduced a race condition, which was exacerbated by concurrent calls to the resource using identical snapshot names. We first worked to isolate the issue and to rule out any involvement of our own changes. Once the cause was clear, we declared this incident and reported the issue to our cloud service provider, escalating it soon after as we became more confident of its nature and impact.

    ### **How the service was affected**

    Most Aura operations (Resume, Restore from backup, Loading, and Clone to new) were failing because they depend on the Kubernetes snapshot functionality. Some operations that customers initiated during this period may have become stuck. We proactively monitored these operations and assisted manually whenever possible.

    ### **What we are doing now**

    * Our provider is reviewing their testing procedures to better detect possible race conditions.
    * Our provider is implementing additional safeguards to prevent race conditions from occurring.
    * We have established a better channel for working with our service provider and escalating issues when they affect our service.
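    The postmortem attributes the failures to a race condition made worse by concurrent calls that used identical snapshot names. The Python below is a simplified illustration under stated assumptions, not a description of the provider's code: `SnapshotStore`, `create_unsafe`, `create_atomic`, `unique_snapshot_name`, and `run_concurrently` are hypothetical names, and the in-memory dictionary stands in for a name-keyed snapshot API (it is not GKE's or Aura's API). It shows how a non-atomic check-then-create lets two identically named requests both report success while one silently overwrites the other, plus two generic mitigations: doing the check and write atomically, or giving each request a unique name. The source does not say which safeguards the provider actually adopted.

```python
import threading
import uuid


class SnapshotStore:
    """Hypothetical in-memory stand-in for a name-keyed snapshot API.

    This is NOT the GKE or Aura API; it only models the naming conflict.
    """

    def __init__(self, race_window=None):
        self._snapshots = {}              # snapshot name -> operation that created it
        self._lock = threading.Lock()
        self._race_window = race_window   # optional Barrier that forces the bad interleaving

    def create_unsafe(self, name, owner):
        # Non-atomic check-then-create: the existence check and the write are
        # separate steps, so concurrent callers can both pass the check.
        exists = name in self._snapshots
        if self._race_window is not None:
            self._race_window.wait()      # every caller has checked before anyone writes
        if exists:
            raise FileExistsError(name)
        self._snapshots[name] = owner     # a later writer silently replaces an earlier one
        return name

    def create_atomic(self, name, owner):
        # Check and write under one lock: a duplicate name is rejected, not overwritten.
        with self._lock:
            if name in self._snapshots:
                raise FileExistsError(name)
            self._snapshots[name] = owner
            return name

    def owner(self, name):
        return self._snapshots.get(name)


def unique_snapshot_name(base):
    # Hypothetical naming scheme: a random suffix keeps concurrent requests from colliding.
    return f"{base}-{uuid.uuid4().hex}"


def run_concurrently(create, requests):
    """Run each (name, owner) request on its own thread; collect successes and conflicts."""
    results, errors = [], []

    def worker(name, owner):
        try:
            results.append(create(name, owner))
        except FileExistsError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=req) for req in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors


if __name__ == "__main__":
    same_name = [("snap-1", "op-A"), ("snap-1", "op-B")]

    # Race: both operations "succeed", but only one snapshot survives.
    racy = SnapshotStore(race_window=threading.Barrier(2))
    ok, conflicts = run_concurrently(racy.create_unsafe, same_name)
    print("unsafe:", len(ok), "succeeded; snap-1 owned by", racy.owner("snap-1"))

    # Atomic create: exactly one succeeds, the other gets an explicit conflict.
    atomic = SnapshotStore()
    ok, conflicts = run_concurrently(atomic.create_atomic, same_name)
    print("atomic:", len(ok), "succeeded,", len(conflicts), "conflict")

    # Unique names: both succeed and each operation keeps its own snapshot.
    unique = SnapshotStore(race_window=threading.Barrier(2))
    reqs = [(unique_snapshot_name("snap"), "op-A"), (unique_snapshot_name("snap"), "op-B")]
    ok, conflicts = run_concurrently(unique.create_unsafe, reqs)
    print("unique names:", len(ok), "succeeded,", len(conflicts), "conflicts")
```

    The `threading.Barrier` only forces the unlucky interleaving so the race reproduces on every run; in a real system the window is timing-dependent, which is one reason races like this can slip past ordinary testing.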