Neo4j Aura experienced a minor incident on June 28, 2024 affecting AuraDB Virtual Dedicated Cloud on AWS (*.databases.neo4j.io), AuraDB Professional on AWS (*.databases.neo4j.io), and 1 more component, lasting 10h 55m. The incident has been resolved; the full update timeline is below.
Update timeline
- identified Jun 28, 2024, 01:58 PM UTC
The latest Aura 5 update has introduced a query regression. Error signature: `key not found: VariableSlotKey(ke)`. Workaround: prefixing the queries with `runtime=legacy` should address the issue.
- identified Jun 28, 2024, 03:29 PM UTC
We continue to work on a fix for this issue. To identify whether you are affected, look for the error signature `key not found: VariableSlotKey(...)`. We are taking steps to contact affected customers.
- identified Jun 28, 2024, 05:05 PM UTC
We have a fix and are currently packaging and releasing it. The ETA for the rollout on Aura is currently 22:00 UTC.
- identified Jun 28, 2024, 09:08 PM UTC
We are continuing to work on a fix for this issue.
- identified Jun 28, 2024, 09:29 PM UTC
We have a fix and are progressing with its release. Packaging and testing ran longer than earlier estimates. The current projected ETA for the rollout on Aura is June 29th 00:00 UTC.
- resolved Jun 29, 2024, 12:54 AM UTC
We have now completed the rollout of the fix to all affected Aura instances. Incident resolved.
- postmortem Jul 12, 2024, 06:26 PM UTC
### **What happened**

On 2024-06-28 10:46 (UTC) we released a new version of the database (v. 5.21). This release contained a fix for a problem where queries returning an entity ordered by an index-backed property could return results in the wrong order due to concurrent writes (e.g. with an index-backed property: `MATCH (n:L) WHERE n.x IS NOT NULL RETURN n ORDER BY n.x`).

Unfortunately this introduced two undetected regressions:

* `key not found: VariableSlotKey(...)`, where the runtime would try to access a cached variable that was not in scope
* `NullCheckReferenceProperty cannot be cast to class ASTCachedProperty`, a class cast exception in the slotted runtime that happened in some cases when retrieving cached properties

Contrary to the usual process, a human error meant the release was rolled out directly to the AuraDB Enterprise tier without first having been exposed to the AuraDB Free and Professional tiers.

### **How the service was affected**

Customers running certain types of queries, where the whole node would get cached as part of the optimization, saw failures with `key not found: VariableSlotKey(...)` or `NullCheckReferenceProperty cannot be cast to class ASTCachedProperty`, with no simple workarounds.

We fixed the regression and rolled out a new version 5.21.1 (2024-06-29 01:10), followed by a second version, 5.21.2, to address both of these regressions.
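
For anyone checking their own application logs against the two error signatures quoted above, a minimal sketch follows. It assumes error messages are available as plain strings; the function name `classify_error` and the labels are hypothetical and not part of any Neo4j API.

```python
import re
from typing import Optional

# The two error signatures quoted in the postmortem.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "variable_slot_key": re.compile(r"key not found: VariableSlotKey\("),
    # \S* tolerates a package-qualified class name in front of ASTCachedProperty.
    "cached_property_cast": re.compile(
        r"NullCheckReferenceProperty cannot be cast to class \S*ASTCachedProperty"
    ),
}


def classify_error(message: str) -> Optional[str]:
    """Return the label of the first matching signature, or None if neither matches."""
    for label, pattern in SIGNATURES.items():
        if pattern.search(message):
            return label
    return None
```

A message that matches neither pattern returns `None`, so unrelated failures are not attributed to this incident.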
### **What we are doing now**

We consider this incident severe and have committed resources to deliver on the following actions:

* Rolling out
  * Prevent rolling out to the Enterprise tier at the end of the working week
  * Release roll-out process and tooling to enforce the sequence of tiers to receive a new release
  * Automate the release and roll-out process to remove risks of further human error
  * Improve on the time it takes to release an emergency fix
* Database release
  * Add further tests around the release of improvements to Cypher
* Detection
  * Build a monitoring dashboard specifically to detect spikes of errors due to Cypher queries
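
The two roll-out guards listed above (tier sequencing and no Enterprise roll-out at the end of the working week) could be expressed as a simple pre-flight check. This is only an illustrative sketch, not Neo4j's tooling: the tier order Free, Professional, Enterprise is an assumption based on the postmortem's description of the usual process, "end of the working week" is assumed to mean Friday, and `can_roll_out` is a hypothetical name.

```python
from datetime import date
from typing import Set

# Assumed order: the postmortem only states that Free and Professional
# normally receive a release before Enterprise.
TIER_ORDER = ["free", "professional", "enterprise"]


def can_roll_out(tier: str, already_released: Set[str], today: date) -> bool:
    """Return True if a release may proceed to `tier` under both guards."""
    index = TIER_ORDER.index(tier)  # raises ValueError for an unknown tier
    # Guard 1: every earlier tier must already have received the release.
    if any(earlier not in already_released for earlier in TIER_ORDER[:index]):
        return False
    # Guard 2 (assumption): "end of the working week" means Friday,
    # which is weekday() == 4 in Python.
    if tier == "enterprise" and today.weekday() == 4:
        return False
    return True
```

With these assumptions, a direct Enterprise roll-out on Friday 2024-06-28 would be rejected by either guard on its own.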