Neo4j Aura incident

Metrics Unavailability Across All Instance Tiers

Minor Resolved

Neo4j Aura experienced a minor incident on May 7, 2025 affecting AuraDB Virtual Dedicated Cloud on AWS (*.databases.neo4j.io), AuraDB Professional on AWS (*.databases.neo4j.io), and 8 more components, lasting 13h 6m. The incident has been resolved; the full update timeline is below.

Started
May 07, 2025, 08:18 AM UTC
Resolved
May 07, 2025, 09:24 PM UTC
Duration
13h 6m
Detected by Pingoru
May 07, 2025, 08:18 AM UTC

Affected components

AuraDB Virtual Dedicated Cloud on AWS (*.databases.neo4j.io)
AuraDB Professional on AWS (*.databases.neo4j.io)
AuraDS on AWS (*.databases.neo4j.io)
AuraDS Enterprise on AWS (*.databases.neo4j.io)
AuraDB Business Critical (*.databases.neo4j.io) on AWS
AuraDB Virtual Dedicated Cloud on Azure (*.databases.neo4j.io)
AuraDB Professional on Azure (*.databases.neo4j.io)
AuraDS Enterprise on Azure (*.databases.neo4j.io)
AuraDS on Azure (*.databases.neo4j.io)
AuraDB Business Critical (*.databases.neo4j.io) on Azure

Update timeline

  1. investigating May 07, 2025, 08:18 AM UTC

    We have identified an issue impacting the availability of metrics on instances across all tiers. Our team is currently investigating the root cause. In the meantime, you may experience difficulties monitoring your instances or accessing metric data.

  2. investigating May 07, 2025, 08:20 AM UTC

    We have identified an issue impacting the availability of metrics on instances across all tiers. Our team is currently investigating the root cause. In the meantime, you may experience difficulties monitoring your instances or accessing metric data.

  3. investigating May 07, 2025, 10:09 AM UTC

    We are still investigating the issue affecting the availability of metrics. As a result, you may experience difficulties monitoring your instances or accessing metric data. We appreciate your patience as we work to resolve it.

  4. identified May 07, 2025, 01:43 PM UTC

    We have identified the issue and have a change that we will roll out shortly.

  5. identified May 07, 2025, 03:40 PM UTC

    We have identified the issue and have a change that we will roll out shortly.

  6. monitoring May 07, 2025, 04:47 PM UTC

    We have rolled out the fix and the metrics are available again. We will monitor for some time.

  7. resolved May 07, 2025, 09:24 PM UTC

    The fix implemented by our engineers has resolved the issue. Users can once again use Aura Metrics.

  8. postmortem Jun 04, 2025, 03:51 PM UTC

    ### **What happened**

    As we rolled out our regular monthly release (2025.04), we introduced the possibility for the DBMS metric `log.appended_bytes` to return a negative value. This became apparent because the same release shipped a new datastore version. The negative value caused the metrics HTTP endpoint of the Neo4j DBMS to fail.

    The issue went undetected because it affected only a subset of instances, and it was corrected on instances that were subsequently rolled for another component change delivered as part of the release. It occurred only on instances that were not restarted by that roll. It also prevented us from fully collecting metrics from those DBMS instances, which impacted monitoring of the instances and troubleshooting by engineers. We filtered out the metric causing the issue and rolled the affected instances to resolve it.

    ### **How the service was affected**

    Affected customers (a random subset of instances across tiers) could not retrieve any instance metrics via the endpoint [customer-metrics-api.neo4j.io](http://customer-metrics-api.neo4j.io). This also affected the built-in metrics in the monitoring section of the Aura console ([console.neo4j.io](http://console.neo4j.io)).

    ### **What we are doing now**

    Following a review of the sequence of events and their impact, we have identified a number of actions to improve Neo4j Aura and to prevent, detect, mitigate, and better handle any similar issue.

    * Prevention
      * Improve metrics endpoint robustness: requests should not fail if one metric is invalid
      * Fix and prevent negative counters for metrics and log occurrences
    * Detection
      * Implement an alert on metrics not being collected successfully
    * Mitigation, handling and troubleshooting
      * Improve access to raw metrics from any instance
      * Provide a configuration option to exclude a metric
    * Communication
      * Represent the metric endpoint on the status page and work towards automating the report of its status
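
The prevention and mitigation items in the postmortem (an endpoint that tolerates one invalid metric, and an option to exclude a metric) can be illustrated with a minimal sketch. This is not Neo4j's implementation: the function `render_metrics`, its parameters, the simplified `name value` output format, and every metric name other than `log.appended_bytes` are assumptions made for illustration only.

```python
import math
from typing import Dict, FrozenSet, List, Tuple


def render_metrics(
    samples: Dict[str, float],
    counters: FrozenSet[str] = frozenset(),
    excluded: FrozenSet[str] = frozenset(),
) -> Tuple[str, List[str]]:
    """Render samples as 'name value' lines (hypothetical, simplified format).

    Invalid samples are skipped and reported instead of failing the
    whole response. A sample is treated as invalid when it is not a
    number, is NaN, or is a negative value for a metric declared as a
    counter. Metrics listed in `excluded` are dropped silently.
    """
    lines: List[str] = []
    skipped: List[str] = []
    for name in sorted(samples):
        if name in excluded:
            continue  # operator opted this metric out via configuration
        value = samples[name]
        invalid = (
            isinstance(value, bool)
            or not isinstance(value, (int, float))
            or math.isnan(value)
            or (name in counters and value < 0)
        )
        if invalid:
            skipped.append(name)  # report it, but keep serving the rest
            continue
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n", skipped


# Example: one negative counter no longer takes down the whole response.
# "example.requests_total" and "example.heap_used" are hypothetical names.
samples = {
    "log.appended_bytes": -42,
    "example.requests_total": 10,
    "example.heap_used": 5,
}
counters = frozenset({"log.appended_bytes", "example.requests_total"})
body, skipped = render_metrics(samples, counters)
```

Returning the list of skipped names alongside the body gives a collector something to alert on, which lines up with the detection item above; how such an alert would actually be wired is outside the scope of this sketch.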