OpenNode incident

High database load

OpenNode experienced a major incident on May 14, 2024. The incident has been resolved; the full update timeline is below.

Started: May 14, 2024, 07:01 PM UTC
Resolved: May 14, 2024, 05:30 AM UTC
Duration: —
Detected by Pingoru: May 14, 2024, 07:01 PM UTC

Update timeline

resolved May 14, 2024, 07:01 PM UTC

On May 14, 2024, OpenNode experienced an incident with our production database, resulting in performance degradation and service disruption. The root cause of the issue was identified as a deprecated API loading excessive data, leading to a database overload.

postmortem May 14, 2024, 07:01 PM UTC

**Summary:** On May 14, 2024, OpenNode experienced an incident with our production database, resulting in performance degradation and service disruption. The root cause of the issue was identified as a deprecated API loading excessive data, leading to a database overload.

**Timeline of Events:**

* **10:30 PM PST:** Issue Detected
    * Our monitoring systems alerted us to unusual activity within the production database, indicating potential performance issues.
* **11:30 PM PST:** Service Restart
    * A decision was made to restart auxiliary services to alleviate any immediate issues and restore stability to the system.
* **12:30 AM PST:** Database Workload Issue Identified and Fix Deployed
    * Upon further investigation, it was determined that the root cause of the problem was a deprecated API that was loading an excessive amount of data into the database, causing an overload.
    * A fix was promptly developed and deployed to address the issue and mitigate further impact on system performance.
* **1:30 AM PST:** Monitoring Fix and Stability
    * Additional measures were taken to enhance monitoring systems and ensure the stability of the database environment.
* **2:00 AM PST:** Issue Confirmed Fixed
    * Following the deployment of the fix, monitoring systems indicated a return to normal database workload and performance levels.
    * Our internal testing confirmed that the issue was resolved, and normal service operations were restored.