Stellar incident

History lag and degraded performance on certain Horizon endpoints

Minor Resolved

Stellar experienced a minor incident on March 11, 2024 affecting SDF Public Network Horizon, lasting 7d 9h. The incident has been resolved; the full update timeline is below.

Started
Mar 11, 2024, 02:00 PM UTC
Resolved
Mar 18, 2024, 11:01 PM UTC
Duration
7d 9h
Detected by Pingoru
Mar 11, 2024, 02:00 PM UTC

Affected components

SDF Public Network Horizon

Update timeline

  1. investigating Mar 11, 2024, 11:34 PM UTC

    We've identified high CPU utilization on SDF Horizon nodes, caused by an increased request load that saturated database connections.

  2. investigating Mar 11, 2024, 11:37 PM UTC

    We halved Horizon's global rate limits.

  3. monitoring Mar 11, 2024, 11:45 PM UTC

    Users on Discord are reporting degraded performance.

  4. monitoring Mar 12, 2024, 12:20 AM UTC

    We swapped the standby and active endpoints. This fixed the problem temporarily.

  5. monitoring Mar 12, 2024, 12:22 AM UTC

    Performance degradation observed again.

  6. monitoring Mar 12, 2024, 12:24 AM UTC

    We reduced rate limits further.

  7. monitoring Mar 12, 2024, 12:29 AM UTC

    A fix has been deployed. We are monitoring to ensure it is stable.

  8. monitoring Mar 12, 2024, 06:53 PM UTC

    The fix appears to have resolved the performance issues, but the reduced rate limits remain in effect while we continue to monitor.

  9. resolved Mar 18, 2024, 11:01 PM UTC

    Service has remained stable over the past week, so the incident is now resolved. Note that rate limits may adjust dynamically to preserve service health during periods of high network activity or volatility.
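
Because rate limits may tighten without notice, clients may want to treat a throttled response as a signal to slow down rather than as a hard failure. The following is a minimal sketch of one common approach, exponential backoff with jitter, and is not taken from the incident report. The names `backoff_delay` and `fetch_with_backoff` are hypothetical, `fetch` stands in for whatever HTTP call the client already makes, and the sketch assumes throttling is signalled with HTTP 429 (or 503) and, optionally, a numeric `Retry-After` header.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers, body); a stand-in for a real HTTP response.
Response = Tuple[int, Mapping[str, str], bytes]


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before the retry that follows 0-based `attempt`."""
    # Prefer the server's hint when it is a plain number of seconds.
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # e.g. an HTTP-date; fall back to exponential backoff
    # Full jitter: a random fraction of min(cap, base * 2**attempt).
    return rng() * min(cap, base * (2 ** attempt))


def fetch_with_backoff(fetch: Callable[[], Response], max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `fetch` until it is not throttled or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        # Assumption: 429 and 503 are the statuses worth retrying.
        if status not in (429, 503):
            break
        if attempt < max_attempts - 1:
            # Assumption: `headers` is keyed exactly as "Retry-After".
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    # Either a non-throttled response or the last throttled one.
    return status, headers, body
```

A caller would wrap its existing request in a zero-argument function and pass it as `fetch`. The jitter spreads retries from many clients over time, which helps avoid a synchronized burst against a service that is already shedding load.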