RIPE Network Coordination Centre incident

High failure rate of requests to RIPEstat

RIPE Network Coordination Centre experienced a minor incident on September 16, 2025 affecting RIPEstat, lasting 3d 5h. The incident has been resolved; the full update timeline is below.

Started: Sep 16, 2025, 07:28 AM UTC
Resolved: Sep 19, 2025, 12:57 PM UTC
Duration: 3d 5h
Detected by Pingoru: Sep 16, 2025, 07:28 AM UTC

Affected components

RIPEstat

Update timeline

identified Sep 16, 2025, 07:28 AM UTC

Between ~03:10 and ~05:40 UTC there were multiple periods during which RIPEstat was fully unavailable. We still see higher tail latency for requests. Our current analysis shows that this was a side effect of a configuration change we deployed to mitigate the memory consumption issues we encountered last week. We hope to mitigate this fully by end of day today.
monitoring Sep 16, 2025, 10:16 AM UTC

We have applied a mitigation for one underlying issue, and hope to finish another infrastructure change by end of working day today.
monitoring Sep 16, 2025, 06:08 PM UTC

We have deployed a configuration change to part of our cluster that should prevent the performance issues from recurring. We will evaluate the performance for both groups (baseline/treatment) tomorrow and continue with the roll-out if it performs as expected.
monitoring Sep 17, 2025, 06:27 AM UTC

The configuration change we evaluated yesterday shows positive results. We applied the change to a quarter of our environment, where it significantly reduced the median latency compared to the baseline. We plan to roll out this configuration change to the full environment later today. We also adjusted our rate limiting to further improve service stability. Rate limits now result in HTTP 429 instead of the historic behaviour of HTTP 503. The rate limit differs per endpoint. If you are affected by the rate limits, please contact us. We will continue monitoring closely during the full rollout.
monitoring Sep 17, 2025, 11:29 AM UTC

We have deployed the configuration change. We expect this has resolved the issues. We will update this status announcement tomorrow morning.
resolved Sep 19, 2025, 12:57 PM UTC

This incident has been resolved. Our latest mitigation worked as expected. Some users may encounter HTTP 429 responses for exceeding the overall rate limit, or a new rate limit on `bgp-updates`. Please contact us if you encounter issues.
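
For clients that now receive HTTP 429 where they previously saw HTTP 503, the following is a minimal sketch of one way to retry with exponential backoff. It is not an official RIPEstat client. The `fetch_with_backoff` function, the `send` callable, and the `(status, retry_after, body)` response shape are all hypothetical names introduced for illustration. The announcement does not say whether a `Retry-After` header is sent, so the sketch only uses one if it is present and assumes it holds whole seconds.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical response shape: (status_code, Retry-After header or None, body).
Response = Tuple[int, Optional[str], str]

# 429 is the status now used for rate limiting; 503 was the historic behaviour.
RETRYABLE = {429, 503}


def fetch_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        status, retry_after, _body = response
        # Return on success, on a non-retryable error, or on the final attempt.
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return response
        # Assumption: if a Retry-After value is present, it is whole seconds.
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Otherwise back off exponentially: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
        sleep(min(delay, max_delay))
    raise AssertionError("unreachable")
```

In practice `send` would wrap whatever HTTP library the client already uses. Passing `sleep` as a parameter keeps the function easy to test without real delays. Because the rate limit differs per endpoint, suitable values for `base_delay` and `max_attempts` may vary.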