RIPE Network Coordination Centre incident

High latency and error rate for RIPEstat

RIPE Network Coordination Centre experienced a major incident on September 12, 2025, affecting RIPEstat and lasting 5h 51m. The incident has been resolved; the full update timeline is below.

Started: Sep 12, 2025, 03:55 AM UTC
Resolved: Sep 12, 2025, 09:46 AM UTC
Duration: 5h 51m
Detected by Pingoru: Sep 12, 2025, 03:55 AM UTC

Affected components

RIPEstat

Update timeline

identified Sep 12, 2025, 03:55 AM UTC

We have seen a significant increase in error rate and latency since ~03:15 UTC. We are working on a solution.
monitoring Sep 12, 2025, 03:56 AM UTC

We have intervened and are monitoring the results.
identified Sep 12, 2025, 04:37 AM UTC

Due to a bug in the Salt version we are using, which causes excessive memory usage, we cannot currently run our configuration management on machines that are already swapping. Manually restarting processes did not work because the load concentrates on the first machines that become healthy again. We are working on a workaround.
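The restart problem described above can be illustrated with a small back-of-the-envelope sketch. It is not a model of the actual RIPEstat setup: it assumes, purely for illustration, that incoming traffic is spread evenly across whichever machines are currently healthy, and all names and numbers (`total_rps`, `capacity_rps`, the example values) are hypothetical.

```python
import math


def load_per_machine(total_rps: float, healthy_machines: int) -> float:
    """Requests per second each healthy machine receives.

    Assumption (not from the incident report): traffic is split evenly
    across the machines that are currently healthy.
    """
    if healthy_machines <= 0:
        raise ValueError("at least one healthy machine is required")
    return total_rps / healthy_machines


def survives_restart(total_rps: float, capacity_rps: float, batch_size: int) -> bool:
    """True if a batch of freshly restarted machines can absorb the full load."""
    return load_per_machine(total_rps, batch_size) <= capacity_rps


def min_batch_size(total_rps: float, capacity_rps: float) -> int:
    """Smallest number of machines that must come back together to stay healthy."""
    if capacity_rps <= 0:
        raise ValueError("capacity must be positive")
    return math.ceil(total_rps / capacity_rps)


if __name__ == "__main__":
    # Hypothetical figures: 1000 req/s of demand, 300 req/s per machine.
    print(survives_restart(1000, 300, batch_size=1))
    print(min_batch_size(1000, 300))
```

Under these assumptions, restarting machines one at a time cannot succeed while demand exceeds a single machine's capacity; either enough machines have to return together or the incoming load has to be reduced first.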
identified Sep 12, 2025, 05:51 AM UTC

We are continuing to work on the issue. As part of our mitigation, we are starting to enforce our API rate limits on specific endpoints. We do not have contact details for the affected API user.
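For readers unfamiliar with per-endpoint rate limiting, the following is a minimal token-bucket sketch of the general idea. It is not the mechanism RIPEstat uses, which the update does not describe; the class names, the endpoint path `/example/heavy-endpoint`, and the limits are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # maximum burst size
    tokens: float    # tokens currently available
    updated: float   # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class EndpointRateLimiter:
    """Applies limits only to the endpoints listed in `limits`."""

    def __init__(
        self,
        limits: Dict[str, Tuple[float, float]],  # endpoint -> (rate, capacity)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = limits
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, client: str, endpoint: str) -> bool:
        limit = self.limits.get(endpoint)
        if limit is None:
            return True  # this endpoint is not rate limited
        now = self.clock()
        key = (client, endpoint)
        bucket = self.buckets.get(key)
        if bucket is None:
            rate, capacity = limit
            bucket = TokenBucket(rate, capacity, capacity, now)
            self.buckets[key] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    # Hypothetical limit: 1 request/second with a burst of 2 on one endpoint.
    limiter = EndpointRateLimiter({"/example/heavy-endpoint": (1.0, 2.0)})
    print([limiter.allow("client-a", "/example/heavy-endpoint") for _ in range(3)])
```

Keeping one bucket per client and endpoint means a single heavy user of one endpoint is throttled without affecting other users or other endpoints. A request that is refused would typically be answered with HTTP 429.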
identified Sep 12, 2025, 05:52 AM UTC

We are continuing to work on a fix for this issue.
monitoring Sep 12, 2025, 05:52 AM UTC

A fix has been implemented and we are monitoring the results.
monitoring Sep 12, 2025, 06:03 AM UTC

We are continuing to monitor for any further issues.
resolved Sep 12, 2025, 09:46 AM UTC

The service recovered after our last mitigations. We will make some of these changes permanent.