Seravo incident

Availability issues in multiple clusters

Severity: Major · Status: Resolved

Seravo experienced a major incident on January 28, 2026, affecting the fi-coltrane, fi-ellington and other clusters (ten in total, listed below) and lasting 4h 38m. The incident has been resolved; the full update timeline is below.

Started: Jan 28, 2026, 05:28 AM UTC
Resolved: Jan 28, 2026, 10:07 AM UTC
Duration: 4h 38m
Detected by Pingoru: Jan 28, 2026, 05:28 AM UTC

Affected components

fi-coltrane cluster, fi-ellington cluster, fi-haarla cluster, fi-metheny cluster, fi-perko cluster, fi-rantala cluster, fi-sestak cluster, fi-tolonen cluster, fi-lahti cluster, fi-fredriksson cluster

Update timeline

  1. identified Jan 28, 2026, 05:28 AM UTC

    There is an availability issue in multiple clusters affecting a small number of sites. We have identified the problem and are working on a solution.

  2. identified Jan 28, 2026, 06:57 AM UTC

    We are continuing to work on a fix for this issue.

  3. monitoring Jan 28, 2026, 08:57 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Jan 28, 2026, 10:07 AM UTC

    This incident has been resolved.

  5. postmortem Jan 30, 2026, 01:03 PM UTC

    # Notice of a Service Disruption on January 28, 2026

    On Wednesday, January 28, 2026, a failure related to the Nginx web server occurred in several Seravo server clusters. The disruption was caused by an improvement to our security features and a minor change in Nginx configurations, which caused an outage for a number of sites in the clusters. The problem was noticed immediately by our on-call monitoring processes and was fixed as quickly as possible. Technical complications during the rollback delayed the full restoration of site services. The disruption began at 07:00 (UTC+2) and ended at 10:00 (UTC+2). We apologise for any inconvenience caused by the disruption.

    ## Timeline (all timestamps UTC+2 (EET))

    * 28.1.2026 07:00 Nginx misconfiguration was rolled out
    * 28.1.2026 07:06 Site monitoring noticed site issues and alerted the on-call officer
    * 28.1.2026 07:10 Investigation was started
    * 28.1.2026 07:21 The root cause was identified
    * 28.1.2026 07:21 Problem was escalated to the systems administration team
    * 28.1.2026 07:34 The team started fixing the issue
    * 28.1.2026 07:49 Some of the sites were recovered
    * 28.1.2026 09:06 Almost the entire cluster was fixed
    * 28.1.2026 10:00 All affected sites were back online

    ## Follow-Up Action

    As a result of the incident, we at Seravo have identified the need for the following measures:

    * Further development of internal processes and tools to enable faster troubleshooting and recovery.
    * Improvements to automated testing for configuration changes.