Labrador CMS experienced a major incident on May 27, 2025 affecting Labrador Editor and Labrador Frontend, lasting 43m. The incident has been resolved; the full update timeline is below.
Affected components
- Labrador Editor
- Labrador Frontend
Update timeline
- investigating May 27, 2025, 08:20 PM UTC
We are currently experiencing service disruptions. We are looking into the issue and will update you as soon as we know more.
- identified May 27, 2025, 08:45 PM UTC
We have identified the issue as network problems at our server provider. They are actively working on resolving the issue; more information can be found here: https://network.status-ovhcloud.com/incidents/3y7hl8330d7q
- monitoring May 27, 2025, 08:51 PM UTC
The network is stabilizing, but we are still experiencing slower response times. We will continue monitoring.
- resolved May 27, 2025, 09:03 PM UTC
The incident is resolved and all systems are back to normal operation for now.
- postmortem Jun 02, 2025, 01:14 PM UTC
## Summary

On Tuesday 27.05.2025 between 22:02 and 22:50 CEST, and on Wednesday 28.05.2025 between 18:50 and 19:27 CEST, one of our infrastructure providers, OVH, experienced network disruptions affecting two of our three data centers. This resulted in a partial or complete service outage for traffic destined for both Labrador CMS and Labrador Front. Following the incident on Wednesday, a subset of our customers experienced further service degradation, as some Labrador components were stuck in an unhealthy state and needed manual intervention. At 20:30 CEST all Labrador services returned to a fully operational state.

## Details

Our internal monitoring systems reported unavailable services and sites on Tuesday at 22:03 CEST. Initial investigation revealed an ongoing network outage at OVH affecting several data centers, causing connectivity disruptions between our services. Networks returned to normal at 22:50 CEST. The incident recurred on Wednesday between 18:50 and 19:27 CEST.

Once network operation was restored, internal monitoring and customer reports indicated that a subset of our clients still had service degradation, in the form of slower responses or issues publishing content. Further diagnostics revealed that one of our servers was stuck in an unhealthy state following the network incident. The affected server was pulled from our cluster, and Labrador services returned to a fully healthy state at 20:30 CEST.

## Impacted services

Services affected by this incident are specified in the table below (all times in CEST).

| **Service name** | **Date** | **Minutes** | **Time from — to** |
| --- | --- | --- | --- |
| Labrador CMS | 27.05.2025 | 48 | 22:02 — 22:50 |
| Labrador CMS | 28.05.2025 | 100 | 18:50 — 20:30 |
| Labrador Front | 27.05.2025 | 48 | 22:02 — 22:50 |
| Labrador Front | 28.05.2025 | 100 | 18:50 — 20:30 |

## Incident timeline

The following timeline describes the entire incident handling process.

**Tuesday:**

* `2025.05.27 22:02` Initial service outage alerts registered
* `2025.05.27 22:15` Large-scale network outage confirmed
* `2025.05.27 22:36` Network incident confirmed by OVH
* `2025.05.27 22:45` Root cause resolved by OVH, network declared healthy
* `2025.05.27 22:50` All Labrador services healthy

**Wednesday:**

* `2025.05.28 18:53` Initial service outage alerts registered
* `2025.05.28 19:00` Large-scale network outage confirmed
* `2025.05.28 19:03` Network incident confirmed by OVH
* `2025.05.28 19:24` Root cause resolved by OVH, network declared healthy
* `2025.05.28 19:45` Reports that some customers still experienced degradation
* `2025.05.28 20:15` Service degradation root cause identified
* `2025.05.28 20:30` Affected services restarted, fully operational

## Root cause

The root cause of both incidents was determined to be backbone networking malfunctions, resulting in network disruptions at OVH.

* [Tuesday: OVH incident summary](https://network.status-ovhcloud.com/incidents/3y7hl8330d7q)
* [Wednesday: OVH incident summary](https://network.status-ovhcloud.com/incidents/j292y56ckyfq)

## Planned actions

We will be assessing how these two incidents affect our SLA coverage. We are continuously working on improving and decentralizing our infrastructure so that we are less vulnerable to large-scale data center network outages. Our largest current effort in this regard is moving the Labrador CMS and Front infrastructure to the cloud, reducing our exposure to OVH outages and increasing our geographical presence and flexibility. At the moment around 25% of our customers have Labrador Front fully migrated to our AWS cloud. These customers were much less impacted by this incident, as only their Labrador CMS Editor was affected. We expect to migrate most of our remaining customers to AWS in the coming weeks, and the larger or more complex customers over the summer.