Labrador CMS incident

Major infrastructure network disruptions

Labrador CMS experienced a critical incident on September 20, 2022 affecting Labrador Editor, Labdevs Development, and Labrador Frontend. The incident has been resolved; the full update timeline is below.

Started: Sep 20, 2022, 12:03 PM UTC
Resolved: Sep 20, 2022, 12:03 PM UTC
Duration: —
Detected by Pingoru: Sep 20, 2022, 12:03 PM UTC

Affected components

Labrador Editor
Labdevs Development
Labrador Frontend

Update timeline

investigating Sep 20, 2022, 11:44 AM UTC

Our infrastructure provider is currently experiencing major network disruptions across all data centres. This affects all Labrador services. We will update as soon as we know more.

resolved Sep 20, 2022, 12:03 PM UTC

The incident has been resolved by our infrastructure provider, and all Labrador services should now be operating normally again. Related: https://network.status-ovhcloud.com/incidents/5mldyhd6v99c

postmortem Sep 22, 2022, 12:17 PM UTC

## Summary

On Tuesday 20.09.2022 between 13:35 - 13:55 CEST our primary infrastructure provider, OVH, experienced major network disruptions across multiple data centers. This resulted in a partial or complete service outage for a large part of the traffic destined to both Labrador CMS and Labrador Front. At 13:55 CEST all Labrador services returned to a healthy state.

## Details

Our internal monitoring systems reported the first unavailable services and sites at 13:36 CEST. Initial investigation revealed that a large scale network outage was ongoing, since none of our three data centers were responsive.

On-site OVH technicians confirmed networking degradation affecting their data centers, caused by a configuration change related to an upgrade of their networking infrastructure. The faulty configuration was identified and rolled back. Networks returned online at 13:50 CEST, with all Labrador services returning to normal at 13:55 CEST.

## Impacted services

Services affected by this incident are specified in the table below.

| **Service name** | **Minutes** | **Time from - to (CEST)** |
| --- | --- | --- |
| Labrador CMS | 20 | 13:35 - 13:55 |
| Labrador Front | 20 | 13:35 - 13:55 |
| Labdevs Development | 20 | 13:35 - 13:55 |

## Incident timeline

Following is a timeline that describes the entire incident handling process.

* `2022.09.20 13:36` Service outage alerts registered
* `2022.09.20 13:40` Network outage confirmed by OVH
* `2022.09.20 13:46` Fix implemented and pushed by OVH
* `2022.09.20 13:50` Services back online, most traffic normal
* `2022.09.20 13:55` All services operational, all traffic normal

## Root cause

The root cause of the network disruptions at OVH was determined to be a faulty configuration change related to an upgrade of their networking infrastructure. Further details can be found in their [incident summary](https://network.status-ovhcloud.com/incidents/5mldyhd6v99c).
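
The postmortem reports times in CEST, while the status updates above are stamped in UTC. As a reading aid, the following is a minimal Python sketch that converts the postmortem timestamps to UTC and recomputes the 20-minute outage window. The helper names (`parse_cest`, `to_utc`, `minutes_between`) are hypothetical and not part of any Labrador or OVH tooling, and CEST is assumed to be a fixed UTC+2 offset for the date in question.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is UTC+2. A fixed offset avoids needing a time zone database.
CEST = timezone(timedelta(hours=2), "CEST")

# Timeline entries copied from the postmortem (local time, CEST).
TIMELINE = [
    ("2022.09.20 13:36", "Service outage alerts registered"),
    ("2022.09.20 13:40", "Network outage confirmed by OVH"),
    ("2022.09.20 13:46", "Fix implemented and pushed by OVH"),
    ("2022.09.20 13:50", "Services back online, most traffic normal"),
    ("2022.09.20 13:55", "All services operational, all traffic normal"),
]


def parse_cest(stamp: str) -> datetime:
    """Parse a 'YYYY.MM.DD HH:MM' postmortem timestamp as CEST (hypothetical helper)."""
    return datetime.strptime(stamp, "%Y.%m.%d %H:%M").replace(tzinfo=CEST)


def to_utc(stamp: str) -> datetime:
    """Convert a CEST postmortem timestamp to UTC."""
    return parse_cest(stamp).astimezone(timezone.utc)


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two CEST timestamps."""
    return int((parse_cest(end) - parse_cest(start)).total_seconds() // 60)


# Outage window from the "Impacted services" table: 13:35 to 13:55 CEST.
outage_minutes = minutes_between("2022.09.20 13:35", "2022.09.20 13:55")

if __name__ == "__main__":
    print(f"Outage duration: {outage_minutes} minutes")
    for stamp, event in TIMELINE:
        print(f"{to_utc(stamp):%Y-%m-%d %H:%M} UTC  {event}")
```

Under that fixed-offset assumption, 13:35 CEST corresponds to 11:35 UTC and 13:55 CEST to 11:55 UTC, which is consistent with the first status update being posted at 11:44 UTC and the resolution notice at 12:03 UTC.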