Labrador CMS incident
Major infrastructure network disruptions
Labrador CMS experienced a critical incident on September 20, 2022, affecting Labrador Editor, Labdevs Development, and 1 more component, and lasting about 20 minutes (13:35 to 13:55 CEST, per the postmortem below). The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Sep 20, 2022, 11:44 AM UTC
Our infrastructure provider is currently experiencing major network disruptions across all data centres. This affects all Labrador services. We will update as soon as we know more.
- resolved Sep 20, 2022, 12:03 PM UTC
The incident has been resolved by our infrastructure provider, and all Labrador services should now be operating normally again. Related: https://network.status-ovhcloud.com/incidents/5mldyhd6v99c
- postmortem Sep 22, 2022, 12:17 PM UTC
## Summary

On Tuesday 20.09.2022 between 13:35 and 13:55 CEST our primary infrastructure provider, OVH, experienced major network disruptions across multiple data centers. This resulted in a partial or complete service outage for a large part of the traffic destined for both Labrador CMS and Labrador Front. At 13:55 CEST all Labrador services returned to a healthy state.

## Details

Our internal monitoring systems reported the first unavailable services and sites at 13:36 CEST. Initial investigation revealed that a large-scale network outage was ongoing, since none of our three data centers were responsive.

On-site OVH technicians confirmed networking degradation affecting their data centers, caused by a configuration change related to an upgrade of their networking infrastructure. The faulty configuration was identified and rolled back. Networks returned online at 13:50 CEST, with all Labrador services returning to normal at 13:55 CEST.

## Impacted services

Services affected by this incident are specified in the table below.

| **Service name** | **Duration (minutes)** | **Time (CEST)** |
| --- | --- | --- |
| Labrador CMS | 20 | 13:35 to 13:55 |
| Labrador Front | 20 | 13:35 to 13:55 |
| Labdevs Development | 20 | 13:35 to 13:55 |

## Incident timeline

The following timeline describes the entire incident-handling process. All times are in CEST.

* `2022.09.20 13:36` Service outage alerts registered
* `2022.09.20 13:40` Network outage confirmed by OVH
* `2022.09.20 13:46` Fix implemented and pushed by OVH
* `2022.09.20 13:50` Services back online, most traffic normal
* `2022.09.20 13:55` All services operational, all traffic normal

## Root cause

The root cause of the network disruptions at OVH was determined to be a faulty configuration change related to an upgrade of their networking infrastructure. Further details can be found in their [incident summary](https://network.status-ovhcloud.com/incidents/5mldyhd6v99c).
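
The postmortem gives its times in CEST, while the status updates earlier on this page are stamped in UTC. As a reading aid, the following minimal sketch converts the postmortem timeline to UTC and recomputes the outage duration. It is an illustration only, not part of Labrador's tooling: the names `CEST`, `TIMELINE`, `to_utc`, and `outage_minutes` are hypothetical, and CEST is modelled as a fixed UTC+2 offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is modelled as a fixed UTC+2 offset, which avoids
# depending on a time zone database being installed.
CEST = timezone(timedelta(hours=2), name="CEST")

# Timeline entries copied from the postmortem (local CEST times).
TIMELINE = [
    ("2022.09.20 13:36", "Service outage alerts registered"),
    ("2022.09.20 13:40", "Network outage confirmed by OVH"),
    ("2022.09.20 13:46", "Fix implemented and pushed by OVH"),
    ("2022.09.20 13:50", "Services back online, most traffic normal"),
    ("2022.09.20 13:55", "All services operational, all traffic normal"),
]


def to_utc(stamp: str) -> datetime:
    """Parse a 'YYYY.MM.DD HH:MM' CEST timestamp and return it in UTC."""
    local = datetime.strptime(stamp, "%Y.%m.%d %H:%M").replace(tzinfo=CEST)
    return local.astimezone(timezone.utc)


def outage_minutes(start: str, end: str) -> int:
    """Return the whole number of minutes between two CEST timestamps."""
    delta = to_utc(end) - to_utc(start)
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    for stamp, event in TIMELINE:
        print(f"{to_utc(stamp):%Y-%m-%d %H:%M} UTC  {event}")
    # Outage window as stated in the impacted-services table.
    print(outage_minutes("2022.09.20 13:35", "2022.09.20 13:55"), "minutes")
```

Under this conversion the outage window of 13:35 to 13:55 CEST corresponds to 11:35 to 11:55 UTC, so the first status update (11:44 UTC) was posted during the outage and the resolution notice (12:03 UTC) shortly after services had recovered.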