Workspot incident

Workspot Control is currently not accessible

Major Resolved

Workspot experienced a major incident on February 24, 2022 affecting Workspot Control, lasting 52m. The incident has been resolved; the full update timeline is below.

Started
Feb 24, 2022, 04:46 PM UTC
Resolved
Feb 24, 2022, 05:38 PM UTC
Duration
52m
Detected by Pingoru
Feb 24, 2022, 04:46 PM UTC

Affected components

Workspot Control

Update timeline

  1. investigating Feb 24, 2022, 04:46 PM UTC

    Workspot is aware of an issue accessing Control and is actively investigating. Administrators will not be able to log in to Control, and end users will not be able to launch new non-persistent desktop sessions. Existing end-user sessions will not be impacted, and end users will still be able to launch persistent desktops.

  2. identified Feb 24, 2022, 05:06 PM UTC

    Workspot's PaaS Provider has acknowledged this as a network connectivity issue to the Apps, and their engineers are investigating. We will continue to work with them and update as necessary. Thank you for your patience.

  3. monitoring Feb 24, 2022, 05:14 PM UTC

    Workspot Control Service is back online. Our PaaS Provider has restored access to their Apps. We are continuing to monitor, but all services within Control are currently restored. We will provide a full RCA when we receive it. Thank you.

  4. resolved Feb 24, 2022, 05:38 PM UTC

    There have been no further issues with our PaaS Provider. We are resolving this issue and will provide an RCA when we receive it.

  5. postmortem Mar 02, 2022, 10:09 PM UTC

    The RCA from our PaaS Provider:

    On February 24th, 2022 between 16:25 UTC and 17:35 UTC, our customers experienced a network outage as a result of an invalid DNS configuration change to our infrastructure.

    A configuration update inadvertently changed a crucial set of DNS records, causing an outage for the Common Runtime's ingress path, which manifested for customer applications in the form of timeouts.

    Our team of on-call engineers quickly identified the root cause and applied a configuration update which quickly resolved the problem. After correcting the DNS configuration, the platform took some time to recover capacity as previously scaled-down resources scaled back up to meet demand, and eventually fully recovered.

    What will we do to mitigate problems like this in the future?

    Engineering updated the infrastructure-managing code so these unintended DNS changes can't happen again.
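
The RCA does not say how the provider's infrastructure-managing code was changed. As an illustration only, one common way to block unintended DNS changes is a pre-apply guard that compares the current and proposed record sets and rejects any change touching a protected record. The sketch below assumes that approach. The names `DnsRecord`, `PROTECTED`, `protected_changes`, and `check_change`, and the `example.com` records, are all hypothetical and are not taken from the provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsRecord:
    name: str   # fully qualified record name
    rtype: str  # record type, e.g. "A" or "CNAME"
    value: str  # record data


# Hypothetical set of (name, type) keys that must never change implicitly.
PROTECTED = {("ingress.example.com", "CNAME")}


def _index(records):
    """Group record values by (name, type) so multi-value records compare as sets."""
    grouped = {}
    for record in records:
        grouped.setdefault((record.name, record.rtype), set()).add(record.value)
    return grouped


def protected_changes(current, proposed, protected=PROTECTED):
    """Return (key, old_values, new_values) for each protected key that differs."""
    before, after = _index(current), _index(proposed)
    changes = []
    for key in sorted(protected):
        old, new = before.get(key, set()), after.get(key, set())
        if old != new:
            changes.append((key, sorted(old), sorted(new)))
    return changes


def check_change(current, proposed, protected=PROTECTED):
    """Raise ValueError if the proposed record set alters any protected record."""
    changes = protected_changes(current, proposed, protected)
    if changes:
        details = "; ".join(f"{k[0]} {k[1]}: {o} -> {n}" for k, o, n in changes)
        raise ValueError(f"protected DNS records would change: {details}")
```

In this sketch a deployment step would call `check_change(current, proposed)` before applying a configuration update, so a change to a protected record has to be made through a separate, explicitly reviewed path.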