Lightdash incident

app.lightdash.cloud showing degraded performance

Minor Resolved

Lightdash experienced a minor incident on January 26, 2023 affecting Lightdash Cloud (US), lasting 3h 37m. The incident has been resolved; the full update timeline is below.

Started
Jan 26, 2023, 04:49 PM UTC
Resolved
Jan 26, 2023, 08:26 PM UTC
Duration
3h 37m
Detected by Pingoru
Jan 26, 2023, 04:49 PM UTC

Affected components

Lightdash Cloud (US)

Update timeline

  1. investigating Jan 26, 2023, 04:49 PM UTC

    We are currently investigating the root cause of the issue.

  2. investigating Jan 26, 2023, 07:03 PM UTC

    We have identified the failing component but are still looking for the root cause. Services appear operational again.

  3. identified Jan 26, 2023, 07:49 PM UTC

    We've identified the root cause and it's being fixed.

  4. monitoring Jan 26, 2023, 07:58 PM UTC

    A fix has been implemented and we are monitoring the deployment.

  5. resolved Jan 26, 2023, 08:26 PM UTC

    This incident is resolved.

  6. postmortem Jan 26, 2023, 08:27 PM UTC

    Earlier today we received an automated alert that app.lightdash.cloud was unavailable and returning 502 errors. The cause was that Lightdash had slowed down under the amount of usage in Lightdash Cloud at the time. The slower response times triggered an automated process that restarts the Lightdash servers. This process should normally trigger only when a server has already crashed; in this incident it triggered by mistake, because the server was simply running more slowly than expected. To resolve the issue, we've added significantly more resources to our Lightdash Cloud servers to prevent slow response times. We've also raised the threshold at which very slow response times trigger an automatic server restart.
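
The restart behaviour described in the postmortem can be illustrated with a minimal Python sketch. This is not Lightdash's actual watchdog, whose implementation the postmortem does not describe. The names `ProbeResult` and `should_restart`, the 30-second latency threshold, and the three-probe window are all hypothetical, chosen only to show how a restart rule can tell a slow server apart from a crashed one.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one health probe (hypothetical type).

    latency_s is None when the server gave no response at all.
    """
    latency_s: Optional[float]


def should_restart(
    probes: Sequence[ProbeResult],
    slow_threshold_s: float = 30.0,   # assumed value, not from the postmortem
    max_consecutive_bad: int = 3,     # assumed value, not from the postmortem
) -> bool:
    """Return True only if the most recent probes all failed or were very slow."""
    if len(probes) < max_consecutive_bad:
        return False  # not enough evidence yet to justify a restart
    recent = probes[-max_consecutive_bad:]
    return all(
        p.latency_s is None or p.latency_s > slow_threshold_s
        for p in recent
    )


# A server that is slow under load but still answering.
slow_but_alive = [ProbeResult(4.0), ProbeResult(6.5), ProbeResult(5.2)]
# A server that has stopped responding entirely.
crashed = [ProbeResult(None), ProbeResult(None), ProbeResult(None)]

print(should_restart(slow_but_alive, slow_threshold_s=3.0))  # tight threshold
print(should_restart(slow_but_alive))                        # raised threshold
print(should_restart(crashed))
```

With the tight 3-second threshold, the slow-but-alive server is restarted, which mirrors the mistaken restarts in this incident. With the raised threshold, only the unresponsive server qualifies. Requiring several consecutive bad probes is one common way to avoid restarting on a single slow response.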