Lightdash incident

Lightdash Cloud instability (slow response times)

Major Resolved

Lightdash experienced a major incident on May 9, 2023 affecting Lightdash Cloud (US), lasting 1h 44m. The incident has been resolved; the full update timeline is below.

Started
May 09, 2023, 01:48 PM UTC
Resolved
May 09, 2023, 03:33 PM UTC
Duration
1h 44m
Detected by Pingoru
May 09, 2023, 01:48 PM UTC

Affected components

Lightdash Cloud (US)

Update timeline

  1. investigating May 09, 2023, 01:48 PM UTC

    The Lightdash API at https://app.lightdash.cloud is experiencing very slow response times for all users, leading to some actions taking minutes or not executing. We're currently investigating.

  2. monitoring May 09, 2023, 03:22 PM UTC

    We are monitoring performance.

  3. resolved May 09, 2023, 03:33 PM UTC

    This incident has been resolved.

  4. postmortem May 09, 2023, 03:34 PM UTC

    Today at 13:37 (UTC) we noticed that all API endpoints for [https://app.lightdash.cloud](https://app.lightdash.cloud) were starting to run very slowly. This affected all users and led to API response times of over a minute. Our first response was to greatly increase the resources available to the Lightdash server (both the number and size of servers). After this change all services remained stable. We have investigated our logs to understand exactly what actions users were taking to make the server so busy. In addition to the volume of users, we noticed a higher than usual number of Databricks users. We already have a fix in testing for improved Databricks performance, which can be expected in the coming hours. For further performance improvements you can follow this milestone in GitHub: [https://github.com/lightdash/lightdash/milestone/91](https://github.com/lightdash/lightdash/milestone/91)