Lokalise incident

Lokalise platform performance issues

Critical · Resolved

Lokalise experienced a critical incident on April 19, 2023 affecting Lokalise API and Lokalise App, lasting 5h 56m. The incident has been resolved; the full update timeline is below.

Started
Apr 19, 2023, 04:20 AM UTC
Resolved
Apr 19, 2023, 10:17 AM UTC
Duration
5h 56m
Detected by Pingoru
Apr 19, 2023, 04:20 AM UTC

Affected components

Lokalise API, Lokalise App

Update timeline

  1. investigating Apr 19, 2023, 04:20 AM UTC

    The Lokalise platform (https://app.lokalise.com/) is experiencing performance issues affecting loading times, navigation speed within projects, and search functionality. The API is also affected. Our team is investigating the root cause and working to resolve these issues as quickly as possible.

  2. investigating Apr 19, 2023, 06:07 AM UTC

    We are continuing to investigate this issue.

  3. investigating Apr 19, 2023, 06:26 AM UTC

    We are continuing to investigate this issue.

  4. investigating Apr 19, 2023, 06:37 AM UTC

    We are continuing to investigate this issue.

  5. identified Apr 19, 2023, 06:52 AM UTC

    The issue has been identified, and a fix is being implemented. The app is now accessible; however, we have disabled search, filtering, and statistics functionality while we work on the root cause.

  6. monitoring Apr 19, 2023, 08:29 AM UTC

    A fix has been implemented, and search, filtering, and statistics functionality has been restored. We are monitoring the results.

  7. resolved Apr 19, 2023, 10:17 AM UTC

    This incident has been resolved.

  8. postmortem Apr 25, 2023, 02:55 PM UTC

    To better handle the growing amount of data and users in Lokalise, we continually scale the resources behind the application. After a routine operational change that had been executed multiple times before and successfully tested in the staging environment, the Elasticsearch cluster that powers many Lokalise features suddenly became overloaded. As more users came online, the service struggled with the load, which led to increased latency and general slowness across the Lokalise application. We turned off filters, search, and statistics so that the application could run performantly in a limited mode, and continued working to resolve the issue.

    The source of the issue was identified quickly; however, fully restoring performance took more than an hour before we could re-enable all functionality, because the Elasticsearch index that had to be relocated was very large. The root cause of the incident was an incorrect estimate of the resources required to scale the backend service. This was unexpected: the metrics we had in place turned out not to reveal the full extent of the service's actual load.

    We apologize for the inconvenience and frustration that this downtime caused our customers. Our team takes this incident seriously and is committed to taking all necessary measures to prevent similar incidents in the future. We appreciate your patience and understanding, and we will continue working to improve our system's performance and reliability for you.