Lokalise incident

Major service outage

Lokalise experienced a critical incident on February 10, 2025, affecting Lokalise API, Lokalise App, and Lokalise OTA, lasting 51 minutes. The incident has been resolved; the full update timeline is below.

Started: Feb 10, 2025, 12:38 PM UTC
Resolved: Feb 10, 2025, 01:29 PM UTC
Duration: 51m
Detected by Pingoru: Feb 10, 2025, 12:38 PM UTC

Affected components

Lokalise API, Lokalise App, Lokalise OTA

Update timeline

investigating Feb 10, 2025, 12:38 PM UTC

We are currently investigating this issue.
investigating Feb 10, 2025, 12:40 PM UTC

We are continuing to investigate this issue.
monitoring Feb 10, 2025, 12:50 PM UTC

A fix has been implemented and we are monitoring the results.
monitoring Feb 10, 2025, 12:52 PM UTC

We are continuing to monitor for any further issues.
monitoring Feb 10, 2025, 01:04 PM UTC

We are investigating application slowness and the unavailability of the Lokalise OTA service.
monitoring Feb 10, 2025, 01:13 PM UTC

We are investigating application slowness and the unavailability of the Lokalise OTA service.
monitoring Feb 10, 2025, 01:20 PM UTC

We have applied the fix and are monitoring for issues.
resolved Feb 10, 2025, 01:29 PM UTC

This incident has been resolved.
postmortem Feb 14, 2025, 08:59 AM UTC

On February 10, 2025, our service experienced a 16-minute outage followed by 39 minutes of degraded functionality due to an issue with a system configuration.

**What happened?**

While improving our disaster recovery process, we unintentionally introduced a misconfiguration in the system. Initially, this did not cause issues, but when we attempted to roll back the change, it led to unexpected complications. As a result, our service platform became temporarily unavailable, and we had to reconstruct certain system components to restore full functionality.

**Impact**

* **12:30 – 12:46 UTC**: Service outage.
* **12:46 – 13:25 UTC**: Service degradation. The App and API were operational, but some services remained unavailable (OTA, Workflows, Connectors, Review Center).

**What we are doing to prevent this in the future**

We recognize the importance of this event, and we have taken steps to ensure it does not happen again. Our key actions include:

* Improving validation and monitoring processes to identify configuration issues before deployment.
* Enhancing our tools and automations for faster service restoration.
* Implementing blue-green deployment techniques, or equivalent, for seamless system upgrades.

We sincerely apologize for the disruption this caused and appreciate your patience as we work to make our systems more resilient. If you have any questions, please reach out to [[email protected]](mailto:[email protected]).
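
The durations quoted in this report can be cross-checked with a short calculation. The sketch below is illustrative only: the helper name `minutes_between` and the timestamp format are assumptions made for this example, and the timestamps are copied from the postmortem impact windows and the status page header.

```python
from datetime import datetime

# Timestamp format chosen for this example (an assumption, not from the report).
FMT = "%Y-%m-%d %H:%M"


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed between two UTC timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


# Impact windows as stated in the postmortem (UTC).
outage = minutes_between("2025-02-10 12:30", "2025-02-10 12:46")
degraded = minutes_between("2025-02-10 12:46", "2025-02-10 13:25")

# Started/Resolved window as shown on the status page (UTC).
status_page = minutes_between("2025-02-10 12:38", "2025-02-10 13:29")

print(f"outage: {outage}m, degraded: {degraded}m, "
      f"total impact: {outage + degraded}m, status page: {status_page}m")
```

The postmortem windows add up to 55 minutes of impact, while the status page duration of 51 minutes is measured from the first posted update (12:38 UTC) to the resolved update (13:29 UTC).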