Litium incident

Serverless Cloud Site Disruptions

Litium experienced a minor incident on August 13, 2025 affecting App Services, lasting 1d 23h. The incident has been resolved; the full update timeline is below.

Started: Aug 13, 2025, 08:25 AM UTC
Resolved: Aug 15, 2025, 07:54 AM UTC
Duration: 1d 23h
Detected by Pingoru: Aug 13, 2025, 08:25 AM UTC

Affected components

App Services

Update timeline

investigating Aug 13, 2025, 08:25 AM UTC

We are currently experiencing issues affecting some requests in our serverless cloud environment. Our team is actively investigating the cause and working to resolve the problem. We will provide an update as soon as more information is available.

monitoring Aug 13, 2025, 05:13 PM UTC

A fix has been implemented and we are monitoring the results.

resolved Aug 15, 2025, 07:54 AM UTC

This incident has been resolved.

postmortem Aug 15, 2025, 07:55 AM UTC

**Incident Summary – August 12–15, 2025**

On August 12 at approximately 20:00 CEST, an infrastructure issue temporarily impacted a small number of customers. For those affected, only a small fraction of requests timed out (the system did not respond within the expected time). No widespread service disruption occurred.

Following the initial recovery, some customers experienced intermittent request timeouts on August 13. Targeted restarts were performed during the day, including restarts of two specific resources that showed abnormal behavior, after which stability improved and error rates returned to normal.

The incident remained open in monitoring until the morning of August 15 to confirm sustained normal operation before closure. We appreciate your patience and understanding.

**Timeline**

* **Aug 12, ~20:00** – An unexpected hardware event caused certain services to restart, leading to partial unavailability of some components. Services were automatically restarted and restored.
* **Aug 13, morning** – Some customers experienced slightly increased response times and a small number of intermittent request timeouts related to the previous evening's event.
* **Aug 13, 11:00** – All services returned to normal redundancy and performance, though sporadic timeouts persisted for a subset of customers.
* **Aug 13, 16:30–18:30** – Additional service restarts were performed to eliminate the remaining issues. After these restarts, all systems were functioning as expected.
* **Aug 14, 16:45–17:30** – Additional preventive maintenance was performed on multiple resources to help ensure continued stability.
* **Aug 14, evening** – Services remained stable and were kept in monitoring to confirm sustained normal operation.

**Impact**

* Only a small number of customers were affected.
* For those affected, a small fraction of requests resulted in timeouts.

**Resolution**

The service team monitored the affected components and performed targeted restarts. Error rates returned to normal on the evening of August 13. The incident remained open in monitoring until the morning of August 15 to ensure continued stability before closure.

**Next steps**

We have updated our operational routines, enhanced monitoring, and refined recovery procedures to reduce the impact of similar incidents in the future.