Semperis incident

General

Semperis experienced a notice-level incident on April 14, 2026. The status feed recorded no duration; the postmortem below reports approximately one hour. The incident has been resolved; the full update timeline is below.

Started: Apr 14, 2026, 06:00 AM UTC
Resolved: Apr 14, 2026, 06:00 AM UTC
Duration: —
Detected by Pingoru: Apr 14, 2026, 06:00 AM UTC

Update timeline

resolved Apr 16, 2026, 07:27 AM UTC

Lightning: An indicator package update triggered a surge of collector requests, causing resource exhaustion and temporary service degradation in the EU and NA environments. Access to directory data and IOE results was limited until scaling mitigated the issue.

postmortem Apr 19, 2026, 10:10 AM UTC

**Lightning Intelligence service degradation following routine update**

**Postmortem Summary**

A routine update triggered an unexpected surge in system requests, resulting in temporary service unavailability for approximately one hour across multiple environments. The issue was identified and mitigated by increasing system capacity. Preventative measures are being implemented to ensure improved resilience and monitoring going forward.

**Overview**

On April 14, 2026, at approximately 06:00 UTC, a routine update triggered a large number of client components to simultaneously request updated data from the platform. This unexpected spike in traffic overwhelmed a shared service, causing temporary unavailability of the Lightning UI across production environments.

The incident impacted all customers in the affected environments for approximately one hour. Service was restored by scaling system capacity and stabilizing request handling.

**Leadup**

The security indicator update introduced a new version of configuration data. Upon release, all connected components detected the update at the same time and attempted to retrieve it simultaneously. This behavior had not been previously observed at production scale, and existing validation processes did not simulate this level of concurrent activity.

**Impact**

* **Duration:** ~1 hour
* **User impact:** All customers in affected environments
* **Customer experience:**
  * Temporary inability to access directory data in the Lightning UI
  * Intelligence features (Dashboards, Exposures, etc.) were unavailable during the incident
  * Some background components temporarily failed to update and remained on a previous version until normal operations resumed

**Detection**

The issue was detected via automated alerting shortly after the spike in system load. Initial detection was limited to a subset of environments, and full impact visibility was established shortly afterward.

**Improvement opportunity:** Enhancing system-wide monitoring and alerting will allow faster and more complete detection across all environments.

**Response**

The engineering team responded immediately upon alert and identified the root cause as a traffic spike overwhelming system capacity. Mitigation actions included increasing available capacity to handle the load. Response time was slightly impacted by access control processes required for emergency changes.

**Recovery**

Service was restored after increasing system capacity, allowing the platform to handle both background update traffic and customer-facing requests. Once the surge subsided and systems stabilized, full functionality was restored. Total recovery time was approximately one hour.

**Root Cause**

The incident was caused by a simultaneous update request from a large number of client components following a routine release. This created a sudden spike in demand that exceeded the system’s capacity, impacting both background processing and customer-facing services.

**Preventative Measures**

The following improvements are being implemented:

* **Improved system capacity and scalability** for handling burst traffic
* **Enhanced monitoring and alerting** across all environments
* **Load testing at production scale scenarios** prior to releases
* **Improved resilience of client update mechanisms**
* **Clear incident communication processes**, including status page ownership

**Timeline**

_All times in UTC on April 14, 2026_

* **06:00** – Update released; spike in requests begins
* **~06:05** – Automated alert triggered
* **~06:10** – Investigation begins
* **~06:30** – Capacity increased to mitigate load
* **~06:45** – Service begins recovering
* **~07:00** – Full service restored; incident resolved
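
**Illustrative sketch: staggering client update requests**

The postmortem does not describe how the "improved resilience of client update mechanisms" will be implemented. One common way to avoid the simultaneous-retrieval pattern described under Leadup and Root Cause is to have each client wait a per-client delay before fetching a new package, and to retry failures with jittered exponential backoff. The sketch below only illustrates that general technique. All names in it (`staggered_delay`, `backoff_delay`, the collector IDs, the version string, and the 10-minute window) are hypothetical and are not taken from Semperis software.

```python
import hashlib
import random


def staggered_delay(client_id: str, package_version: str, window_seconds: float) -> float:
    """Return a deterministic delay in [0, window_seconds) for this client and version.

    Hashing the client ID together with the version spreads clients roughly
    evenly across the window, and each new version reshuffles the order.
    (Hypothetical helper; not a Semperis API.)
    """
    digest = hashlib.sha256(f"{client_id}:{package_version}".encode()).digest()
    # Use 48 bits so the fraction is exactly representable and strictly below 1.
    fraction = int.from_bytes(digest[:6], "big") / 2**48
    return fraction * window_seconds


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0, rng=random) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)].

    Randomizing the whole interval keeps failed clients from retrying in lockstep.
    """
    return rng.uniform(0.0, min(cap, base * 2**attempt))


if __name__ == "__main__":
    # Assumed example values: 1,000 collectors spread over a 10-minute window.
    window = 600.0
    delays = [staggered_delay(f"collector-{i}", "indicators-v2", window) for i in range(1000)]
    per_minute = [0] * 10
    for d in delays:
        per_minute[int(d // 60)] += 1
    print("fetches per minute:", per_minute)
    print("example retry delays:", [round(backoff_delay(a), 2) for a in range(5)])
```

With a window like this, the load from a release arrives as a roughly flat stream over several minutes instead of a single burst, at the cost of some clients receiving the update later. Whether that trade-off suits security indicator updates is a product decision the source does not address.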