Flexera incident

Spot Elastigroup – AWS – Page Load Delays and Creation Issues

Major · Resolved

Flexera experienced a major incident on May 13, 2026 affecting Spot UI and Spot API, lasting 2h 13m. The incident has been resolved; the full update timeline is below.

Started: May 13, 2026, 02:49 PM UTC
Resolved: May 13, 2026, 05:02 PM UTC
Duration: 2h 13m
Detected by Pingoru: May 13, 2026, 02:49 PM UTC

Affected components

Spot UI, Spot API

Update timeline

  1. investigating May 13, 2026, 02:49 PM UTC

    Incident Description: We are currently investigating degraded performance affecting Spot Elastigroup for AWS customers. Impacted customers may experience slower page load times and may be unable to create new Elastigroups.

    Priority: P2

    Restoration Activity: Our technical teams are actively investigating and working to restore normal functionality. Current findings indicate the impact is limited to AWS customers. Validation is ongoing to confirm the full scope of impact.

  2. identified May 13, 2026, 03:45 PM UTC

    We have identified that the issue may be related to a recent change affecting Spot Elastigroup functionality for AWS customers. Our technical teams are actively reverting the change as part of the restoration effort. Initial validation indicates that Elastigroup creation is now working again. We are continuing validation to confirm full recovery, including page load performance and Elastigroup creation behavior.

  3. monitoring May 13, 2026, 04:46 PM UTC

    Our technical teams have completed initial restoration actions, and validation confirms that Elastigroup creation is now working successfully. We are continuing to monitor page load performance and overall functionality before confirming full resolution.

  4. resolved May 13, 2026, 05:02 PM UTC

    Our technical teams have confirmed that all affected functionality has been fully restored. Validation has completed successfully, including Elastigroup creation and page load performance. We are marking this incident as resolved and will continue standard monitoring.

  5. postmortem May 26, 2026, 05:53 PM UTC

    **Description:** Spot Elastigroup – AWS – Page Load Delays and Creation Issues

    **Timeframe:** May 13, 2026, 7:29 AM PDT to May 13, 2026, 9:38 AM PDT

    **Incident Summary**

    On Wednesday, our teams identified an issue affecting AWS Elastigroup functionality that impacted Spot Elastigroup for AWS customers, resulting in delays when loading the Elastigroup page and preventing some customers from creating new Elastigroups. The impact was limited to AWS Elastigroup functionality, and validation confirmed that other services were not affected.

    Technical teams immediately initiated an investigation, reviewing recent platform changes and service behavior. During the investigation, they observed instability in services that correlated with the customer-facing symptoms. In response, teams executed mitigation actions, including reverting a recently introduced change while concurrently validating system behavior and overall service health.

    Following the rollback, validation confirmed that Elastigroup creation functionality had been successfully restored. Additional performance validation and monitoring confirmed that service behavior had returned to expected levels, with no further customer impact observed. Full functionality was restored by 9:38 AM PDT.

    **Root Cause**

    The issue was caused by a recently introduced change that resulted in instability within the AWS Elastigroup processing path. This behavior affected service interactions required for Elastigroup page loading and new Elastigroup creation requests, leading to increased latency and request failures for impacted customers.

    Contributing Factors:

    * The change unexpectedly affected a broad AWS workload scope, increasing the overall impact radius.
    * Existing monitoring did not provide early detection for repeated pod restart behavior.
    * Existing alerting mechanisms did not immediately identify the developing service degradation.

    **Remediation Actions**

    The following remediation actions were taken to restore service:

    * Investigated customer-reported Elastigroup page loading delays and creation failures.
    * Reviewed system behavior and service restart activity impacting the AWS processing path.
    * Reverted the recently introduced change.
    * Performed validation of Elastigroup creation workflows through automated testing.
    * Conducted additional performance validation and environment monitoring to confirm stability.

    **Future Preventative Measures**

    1. Improved Deployment Safeguards: Strengthen deployment controls through stricter practices and introduce automatic rollback mechanisms when latency or error thresholds are exceeded.
    2. Defined Rollback Strategy: Review and standardize rollback playbook with clearly defined triggers, including error-rate and latency thresholds, to enable faster mitigation during service degradation events.
    3. Enhanced Service Health Monitoring: Implement additional alerting and monitoring for continuous pod restart activity to improve early detection and reduce response times for emerging issues.
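The preventative measures above name three rollback and alerting triggers: error rate, latency, and repeated pod restarts. The following is a minimal sketch of how such a trigger check might be expressed. It is not Flexera's implementation. The postmortem publishes no thresholds or tooling, so every name (`RollbackThresholds`, `rollback_reasons`) and every numeric limit here is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RollbackThresholds:
    # Hypothetical limits; the postmortem does not state actual values.
    max_error_rate: float = 0.05        # fraction of failed requests
    max_p95_latency_ms: float = 2000.0  # 95th-percentile request latency
    max_restarts_in_window: int = 3     # pod restarts per observation window


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[max(rank, 1) - 1]


def restarts_in_window(restart_times: Sequence[float], now: float,
                       window_s: float) -> int:
    """Count restarts whose timestamp falls within the last window_s seconds."""
    return sum(1 for t in restart_times if now - window_s <= t <= now)


def rollback_reasons(total: int, failed: int,
                     latencies_ms: Sequence[float],
                     restart_times: Sequence[float],
                     now: float, window_s: float = 600.0,
                     limits: RollbackThresholds = RollbackThresholds()
                     ) -> List[str]:
    """Return the triggers that fired; an empty list means no rollback."""
    reasons: List[str] = []
    if total > 0 and failed / total > limits.max_error_rate:
        reasons.append("error_rate")
    if latencies_ms and p95(latencies_ms) > limits.max_p95_latency_ms:
        reasons.append("latency")
    if restarts_in_window(restart_times, now, window_s) >= limits.max_restarts_in_window:
        reasons.append("pod_restarts")
    return reasons
```

A deployment pipeline could call `rollback_reasons` on each observation window after a release and revert when the list is non-empty. Returning the reasons rather than a boolean keeps a record of which trigger fired for later review.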