Flexera incident

Flexera One- IT Visibility- US Prod - Degraded performance

Flexera experienced a major incident on May 11, 2026 affecting IT Visibility US, lasting 4d 1h. The incident has been resolved; the full update timeline is below.

Started: May 11, 2026, 03:31 AM UTC
Resolved: May 15, 2026, 04:42 AM UTC
Duration: 4d 1h
Detected by Pingoru: May 11, 2026, 03:31 AM UTC

Affected components

IT Visibility US

Update timeline

investigating May 11, 2026, 03:31 AM UTC

Incident Description: We are investigating an issue impacting IT Visibility (ITV) services in the US Production environment. Affected customers may experience delays in normalized inventory updates appearing in ITV.
Priority: P2
Restoration Activity: Our technical teams have identified that a backend service is experiencing elevated request throttling from an upstream cloud provider dependency, resulting in delayed inventory processing. Teams are actively investigating the issue and working to restore normal processing. We are monitoring the environment closely and will continue to provide updates as progress is made toward full restoration.
monitoring May 11, 2026, 05:06 AM UTC

Our technical teams have successfully designed and implemented a fix. Services are now operating normally, and backlog processing is underway for the data accumulated during the incident.
monitoring May 11, 2026, 11:16 AM UTC

Our technical teams continue to work through the backlog recovery process. Recovery activities are being carried out cautiously to avoid overwhelming downstream services and to ensure platform stability throughout the restoration process.
monitoring May 11, 2026, 04:58 PM UTC

Backlog processing continues to make steady progress following the previously deployed fix. The platform remains stable, and recovery activities are being carefully managed to ensure delayed inventory data is processed safely without impacting downstream services. Customers may continue to experience delays in updated inventory data appearing during this recovery period. We will continue to monitor progress and provide further updates as appropriate.
monitoring May 12, 2026, 02:39 PM UTC

Backlog processing remains in progress, and the platform continues to operate in a stable state. Recovery activities are being carefully managed to ensure delayed inventory data is processed safely while maintaining overall service stability. We will continue to monitor progress and provide additional updates as recovery continues.
monitoring May 13, 2026, 12:45 AM UTC

Backlog processing continues, and the platform remains stable. The system is keeping up with incoming data while previously delayed inventory data is being processed in a controlled manner to ensure overall service stability. We will continue to monitor progress and provide additional updates as recovery continues.
monitoring May 13, 2026, 10:21 AM UTC

Our technical teams continue to closely monitor recovery progress. The system is steadily processing the existing backlog in a controlled manner, with priority placed on maintaining platform stability. We will continue to provide updates as recovery progresses or if there are any material changes.
monitoring May 13, 2026, 04:07 PM UTC

Most data is now being processed successfully, and the platform continues to operate in a stable state while keeping up with incoming data. A small amount of remaining backlog is still being worked through, with recovery activities managed carefully to maintain overall service stability. We will continue to monitor progress and provide additional updates as recovery completes.
monitoring May 14, 2026, 06:54 AM UTC

Processing continues following an overnight service restart that required a portion of data to be replayed. Most organizations are now processing within approximately one hour of real time, and the remaining backlog continues to reduce steadily. Our teams are actively monitoring the environment to ensure full recovery and continued platform stability.
monitoring May 15, 2026, 01:30 AM UTC

Most data processing has completed, and the platform continues to operate in a stable state while keeping up with incoming data. Our teams are performing final validations to confirm everything is functioning as expected. We will provide a final update once these validations are complete.
resolved May 15, 2026, 04:42 AM UTC

All backlog processing has been completed and services remain stable. This issue has now been resolved. Our teams will continue to monitor the platform to ensure ongoing stability. Further details will be shared in a post mortem report.
postmortem May 29, 2026, 10:46 AM UTC

**Description:** Flexera One- IT Visibility- US Prod - Degraded performance

**Timeframe:** May 10, 2026, 8:12 PM PDT to May 14, 2026, 07:25 PM PDT

**Incident Summary**

On Sunday, May 10, 2026, at 8:12 PM PDT, our teams detected an issue impacting IT Visibility (ITV) services in the US region where the affected customers experienced delays in normalized inventory processing and delivery to ITV UIs and APIs.

The issue originated within the normalization persistence layer, where the service encountered repeated failures while initializing streaming clients used for communication. During initialization, the service generated a large number of API requests in a short period of time, which exceeded account throttling limits. As a result, the service repeatedly restarted and was unable to consistently process and persist normalized inventory data.

Our technical teams identified the contributing factors, implemented mitigation measures, and deployed code improvements designed to reduce API request spikes and improve service resiliency during startup and recovery conditions. Following deployment of the fixes, services stabilized and backlog processing was initiated in a controlled manner to avoid downstream system impact. Recovery progressed steadily, and all backlog processing was successfully completed by May 14, 2026, at 07:25 PM PDT.

**Root Cause**

* The incident was caused by excessive API requests generated during streaming client initialization within the normalization writer service. The request volume exceeded throttling limits, preventing successful initialization and causing the service to repeatedly restart.

Contributing Factors:

* The production US environment had significantly scaled up, increasing the number of streaming clients initialized during service startup.
* Separate streaming clients were created per organization across multiple collections, resulting in unexpectedly high API calls during each pod restart.
* Failure handling logic caused the service to terminate immediately instead of retrying gracefully, amplifying restart frequency and additional request spikes.

**Remediation Actions**

The following remediation activities were completed to restore service stability:

* Implemented staggered streaming client initialization to reduce API request spikes during service startup.
* Added retry logic with exponential backoff around streaming client creation and authentication requests.
* Replaced failure handling with graceful recovery and retry mechanisms.
* Stabilized normalization services and resumed backlog processing in a controlled manner.
* Closely monitored recovery activities to ensure downstream platform stability during backlog replay.

**Future Preventative Measures**

* Introduce rate limiting controls for external dependency initialization workflows.
* Enhance resiliency standards for service startup and dependency authentication handling.
* Improve observability and alerting around throttling conditions.
* Review scalability assumptions and startup behavior for high-scale growth scenarios.
* Conduct additional resiliency testing focused on restart and dependency throttling conditions.
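
**Illustrative sketch (not Flexera's implementation)**

The first two remediation actions, staggered client initialization and retry with exponential backoff, follow a general pattern that can be sketched in a few lines of Python. Everything named here is hypothetical and chosen only for illustration: `ThrottledError`, `retry_with_backoff`, `init_clients_staggered`, the `create_client` callable, and all delay values. The postmortem does not disclose the actual code, the upstream provider, or its limits.

```python
import random
import time
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical error for an upstream request rejected due to throttling."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying throttled failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of looping forever
            # The delay ceiling doubles on each attempt and is capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many restarting pods from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


def init_clients_staggered(
    org_ids: Iterable[str],
    create_client: Callable[[str], T],
    stagger_seconds: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, T]:
    """Create one client per organization, pausing between creations."""
    clients: Dict[str, T] = {}
    for index, org_id in enumerate(org_ids):
        if index > 0:
            sleep(stagger_seconds)  # spread startup requests out over time
        clients[org_id] = retry_with_backoff(
            lambda: create_client(org_id), sleep=sleep
        )
    return clients
```

In this sketch a throttled call is retried in place rather than terminating the process, which is the behavior change the third remediation item describes. The `sleep` parameter is injectable so the timing logic can be exercised in tests without real delays.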