Flexera incident

Flexera One - IT Asset Management - NA - Reconciliation Processing Delays

Major Resolved

Flexera experienced a major incident on April 28, 2026 affecting IT Asset Management - US Batch Processing System, lasting 2h 19m. The incident has been resolved; the full update timeline is below.

Started: Apr 28, 2026, 03:31 PM UTC
Resolved: Apr 28, 2026, 05:50 PM UTC
Duration: 2h 19m
Detected by Pingoru: Apr 28, 2026, 03:31 PM UTC

Affected components

IT Asset Management - US Batch Processing System

Update timeline

  1. investigating Apr 28, 2026, 03:31 PM UTC

    Incident Description: We are currently investigating an issue affecting reconciliation processing for Flexera One IT Asset Management in the NA region. While the application remains accessible, affected customers may experience delays in reconciliation completion and related processing activities.

    Priority: P2

    Restoration Activity: Our technical teams are actively engaged and are investigating the cause of the reconciliation processing delays. We are reviewing the affected processing components and working to restore normal processing. Further updates will be provided as more information becomes available.

  2. resolved Apr 28, 2026, 05:50 PM UTC

    Our technical teams identified that the reconciliation processing delays were caused by a deployment that occurred while reconciliation tasks were already running. This caused some affected reconciliation jobs to enter a repeated retry state instead of completing as expected, which delayed processing for impacted customers. The issue has been addressed, and newly started reconciliation jobs are completing successfully. Jobs that were stopped as part of recovery will run again based on their normal schedule. We will continue monitoring processing health and are tracking longer-term improvements to reduce the risk of a similar issue occurring again.

  3. postmortem May 12, 2026, 05:42 AM UTC

    **Description:** Flexera One – IT Asset Management – NA – Reconciliation Processing Delays

    **Timeframe:** April 28, 2026, 6:10 AM PDT – April 28, 2026, 10:45 AM PDT

    **Incident Summary**

    On April 28, 2026, at approximately 6:10 AM PDT, an issue was identified affecting reconciliation processing within the Flexera One IT Asset Management service in the North America production environment. During this period, the application remained accessible. However, affected customers may have experienced delays in reconciliation completion and related writer job processing. Investigation identified that some writer jobs were blocked while processing reconciliation results, with repeated failures observed during result streaming.

    Technical teams began investigating immediately and reviewed logs associated with the affected processing flow. The investigation identified repeated attempts to retrieve reconciliation results after streaming errors occurred, resulting in a large volume of repeated errors. Several writer jobs were also observed running longer than expected.

    As part of the recovery effort, a service restart or redeploy was attempted but did not resolve the issue. Technical teams then identified that a recent service release occurred around the time processing load began increasing. The service was reverted to a previous version as part of the mitigation effort, and further deployment activity was held while the issue was reviewed.

    Additional recovery actions were taken for writer jobs that were already stuck in an unrecoverable state. These jobs were cleared where needed, while newly started writer jobs were monitored to confirm that they were processing successfully.

    By April 28, 2026, at approximately 10:45 AM PDT, current writer job processing was confirmed to be completing successfully, and the incident was considered resolved. Any longer-running jobs not related to the affected processing stage continued to be monitored separately.

    **Root Cause**

    The incident was caused by writer jobs entering a blocked state while processing reconciliation in the North America production environment. This resulted in repeated failures during result streaming and delayed completion of reconciliation-related writer job processing for affected customers.

    The issue occurred around the time of a service release, and technical teams identified that a deployment during active reconciliation processing may cause certain writer jobs to enter an unrecoverable error state. When this occurred, retry behavior from the IT Asset Management processing flow continued until the affected jobs timed out or were manually cleared.

    **Remediation Actions**

    The following actions were taken during the incident response:

    1. Incident Investigation Initiated: Technical teams began investigating delays affecting reconciliation completion and related writer job processing in the North America region.
    2. Log Review Completed: Logs were reviewed and showed repeated failures during reconciliation result streaming, including repeated attempts to retrieve results after errors occurred.
    3. Blocked Writer Jobs Identified: Technical teams identified that some writer jobs were blocked during reconciliation result processing and had been running longer than expected.
    4. Service Restart/Redeploy Attempted: A restart or redeploy of the affected service was attempted but did not resolve the issue, as affected writer jobs had already entered a blocked processing state.
    5. Recent Service Release Reverted: A recent service release that occurred around the time processing load increased was reverted to a previous version as part of the mitigation effort.
    6. Further Deployment Activity Held: Additional deployment activity was paused while technical teams reviewed the behavior and assessed the safest recovery approach.
    7. Stuck Writer Jobs Cleared: Writer jobs that were confirmed to be stuck in an unrecoverable error state were cleared where needed.
    8. Processing Recovery Validated: Newly started and current writer jobs were monitored and confirmed to be completing successfully before the incident was resolved.

    **Future Preventative Measures**

    This incident highlighted the importance of improving resilience in reconciliation processing when service changes occur during active writer job processing. Based on the investigation, the following follow-up activities are being pursued:

    1. Deployment Strategy Review: Review deployment approaches for the affected service to reduce the likelihood of disrupting active reconciliation processing.
    2. Retry Behavior Improvements: Review retry behavior for reconciliation-related writer jobs to reduce repeated processing attempts when a job reaches an unrecoverable error condition.
    3. Failure Handling Improvements: Evaluate code-side improvements to better detect specific failure patterns and allow affected processing to fail or recover more cleanly.
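
The retry behavior described in the postmortem (jobs retrying result retrieval until they timed out or were manually cleared) can be illustrated with a generic bounded-retry pattern. The sketch below is not Flexera's implementation, and nothing in it comes from the vendor's code: `UnrecoverableError`, `TransientError`, `run_with_bounded_retry`, and all parameters are hypothetical names chosen for illustration. It assumes the caller can classify a failure as either retryable or unrecoverable, and it shows one way a job could stop immediately on an unrecoverable error while capping attempts for transient ones.

```python
import time


class UnrecoverableError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


class TransientError(Exception):
    """Hypothetical marker for failures that may succeed on a later attempt."""


def run_with_bounded_retry(fetch, max_attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call fetch() until it succeeds, fails unrecoverably, or attempts run out.

    Returns a (status, value_or_error, attempts_used) tuple.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return ("completed", fetch(), attempt)
        except UnrecoverableError as exc:
            # Retrying cannot help, so fail the job right away instead of looping.
            return ("failed", exc, attempt)
        except TransientError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff between attempts (assumed policy).
                sleep(base_delay * 2 ** (attempt - 1))
    # Attempt budget exhausted: report failure rather than retrying forever.
    return ("failed", last_error, max_attempts)
```

In this sketch, a job whose result stream was invalidated by a redeploy would raise the hypothetical `UnrecoverableError` and be marked failed after a single attempt, leaving it free to run again on its normal schedule. How such failures would actually be detected in the affected service is not stated in the source.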