Flexera incident

Snow Software - Australia - Specific pages not loading

Flexera experienced a major incident on April 21, 2026 affecting Snow Atlas - Australia and Snow Atlas API - Australia, lasting 21m. The incident has been resolved; the full update timeline is below.

Started: Apr 21, 2026, 09:01 AM UTC
Resolved: Apr 21, 2026, 09:23 AM UTC
Duration: 21m
Detected by Pingoru: Apr 21, 2026, 09:01 AM UTC

Affected components

Snow Atlas - Australia
Snow Atlas API - Australia

Update timeline

investigating Apr 21, 2026, 09:01 AM UTC

Incident Description: We are currently experiencing an issue affecting SAM Core on Snow Software in the Australia region. Affected users may encounter errors or experience difficulties when attempting to load certain pages within the application.

Priority: P2

Restoration Activity: Our technical teams are actively investigating the issue. Initial analysis indicates a potential problem within the messaging system that may be contributing to the observed behavior. Further updates will be provided as more information becomes available.
resolved Apr 21, 2026, 09:23 AM UTC

Our teams identified the issue as being caused by a recent deployment. The change has been rolled back, and services have been successfully restored to normal operation.
postmortem May 05, 2026, 06:29 AM UTC

**Description:** Snow Software - Australia - SAM Core (Snow Atlas) Errors

**Timeframe:** April 21, 2026, 1:35 AM PST to April 21, 2026, 2:05 AM PST

**Incident Summary**

On Tuesday, April 21, 2026, at 1:35 AM PST, customers in the Australia region began experiencing HTTP 500 errors when accessing specific SAM Core functionality, particularly within the Application and License detail views. Although the Snow Atlas portal itself remained accessible, certain features failed to load, disrupting normal user operations.

Initial investigation indicated a potential issue in the messaging system, supported by logs from the API service that reported a routing error. As a mitigation step, the route application in the Australia region was restarted at 1:38 AM PST, which led to a temporary improvement in system behaviour. The issue recurred shortly thereafter, confirming that the restart did not resolve the underlying problem.

Subsequent investigation linked the incident to a recent change associated with regional migration activities between Australia Southeast and Australia East. The team determined this change was a likely contributing factor and initiated a rollback of the prior day’s deployment.

Following the rollback, system stability improved, and by 2:05 AM PST, validation with multiple previously impacted customers confirmed that the affected pages were once again functioning as expected. Full service was restored, and the incident was considered resolved.

**Root Cause**

The issue was caused by an unintended side effect of recent changes introduced during the Australia Southeast to Australia East migration, which resulted in a routing failure within the application layer.

Contributing Factors:

* A recent deployment related to migration cleanup activities introduced instability in routing behavior.
* A fatal routing error in the API service disrupted request handling.
* Potential interaction with the messaging system contributed to service degradation (under investigation).
* The issue was not immediately reproducible during initial validation after the change, delaying detection.

**Remediation Actions**

The following remediation steps were implemented to restore service functionality:

* Restarted the routing service to attempt initial recovery.
* Conducted detailed log analysis to identify routing failures.
* Rolled back the prior day’s deployment associated with migration changes.
* Validated service recovery with affected customers.

**Future Preventative Measures**

* Enhance validation and monitoring for regional migration-related changes.
* Implement additional safeguards and automated checks for routing and messaging dependencies.
* Introduce stricter post-deployment verification processes to detect delayed failures.
* Conduct a detailed post-mortem to identify any additional contributing conditions and ensure long-term stability.
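
The measure on post-deployment verification for delayed failures can be illustrated with a short sketch. This is not Flexera's tooling; it is a minimal illustration, assuming a probe that returns an HTTP status code for one of the affected pages. The names `soak_check`, `probe`, `rounds`, `wait`, and `max_failures` are hypothetical. The idea is to keep probing for a soak period after a change instead of validating once, since in this incident the failure did not appear during initial validation and returned after a restart had briefly helped.

```python
from typing import Callable, List


def soak_check(
    probe: Callable[[], int],
    rounds: int,
    wait: Callable[[], None],
    max_failures: int = 0,
) -> List[int]:
    """Probe a service once per round and return the rounds that saw a 5xx.

    Raises RuntimeError as soon as more than `max_failures` rounds have
    failed, so a failure that only appears later in the soak period is
    still caught. All names here are hypothetical.
    """
    failed_rounds: List[int] = []
    for round_no in range(rounds):
        status = probe()
        if status >= 500:
            failed_rounds.append(round_no)
            if len(failed_rounds) > max_failures:
                raise RuntimeError(
                    f"probe returned {status} in round {round_no}; "
                    "deployment should be treated as unhealthy"
                )
        # Pause between rounds, but not after the final one.
        if round_no < rounds - 1:
            wait()
    return failed_rounds


if __name__ == "__main__":
    # Simulated probe: healthy for three rounds, then an HTTP 500.
    statuses = iter([200, 200, 200, 500])
    try:
        soak_check(lambda: next(statuses), rounds=6, wait=lambda: None)
    except RuntimeError as exc:
        print(f"soak check failed: {exc}")
```

In real use, `probe` would be a function that requests a representative page and returns its status code, and `wait` would be something like `lambda: time.sleep(60)`. A single check immediately after deployment corresponds to `rounds=1`, which would have passed in the simulated case above.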