Flexera incident

Flexera One - IT Visibility - All Regions - GraphQL API Errors

Major · Resolved

Flexera experienced a major incident on November 20, 2025 affecting IT Visibility US, IT Visibility EU, and IT Visibility - APAC, lasting 1h 17m. The incident has been resolved; the full update timeline is below.

Started
Nov 20, 2025, 06:13 PM UTC
Resolved
Nov 20, 2025, 07:30 PM UTC
Duration
1h 17m
Detected by Pingoru
Nov 20, 2025, 06:13 PM UTC

Affected components

IT Visibility US, IT Visibility EU, IT Visibility - APAC

Update timeline

  1. investigating Nov 20, 2025, 06:13 PM UTC

    Incident Description: We are investigating an issue impacting the IT Visibility GraphQL API across all regions. A subset of customers is affected and may see 500 “Internal Server Error” responses when running GraphQL queries, integrations, and data extracts (a client-side sketch of what these failures look like appears after the timeline).

    Priority: P2

    Restoration Activity: Our technical team has identified the underlying cause and is deploying a fix. We are monitoring closely and will continue to provide updates as we work toward full restoration.

  2. resolved Nov 20, 2025, 07:30 PM UTC

    The fix has been rolled out across APAC, EU, and NAM, and validation is complete. The technical team has confirmed GraphQL queries and related operations, including data extracts, are running successfully, and no further errors are being observed. This incident has been resolved.

  3. postmortem Dec 04, 2025, 11:43 AM UTC

    **Description:** Flexera One - IT Visibility - All Regions - GraphQL API Errors

    **Timeframe:** November 20, 2025, 6:32 AM PST to 11:19 AM PST

    **Incident Summary**

    On Thursday, 20 November 2025 at 6:32 AM PST, our technical teams identified an issue impacting the IT Visibility GraphQL API across all regions. A subset of customers encountered failures while executing GraphQL queries, integrations, and data extracts, resulting in 500 “Internal Server Error” responses. During the investigation, our teams determined that the GraphQL service had been operating normally prior to a recent production deployment. Shortly after that deployment, telemetry data showed a spike in error rates, and customers began reporting disruptions in their API-driven workflows. An initial investigation confirmed the issue was present across APAC, EU, and NAM, although the impact was limited to a subset of our customers. The engineering teams quickly pinpointed the root cause and began deploying a corrective fix. As the deployments progressed region by region, the teams closely monitored GraphQL error rates and validated recovery through telemetry data and customer feedback. By 11:19 AM PST, the fix had been fully implemented, with validation confirming that GraphQL queries, integrations, and data extracts were functioning correctly without further errors.

    **Root Cause**

    The incident was caused by incomplete update propagation during a production deployment, which resulted in inconsistent data being used for schema generation. The deployment depended on a dataset update that had not been fully promoted to the production environment. As a result, an outdated field name remained in the dataset while the application expected a newer version of that field. This discrepancy caused schema generation failures within the GraphQL service, producing the 500-level errors experienced by customers. The underlying problem was a miscommunication that misaligned deployment readiness with data readiness, leaving the GraphQL API operating with inconsistent schema inputs (an illustrative sketch of this failure mode appears after the timeline).

    **Remediation Actions**

    * **Root Cause Identification** - The technical teams analyzed error patterns and logs, confirming that schema generation was failing due to an outdated field in the dataset, and identified that the dataset update required for the deployment had not been fully promoted.
    * **Promotion of Required Dataset Update** - Teams promoted the missing dataset update to align field names with the application's schema expectations, resolving the mismatched schema inputs.
    * **Rollout of Corrective Fix** - A targeted fix was developed and deployed progressively across APAC, EU, and NAM to minimize disruption. The deployment was monitored in real time to validate the reduction in error rates.
    * **Validation and Customer Confirmation** - Our teams used real-time monitoring to confirm the drop in 500 errors, and customer-facing teams validated recovery by confirming successful GraphQL queries and extract operations with affected customers.
    * **Extended Monitoring Post-Fix** - Following deployment, all regions were monitored to ensure stability. No further schema generation errors or 500 responses were observed.
    **Future Preventative Measures**

    * **Strengthened Dependency Validation Prior to Deployment** - Review and update the SOP documentation to ensure all deployments requiring data or schema updates validate the presence and promotion status of dependent datasets before promotion to production (a sketch of such a pre-deployment check appears after the timeline).
    * **Improved Cross-Team Communication** - Enhance communication channels and approval workflows between deployment, data, and engineering teams to prevent misaligned go/no-go decisions.
    * **Monitoring Review** - Review existing monitoring and validation checks to ensure early detection of issues.
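For readers trying to gauge whether they were affected: the sketch below shows, in rough terms, what the reported failure mode looked like from a client. This is not Flexera's code; the endpoint URL, token handling, and query are placeholders, and the retry/backoff behavior is a generic pattern, not a documented recommendation.

```python
import time

import requests

GRAPHQL_URL = "https://api.example.com/graphql"  # placeholder, not the real endpoint
QUERY = "{ assets { id name } }"                 # placeholder query


def run_query(query: str, token: str, retries: int = 3) -> dict:
    """POST a GraphQL query, backing off on the 500s seen during the incident."""
    for attempt in range(1, retries + 1):
        resp = requests.post(
            GRAPHQL_URL,
            json={"query": query},
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        if resp.status_code == 500:
            # During the incident, affected calls returned 500 "Internal Server Error".
            print(f"attempt {attempt}: HTTP 500, backing off")
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("GraphQL endpoint kept returning 500s")
```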
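To make the root cause concrete: the sketch below imitates, with invented field names, a schema generator that reads its field list from a dataset's metadata. If the dataset still carries an old field name while the application expects the renamed one, schema generation fails up front, and every query against the service surfaces as a 500. This illustrates the failure mode described in the postmortem, not Flexera's actual schema pipeline.

```python
# Hypothetical names throughout; this mirrors the described failure mode only.
EXPECTED_FIELDS = {"device_id", "install_date", "normalized_name"}  # what the app expects


def build_schema(dataset_fields: set[str]) -> dict:
    """Build a GraphQL-style schema from dataset metadata, failing on mismatches."""
    missing = EXPECTED_FIELDS - dataset_fields
    if missing:
        # This class of error is what turned into customer-facing 500 responses.
        raise ValueError(f"schema generation failed, dataset missing fields: {missing}")
    return {name: "String" for name in sorted(dataset_fields)}


# The production dataset still carried the pre-rename field "raw_name"
# because the dataset update had not been fully promoted:
stale_dataset = {"device_id", "install_date", "raw_name"}
build_schema(stale_dataset)  # raises: dataset missing fields: {'normalized_name'}
```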
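Finally, a sketch of the kind of pre-deployment gate the first preventative measure describes: before promoting an application release, verify that every dataset field the release depends on is already present in production. All names here are hypothetical, and a real pipeline would pull this metadata from its own data catalog or deployment tooling rather than a hard-coded stub.

```python
import sys

# Hypothetical manifest: dataset fields the release being promoted depends on.
RELEASE_REQUIRED_FIELDS = {"device_id", "install_date", "normalized_name"}


def fetch_production_fields() -> set[str]:
    """Stand-in for querying the production data catalog for promoted fields."""
    return {"device_id", "install_date", "raw_name"}  # dataset update not yet promoted


def predeploy_check() -> bool:
    """Block the deployment unless all dependent dataset fields are in production."""
    missing = RELEASE_REQUIRED_FIELDS - fetch_production_fields()
    if missing:
        print(f"NO-GO: dataset fields not yet promoted to production: {sorted(missing)}")
        return False
    print("GO: all dependent dataset fields are present in production")
    return True


if __name__ == "__main__":
    sys.exit(0 if predeploy_check() else 1)
```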