xMatters incident
Issue Discovered - Service disruption in European Region - Multiple Services
xMatters experienced a minor incident on May 13, 2025, affecting Web Interface, Email Notifications, and 1 more component. The incident lasted 3h 1m and has been resolved; the full update timeline is below.
Update timeline
- investigating May 13, 2025, 05:45 PM UTC
xMatters monitoring tools have identified a potential issue with xMatters On-Demand for some clients located in the Europe region. We are currently investigating the issue and will update as information becomes available. Please see incident details for specific services impacted. If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.
- identified May 13, 2025, 05:53 PM UTC
The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will update once a solution has been identified and implemented.
- identified May 13, 2025, 05:54 PM UTC
We are continuing to work on a fix for this issue.
- monitoring May 13, 2025, 06:03 PM UTC
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
- resolved May 13, 2025, 08:46 PM UTC
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
- postmortem May 27, 2025, 08:50 PM UTC
**What happened?** On May 13, 2025, at approximately 10:45 AM Pacific (05:45 PM UTC), xMatters internal monitoring tools identified an issue in which customers in the EU region experienced intermittent web UI and API timeouts.
**Why did it happen?** A backend queueing service experienced network timeouts during an unpredictable, rapid increase in usage. The surge in network usage increased resource consumption, which caused service timeouts and restarts, as well as higher latency that further delayed responses to backend requests.
**How did we respond?** xMatters internal monitoring tools alerted the xMatters Incident Response Team to the issue, and the team launched the internal SEV-1 process. Because the issue was detected early, Engineering teams were able to scale up the queueing services and prevent further service degradation and availability issues. The network timeouts were resolved once resources had been scaled up to accommodate the increase in usage.
**What are we doing to prevent it from happening again?** The Engineering teams have adjusted resources to better absorb sudden usage increases and to prevent them from affecting backend services. The improved resource allocation and adaptability should prevent similar issues from occurring in the future.