xMatters incident
Issue Discovered - Service disruption in All Regions – Multiple Services
xMatters experienced a minor incident on October 31, 2023 affecting Web Interface and 1 more component, lasting 24 minutes. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Oct 31, 2023, 05:34 PM UTC
xMatters monitoring tools have identified a potential issue with xMatters On-Demand for clients in All Regions. We are currently investigating the issue and will update as information becomes available. Please see incident details for specific services impacted. If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support team is waiting to help.
- identified Oct 31, 2023, 05:35 PM UTC
The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will update once a solution has been identified and implemented.
- monitoring Oct 31, 2023, 05:37 PM UTC
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
- monitoring Oct 31, 2023, 05:38 PM UTC
We are continuing to monitor for any further issues.
- resolved Oct 31, 2023, 05:58 PM UTC
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
- postmortem Nov 22, 2023, 11:15 PM UTC
**What happened?** On October 31, 2023, at approximately 10:22 AM Pacific, some customers reported that they were unable to log in to xMatters or, if they were already logged in, encountered "503" errors in the web user interface. Customers may also have noticed some flows failing to execute.
**Why did it happen?** xMatters deployed a regularly scheduled update to one of the backend services that comprise the platform. Due to recent hosting changes that included the physical relocation of a data center, the deployment caused a conflict that resulted in a lack of processing availability.
**How did we respond?** As soon as customers reported an inability to access the web user interface, the Support team confirmed the issue and initiated the internal major incident process. The response teams quickly identified the root cause and rolled back the deployment to the previous version of the service. This resolved the issue and customers reported that all services were restored. The xMatters Engineering teams then investigated the recent deployment and were able to reconfigure the update and redeploy the service. The service was deployed and restarted without further impact to customers within 20 minutes of resolving the initial issue.
**What are we doing to prevent it from happening again?** The xMatters teams regularly deploy updates to backend services and aim for a seamless transition between versions that won't impact customers. To help prevent this type of issue from reoccurring, the teams are adding more process checks to ensure that updates meet backend service requirements and dependencies before customers are switched over to a new version of a service.