xMatters incident

Issue Discovered - Service disruption in North America – Multiple Services

Major · Resolved

xMatters experienced a major incident on August 6, 2025 affecting Email Notifications, SMS Notifications, and Voice Notifications, lasting 1h 50m. The incident has been resolved; the full update timeline is below.

Started
Aug 06, 2025, 03:27 AM UTC
Resolved
Aug 06, 2025, 05:18 AM UTC
Duration
1h 50m
Detected by Pingoru
Aug 06, 2025, 03:27 AM UTC

Affected components

Email Notifications
SMS Notifications
Voice Notifications

Update timeline

  1. investigating Aug 06, 2025, 03:27 AM UTC

    xMatters monitoring tools have identified a potential issue with xMatters On-Demand for clients in North America that are hosted in the us-central1 region. We are currently investigating the issue and will update as information becomes available. Please see incident details for specific services impacted. If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support team is waiting to help.

  2. identified Aug 06, 2025, 03:44 AM UTC

    The xMatters Incident Response team has identified an issue with our message routing system and is working on a fix. We will update once a solution has been identified and implemented.

  3. monitoring Aug 06, 2025, 03:59 AM UTC

    The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.

  4. resolved Aug 06, 2025, 05:18 AM UTC

    The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.

  5. postmortem Aug 14, 2025, 01:16 AM UTC

    ### What happened?

    On August 5th, 2025, at approximately 7:53 PM Pacific, the xMatters internal monitoring tools alerted the xMatters Support team to an issue in the North America region with notification delivery. Some xMatters customers may have experienced delays when receiving notifications or noticed failed alerts in the system.

    ### Why did it happen?

    The issue occurred because of a network problem that disrupted communication between parts of the queuing system. This caused some components to become temporarily out of sync, leading to timeouts, internal connectivity failures, and a small number of messages not being processed during the disruption. As the system recovered, some performance was degraded until normal operation could be restored.

    ### How did we respond?

    As soon as the xMatters monitoring tools reported an issue, the xMatters Support Team initiated the internal Major Incident Management process and engaged the Engineering and incident response teams. The teams quickly mitigated the issue and restored performance by redirecting traffic from our message queuing system in the affected region (us-central) to our message queuing system in another local region (us-east). After the issue was resolved, the teams directed traffic back to the restored region.

    ### What are we doing to prevent it from happening again?

    The Engineering team has determined that the best approach to prevent this issue from reoccurring is to replace the current message queuing system. Work on the replacement system is well underway and the teams will retire the current system as soon as they have finished the replacement.