xMatters incident

Issue Discovered - Service disruption in North American Region – Web User Interface

Major · Resolved

xMatters experienced a major incident on June 21, 2021, affecting the Web Interface and lasting 23 minutes. The incident has been resolved; the full update timeline is below.

Started: Jun 21, 2021, 05:22 PM UTC
Resolved: Jun 21, 2021, 05:45 PM UTC
Duration: 23m
Detected by Pingoru: Jun 21, 2021, 05:22 PM UTC

Affected components

Web Interface

Update timeline

  1. investigating Jun 21, 2021, 05:22 PM UTC

    xMatters monitoring tools have identified a potential issue with the xMatters Web User Interface for some clients located in the North America region. We are currently investigating the issue and will provide updates as information becomes available. If you are also experiencing issues, or if you're not sure whether this issue affects your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new; our support agents are ready to help.

  2. identified Jun 21, 2021, 05:32 PM UTC

    The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will provide an update once the fix has been implemented.

  3. monitoring Jun 21, 2021, 05:35 PM UTC

    The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the fix is stable and that all services are restored.

  4. resolved Jun 21, 2021, 05:45 PM UTC

    The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.

  5. postmortem Jun 25, 2021, 09:27 PM UTC

    ### **What happened?**

    On June 21, 2021, at approximately 10:05 AM Pacific (17:05 UTC), the xMatters monitoring tools alerted Customer Support to an issue where the web user interface was unresponsive or exhibiting slow performance. During the incident, some customers may have seen "Instance Unavailable" errors or experienced longer page load times when accessing the web user interface. This issue affected only the web user interface; events continued to be accepted and created, and notifications and responses were processed normally.

    ### **Why did it happen?**

    This issue was caused by a single instance attempting to load approximately 140,000 user records into memory. This eventually drove memory usage to 100%, leaving the service unresponsive. Although the condition correctly triggered an automated restart of the web user interface service, the service could not recover until the underlying issue was mitigated.

    ### **How did we respond?**

    As soon as Customer Support received the alert from the monitoring tools and confirmed the issue, they initiated a Severity-1 incident and assembled the major incident response team. The team identified the instance responsible for the resource consumption and isolated it within a dedicated resource stack to prevent a recurrence. The team then manually cleared the cache, restarted the web user interface service, and confirmed that it had resumed normal operation.

    ### **What are we doing to prevent it from happening again?**

    The Engineering team has isolated the source of the memory usage and reconfigured it with a dedicated CPU and separate resources to prevent future incidents of this type. They are also developing additional memory cleanup routines to improve automated recovery, and are investigating how a single instance was able to consume the available memory. Until these improvements are in place, the team will keep the source of the memory consumption isolated.

    ### **Timeline**

    | **Date/Time (Pacific)** | **Action** |
    | --- | --- |
    | Monday, June 21, 2021, 10:05 AM | xMatters monitoring alerts on slow or unresponsive customer instances |
    | 10:17 AM | Severity-1 incident initiated |
    | 10:20 AM | Source of memory usage identified |
    | 10:22 AM | Instance isolated and web UI service restarted |
    | 10:30 AM | Web user interface service declared stable |
    | 10:45 AM | Incident resolved |

    If you have any questions, please visit [http://support.xmatters.com](http://support.xmatters.com).