xMatters incident

Issue Discovered - Service disruption in Asia Pacific Region - Multiple Services

Minor Resolved

xMatters experienced a minor incident on November 8, 2023 affecting Web Interface, Email Notifications, and other components, lasting 19m. The incident has been resolved; the full update timeline is below.

Started
Nov 08, 2023, 07:26 PM UTC
Resolved
Nov 08, 2023, 07:45 PM UTC
Duration
19m
Detected by Pingoru
Nov 08, 2023, 07:26 PM UTC

Affected components

Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App

Update timeline

  1. investigating Nov 08, 2023, 07:26 PM UTC

    xMatters monitoring tools have identified a potential issue with xMatters On-Demand for some clients located in the Asia Pacific region. We are currently investigating the issue and will update as information becomes available. Please see incident details for specific services impacted. If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.

  2. identified Nov 08, 2023, 07:36 PM UTC

    The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will update once a solution has been identified and implemented.

  3. identified Nov 08, 2023, 07:38 PM UTC

    We are continuing to work on a fix for this issue.

  4. monitoring Nov 08, 2023, 07:38 PM UTC

    The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.

  5. resolved Nov 08, 2023, 07:45 PM UTC

    The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.

  6. postmortem Nov 28, 2023, 11:16 PM UTC

    **What happened?** On November 9, 2023, at approximately 5:35 AM AEDT (6:35 PM UTC on November 8), some customers in the APAC region reported to xMatters Customer Support that they were unable to add a new user. The Add User button was greyed out, and hovering over it showed the message "You've reached the maximum number of user licenses for your account" even though additional licenses were available. Some users may also have experienced an intermittent inability to log in to the web user interface. Throughout this issue and the subsequent mitigation procedures, the system continued to accept events and generate alerts, and all notifications and responses were processed correctly.

    **Why did it happen?** During a regularly scheduled update to the backend services in the APAC region, a timing issue caused the service responsible for instance configuration and license tracking to be directed to a version that hadn't received the latest configuration data. This conflict caused the system to calculate allotted licenses incorrectly and caused intermittent login issues.

    **How did we respond?** The Engineering teams were monitoring the update and did not encounter any warnings or errors that they considered outside acceptable levels for this specific operation. When customers reported the issue to xMatters Customer Support, however, the teams decided to roll back the deployment immediately to mitigate any potential problems. As soon as the rollback was completed, customers confirmed that all services had been restored. The Engineering team then launched an internal review, identified some avenues of improvement, and successfully redeployed the update without incident.

    **What are we doing to prevent it from happening again?** In addition to adding further automated checks to ensure configuration data is always up to date across services prior to an update, the teams isolated the specific cause of the configuration data mismatch to a timeout issue and have updated the timing settings to prevent a recurrence.
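
The postmortem mentions automated checks that confirm configuration data is up to date across services before an update proceeds. The following is a minimal, purely illustrative sketch of what such a gate might look like; it is not xMatters' actual tooling. The `ServiceConfigState` type, the service names, and the idea of an integer `config_version` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfigState:
    """Hypothetical snapshot of the configuration version a service has loaded."""
    service: str
    config_version: int


def find_stale_services(states, expected_version):
    """Return the sorted names of services whose loaded config is older than expected."""
    return sorted(s.service for s in states if s.config_version < expected_version)


def safe_to_update(states, expected_version):
    """Allow the update only when no service is running stale configuration."""
    return not find_stale_services(states, expected_version)


if __name__ == "__main__":
    # Hypothetical snapshot: the licensing service has not yet received version 42.
    snapshot = [
        ServiceConfigState("licensing", 41),
        ServiceConfigState("web-ui", 42),
        ServiceConfigState("notifications", 42),
    ]
    print("stale services:", find_stale_services(snapshot, expected_version=42))
    print("safe to update:", safe_to_update(snapshot, expected_version=42))
```

In this sketch, a deployment pipeline would call `safe_to_update` before routing traffic to the new version and would pause or abort while any service is listed as stale.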