xMatters incident
Issue Discovered - Service disruption in North America
xMatters experienced a minor incident on December 28, 2018 affecting Web Interface, Email Notifications, and 1 more component, lasting 1d 2h. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Dec 28, 2018, 05:17 AM UTC
xMatters have received several reports today of users not being able to access the web user interface. The root cause of this issue is related to a wide-impact service outage experienced by a primary internet service provider (ISP) in North America. xMatters services are running and operational; however, some users may not be able to access their xMatters instance, depending on their geographic location. We continue to monitor the situation closely and will provide updates as they become available. If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.
- identified Dec 28, 2018, 06:20 AM UTC
As mentioned previously, this issue has been identified to be a widespread issue impacting a primary ISP in North America. We continue to monitor the situation and will provide another update as it becomes available.
- identified Dec 28, 2018, 05:19 PM UTC
The ISP have made some progress, but are still working on fully restoring their service. Some users will continue to see issues accessing the web user interface, depending on their geographic location. We continue to monitor the situation and will provide updates as we get them.
- monitoring Dec 29, 2018, 02:16 AM UTC
The ISP have confirmed that most networking issues they were experiencing should now be resolved. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
- resolved Dec 29, 2018, 07:44 AM UTC
The issue has been addressed by the ISP and network services have been restored. Thank you for your patience while this issue was being addressed.
- postmortem Jan 03, 2019, 04:58 PM UTC
### What happened?

On Thursday, December 27, 2018 at approximately 8:41 AM PST, the xMatters network monitoring systems alerted Client Assistance to an issue with xMatters On-Demand services for some clients located in North America. During the issue, some clients may have experienced intermittent access to the xMatters user interface or a delay when injecting events into xMatters. In addition, some clients may have experienced intermittent delays or interruptions with the delivery and reception of xMatters emails.

### Why did it happen?

The root cause of this issue was a high-impact service outage experienced by a primary internet service provider (ISP) in North America. This wide-reaching ISP outage impacted connectivity, email service, and Internet access across North America and even parts of Europe, and caused some issues common to large ISP outages, such as DNS gaps and mobile app connectivity problems.

Throughout the incident, the xMatters web user interface was operating and functional, event injection methods were working properly, and non-email notifications and responses were being sent and processed normally. However, most clients may have experienced increased latency during the event, which affected the overall user experience.

### How did we respond?

As soon as the xMatters network monitoring tools detected connectivity issues, the xMatters Client Assistance and Engineering teams escalated the issue to Severity 1 and initiated the internal major incident management process. While the incident response teams began investigating the underlying cause, Client Assistance identified and informed affected clients about the incident. The teams immediately identified that the issue was limited to a specific data center within the North American region and determined that the problem was due to a widespread ISP outage in North America.
The team connected with the ISP and began working in collaboration with them to determine the impact to xMatters customers, and rerouted email services through an unaffected path. During the event, all in-flight deployments and upgrades were paused until network access was fully restored to avoid the possibility of impact. Our incident management team continued to monitor the situation closely and update clients as the ISP reported on their restoration progress.

### What are we doing to prevent it from happening again?

xMatters uses multiple network backbones and automatically routes traffic across other networks and through other data centers in the event of an Internet failure. During this event, these systems were working as designed and connectivity was reestablished within the expected period of re-convergence.

As part of our commitment to continuous improvement, we are conducting hosting service improvements to our infrastructure-as-a-service, scheduled to occur in the North American region in January 2019. These improvements will greatly reduce the potential impact of ISP outages. For more information, see the article on our support site: [https://support.xmatters.com/hc/en-us/articles/115005269506](https://support.xmatters.com/hc/en-us/articles/115005269506).
### Timeline:

December 27, 2018

- 8:41 AM - xMatters internal monitoring alerts Client Assistance to issue in North America
- 8:43 AM - Client Assistance confirms all services are accessible and operational
- 8:58 AM - Client Assistance escalates issue to Severity 1; incident response teams begin investigation
- 9:03 AM - Team confirms issue with ISP
- 9:28 AM - xMatters engages ISP and obtains point of contact
- 5:46 PM - Issues identified with email service and delivery
- 6:04 PM - Email traffic re-routed to alternate path
- 6:07 PM - Email services restored
- 9:22 PM - ISP provides 4-hour ETA for resolution

December 28, 2018

- 9:19 AM - ISP indicates progress and claims to be nearing resolution
- 6:16 PM - ISP indicates that a solution has been implemented; currently monitoring connection for stability
- 11:44 PM - xMatters confirms all services restored

If you have any questions, please visit [http://support.xmatters.com](http://support.xmatters.com)
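
The postmortem timeline uses local times, while the status updates are stamped in UTC. The sketch below shows one way to line the two up. It assumes every timeline entry is in PST (UTC-8), which the report states only for the first entry; the helper name `pst_to_utc` is illustrative and not part of any xMatters tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: all postmortem timeline entries are in PST (UTC-8),
# as the report states for the 8:41 AM entry.
PST = timezone(timedelta(hours=-8), "PST")


def pst_to_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM AM/PM' stamp as PST and convert it to UTC."""
    local = datetime.strptime(stamp, "%Y-%m-%d %I:%M %p").replace(tzinfo=PST)
    return local.astimezone(timezone.utc)


# Postmortem entries (PST) paired with the status update they appear to match.
entries = [
    ("2018-12-28 09:19 AM", "identified"),
    ("2018-12-28 06:16 PM", "monitoring"),
    ("2018-12-28 11:44 PM", "resolved"),
]

for stamp, status in entries:
    utc = pst_to_utc(stamp)
    print(f"{stamp} PST -> {utc:%Y-%m-%d %H:%M} UTC ({status})")

# Length of the public incident: first status update to resolution, both in UTC.
first_update = datetime(2018, 12, 28, 5, 17, tzinfo=timezone.utc)
resolved = pst_to_utc("2018-12-28 11:44 PM")
duration = resolved - first_update
print(f"Incident duration: {duration.days}d {duration.seconds // 3600}h")
```

Under that assumption, the converted times coincide with the UTC stamps on the corresponding status updates, and the span from the first update to resolution agrees with the "1d 2h" duration in the summary.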