Xink incident

EU sync service issue

Major Resolved

Xink experienced a major incident on February 26, 2024, affecting the Outlook Add-in. The incident has been resolved; the full update timeline is below.

Started
Feb 26, 2024, 11:33 AM UTC
Resolved
Feb 26, 2024, 11:33 AM UTC
Detected by Pingoru
Feb 26, 2024, 11:33 AM UTC

Affected components

Outlook Add-in

Update timeline

  1. investigating Feb 26, 2024, 10:04 AM UTC

    A subset of EU clients reported an issue with their Outlook add-in and Xink client app: they received an error while composing email. We are currently investigating this issue.

  2. resolved Feb 26, 2024, 11:33 AM UTC

    This incident has been resolved. An RCA will be provided in the postmortem once the investigation is complete.

  3. postmortem Feb 26, 2024, 03:02 PM UTC

    # RCA: Regex Timeout Issue (.NET) in Sync Service

    **Summary:** Xink uses a .NET regex component for pattern search in signature templates. A high-performance compiled regex has a safety timeout parameter that protects against inefficient expressions or malformed data. The problem stemmed from the .NET framework's behavior, specifically highlighted in issue #54747 on [GitHub](https://github.com/dotnet/runtime/issues/54747). Although that issue was closed without a resolution, the behavior persisted and overloaded our sync service.

    **Impact:** The regex timeout issue overloaded the sync service, leading to performance degradation and service disruptions for our clients. This reduced the reliability and responsiveness of our signature template sync service, affecting user experience and productivity.

    **Mitigation:** To address the issue and prevent future occurrences, we quickly reduced the default regex timeout parameter significantly. This adjustment resolved the overload problem permanently, restoring system stability and performance.

    **Conclusion:** The regex timeout issue presented a significant challenge for our system, impacting performance and user experience. By lowering the timeout, we were able to address the issue effectively and restore system stability. Moving forward, we remain committed to continuously monitoring and optimizing system performance to deliver a reliable and responsive user experience.
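
The safety timeout described in the summary corresponds to the `matchTimeout` argument that the .NET `Regex` constructor accepts. The following is a minimal sketch of that mechanism, not Xink's actual code: the class name `SignatureTemplateScanner`, the `{{Name}}` placeholder pattern, and the 250 ms budget are all hypothetical, since the postmortem does not state the real pattern or timeout value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names and values are illustrative assumptions.
public static class SignatureTemplateScanner
{
    // Assumed budget. The postmortem only says the default was "reduced significantly".
    private static readonly TimeSpan MatchTimeout = TimeSpan.FromMilliseconds(250);

    // Assumed placeholder syntax such as {{FirstName}}.
    // The third constructor argument is the per-match safety timeout.
    private static readonly Regex Placeholder = new Regex(
        @"\{\{(\w+)\}\}",
        RegexOptions.Compiled,
        MatchTimeout);

    // Returns the number of placeholders found, or -1 if matching exceeded the timeout.
    public static int CountPlaceholders(string template)
    {
        try
        {
            // Matches() is evaluated lazily; reading Count runs the search,
            // so a timeout would surface inside this try block.
            return Placeholder.Matches(template).Count;
        }
        catch (RegexMatchTimeoutException)
        {
            // Give up on this input instead of letting one slow match tie up a worker.
            return -1;
        }
    }
}
```

For example, `SignatureTemplateScanner.CountPlaceholders("Hello {{FirstName}} {{LastName}}")` returns 2. A shorter timeout bounds how long a single pathological input can occupy a worker, which is the general idea behind the mitigation described above. The trade-off is that legitimate but slow matches are also cut off, so callers need a fallback path for the timeout case.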