Teem experienced a notice incident on August 20, 2024, lasting 1h 58m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Aug 20, 2024, 02:25 PM UTC
We are currently investigating an issue with Teem. We will update you when we have more information.
- investigating Aug 20, 2024, 02:30 PM UTC
We are continuing to investigate this issue.
- resolved Aug 20, 2024, 04:24 PM UTC
We have found this issue to be mitigated; it is now only affecting a small subset of customers, whom we are working diligently to support while we find a resolution. If you are experiencing issues, please don't hesitate to reach out to support.
- postmortem Jan 07, 2025, 07:24 PM UTC
**Teem by Eptura detailed Root Cause Analysis | August 20, 2024**

**S2 O365 Multiple Users Marked Inactive**

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

**Description:** Customers using O365 were unable to access the Teem platform. When trying to log in, customers were met with an error message.

**Type of Event:** Functionality Issue

**Services/Modules impacted:** Production / Office 365

**Timeline (Reported MST):**

On the morning of August 20, 2024, at approximately 8:20 AM MST, our support team began receiving reports from end users who had been inadvertently marked as Inactive in the Teem platform. We promptly informed all Teem customers of the incident via our Status Page. At 10:24 AM MST, we marked the Status Page as resolved. However, recognizing the importance of addressing this issue thoroughly, our engineering team continued to work closely with our Support team to find a comprehensive solution.

To prevent further impact, the engineering team implemented a nightly script to revert the status of users marked as Inactive. The investigation continued until November 1, 2024, when our engineering team released an initial HotFix. Although this HotFix did not fully resolve the problem, it enhanced our logging capabilities so that the team could better track and understand the behavior. On November 11, 2024, the additional logs provided valuable insights, enabling our engineering team to resolve the issue in our QA environment. The final HotFix was released on November 14, 2024, and customers have confirmed the resolution.

**Total Duration of Event:** 83 Days

**Root Cause:** The issue arose from the concurrent processing of multiple user batches. During this process, one thread completed its task earlier than the others and inadvertently deleted all cache keys, including the main Sync\_Key and the associated batch\_keys. Subsequent threads then received an empty batch list, which resulted in the deprovisioning or deactivation of users.

**Remediation:** To resolve this issue, we have implemented database row-level locking for batch processing. This ensures that batch processing happens sequentially, avoiding conflicts. Key updates include:

1. Introduced a dedicated table to track batch process counts and the ID of the last processed batch.
2. Applied database row-level locks to manage synchronization safely and efficiently.
3. Updated the deprovisioning process to occur only after verifying that all batches are fully processed.

**Preventative Action:** To prevent recurrence, we have made the following improvements:

* Enhanced concurrency handling to ensure seamless user batch synchronization.
* Added extensive logging in CloudWatch to monitor and better understand process behavior.
* Streamlined the user synchronization process to ensure that all O365 users are successfully synced with the Teem directory.

These enhancements will significantly improve the reliability and performance of our system. Thank you for your patience and support as we continue to make these improvements.
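
For readers who want a concrete picture of the remediation described above, the following is a minimal Python sketch. It is not Teem's actual code. It assumes an in-memory `BatchSyncTracker` as a hypothetical stand-in for the dedicated tracking table, and it uses a `threading.Lock` in place of a database row-level lock. All class, function, and variable names are illustrative assumptions.

```python
import threading


class BatchSyncTracker:
    """Hypothetical in-memory stand-in for the batch tracking table."""

    def __init__(self, total_batches):
        self.total_batches = total_batches
        self.processed = 0          # count of batches fully processed
        self.last_batch_id = None   # ID of the last processed batch
        self.seen_users = set()     # users present in the upstream directory
        # Assumption: a threading.Lock plays the role of a row-level lock.
        self._lock = threading.Lock()

    def record_batch(self, batch_id, user_ids):
        """Record one finished batch; return True only for the final batch."""
        with self._lock:
            self.seen_users.update(user_ids)
            self.processed += 1
            self.last_batch_id = batch_id
            return self.processed == self.total_batches


def sync_directory(directory_users, batches):
    """Return the users to deactivate, computed only after all batches finish."""
    tracker = BatchSyncTracker(len(batches))
    to_deactivate = []

    def worker(batch_id, user_ids):
        # Exactly one thread sees True here, and only once every other
        # batch has already been recorded, so no thread can act on a
        # partial or empty view of the batch list.
        if tracker.record_batch(batch_id, user_ids):
            missing = set(directory_users) - tracker.seen_users
            to_deactivate.extend(sorted(missing))

    threads = [
        threading.Thread(target=worker, args=(batch_id, user_ids))
        for batch_id, user_ids in enumerate(batches)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return to_deactivate
```

Calling `sync_directory({"ann", "bob", "cy", "dee"}, [["ann"], ["bob"], ["cy"]])` in this sketch flags only `"dee"` for deactivation, regardless of which thread finishes first. The design point is that the deactivation decision is tied to the processed-batch count rather than to whichever thread happens to complete early.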