Mindtickle incident

Failures observed in bulk operations and invitation workflows on Mindtickle Admin site

Major · Resolved

Mindtickle experienced a major incident on December 10, 2024, lasting 38 minutes and affecting Course / Quick-Update / Assessment, Mission, and the other components listed below. The incident has been resolved; the full update timeline is below.

Started
Dec 10, 2024, 03:06 PM UTC
Resolved
Dec 10, 2024, 03:44 PM UTC
Duration
38m
Detected by Pingoru
Dec 10, 2024, 03:06 PM UTC

Affected components

Course / Quick-Update / Assessment, Mission, Admin Site, Coaching Sessions, Instructor-Led Training, Spaced Reinforcement

Update timeline

  1. investigating Dec 10, 2024, 03:06 PM UTC

    Since 05:38 PT, Dec 10, 2024, we have observed failures for bulk operations and invitation workflows. Below are the workflows that are impacted:

    1. Bulk Publish module
    2. Bulk archive of module
    3. Bulk mirror module
    4. Module relevance for a module
    5. Module & series Invitation

  2. resolved Dec 10, 2024, 03:44 PM UTC

    The incident has been resolved and the system is now back to normal.

  3. postmortem Dec 19, 2024, 09:36 AM UTC

    **Incident Summary**

    On December 10, 2024, an issue was observed where users experienced interruptions when attempting to add, deactivate, or invite users to series and modules. The root cause was traced to a periodic database cleanup activity that took longer than expected, leading to a lag in processing and subsequent errors in the workflow. The issue was promptly identified, and mitigation steps, including stopping the cleanup activity, were executed to restore normal operations.

    **Impact Area**

    The following functionalities were impacted during the incident:

    * Bulk Publish Module
    * Bulk Archive of Module
    * Bulk Mirror Module
    * Update Availability for Module
    * Module Move
    * Certification Award
    * Invitation

    **Incident Timeline**

    * **December 10, 2024, 6:06 AM PT**: Users began experiencing issues with workflow functionalities.
    * **December 10, 2024, 6:20 AM PT**: The first report of the issue was logged.
    * **December 10, 2024, 6:30 AM PT**: The team initiated an investigation into the root cause.
    * **December 10, 2024, 6:50 AM PT**: The issue was identified as related to Database cleanup activity.
    * **December 10, 2024, 7:38 AM PT**: Database cleanup halted, and normal functionality was restored.
    * **December 10, 2024, 7:44 AM PT**: Issue resolved.

    **Root Cause Analysis**

    The issue stemmed from a periodic database cleanup activity that exceeded its expected duration, causing processing delays and errors in key workflows.

    **Next Steps and Preventive Actions**

    * **Enhanced Monitoring**: Improved tracking of database cleanup activities to detect and mitigate delays proactively.
    * **Optimized Cleanup Processes**: Review and optimize database cleanup activities to minimize processing time and ensure stability.
    * **Improved Workflow Resilience**: Introduce mechanisms to handle delays in dependent processes gracefully without causing errors.

    We apologize for the inconvenience caused by this incident.
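
The postmortem describes the preventive actions only at a high level. As an illustration of what a bounded cleanup job could look like, here is a minimal sketch in Python that deletes old rows in small batches and stops once a time budget is used up, so that a slow run can be detected and resumed later. This is not Mindtickle's implementation: the `events` table, its `created_at` column, the `run_cleanup` function, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3
import time


def run_cleanup(conn, cutoff, batch_size=1000, budget_seconds=60.0,
                clock=time.monotonic):
    """Delete rows older than `cutoff` in batches, within a time budget.

    Hypothetical schema: table `events` with columns `id` and `created_at`.
    Returns (rows_deleted, finished). `finished` is False when the budget
    ran out before all eligible rows were removed.
    """
    deadline = clock() + budget_seconds
    deleted = 0
    while True:
        # Check the budget before each batch so one run cannot overrun
        # by more than the cost of a single batch.
        if clock() >= deadline:
            return deleted, False
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        # Commit per batch to keep each transaction short.
        conn.commit()
        if cur.rowcount == 0:
            return deleted, True
        deleted += cur.rowcount
```

A scheduler could call `run_cleanup` periodically and raise an alert whenever `finished` comes back `False`, which is one way to surface a cleanup that is taking longer than expected before dependent workflows start to fail.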