Mindtickle incident

Intermittent failures in scheduled sync operations

Minor Resolved

Mindtickle experienced a minor incident, reported on August 26, 2024. The incident has been resolved; the full update timeline is below.

Started
Aug 26, 2024, 10:03 AM UTC
Resolved
Aug 06, 2024, 05:00 AM UTC
Duration
Detected by Pingoru
Aug 26, 2024, 10:03 AM UTC

Update timeline

  1. resolved Aug 26, 2024, 10:03 AM UTC

    Between August 05, 2024, 22:00 PT and August 14, 2024, 20:00 PT, an issue in our integration sync process caused scheduled syncs for multiple customers to be skipped. All Salesforce syncs failed throughout this period. Other data integrations, including user syncs from BambooHR and Workday and content syncs from LinkedIn Learning, Google Drive, and SharePoint, failed intermittently.

  2. postmortem Aug 26, 2024, 10:03 AM UTC

    **What Happened?**

    Between August 05, 2024, 22:00 PT and August 14, 2024, 20:00 PT, an issue in our integration sync process caused scheduled syncs for multiple customers to be skipped. All Salesforce syncs failed throughout this period. Other data integrations, including user syncs from BambooHR and Workday and content syncs from LinkedIn Learning, Google Drive, and SharePoint, failed intermittently.

    **Root Cause:**

    A code deployment on August 05, 2024, introduced an error in the lambda function responsible for initiating these scheduled syncs. Specifically, a `NullPointerException` was thrown when the function attempted to start a Salesforce sync. The exception was not handled correctly, so the entire sync process failed and the subsequent syncs were skipped.

    **Timeline:**

    * **Aug 05, 22:00 PT:** The incident began with the deployment of a code change to the lambda function.
    * **Aug 12, 22:18 PT:** The issue was detected after customer reports were escalated by support.
    * **Aug 13, 11:25 PT:** Sync operations were triggered manually to unblock customers.
    * **Aug 14, 20:00 PT:** The fix was deployed, and sync operations resumed.

    **Learning and Next Steps:**

    * **Code Resilience:** The scheduler has been modified so that a failure in one sync does not affect others. This change has been implemented and deployed.
    * **Testing Enhancements:** Improve automated test coverage to include scheduled syncs, reducing the likelihood of similar issues in the future.
    * **Monitoring Improvements:** Implement feature-level alerts for sync failures to enable quicker detection of and response to such issues.

    We apologize for the inconvenience caused and assure you that we are committed to preventing similar incidents in the future. Thank you for your continued trust in our services.
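
The failure mode in the root cause, and the per-sync isolation described under Code Resilience, can be illustrated with a minimal Java sketch. This is not Mindtickle's code: the names `IsolatedSyncScheduler`, `SyncJob`, and `runAll` are hypothetical, and the jobs are stand-ins. It only shows the general technique of catching exceptions per job so that one failing sync cannot cause later syncs to be skipped, while recording failures so they could feed an alert.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names here are illustrative assumptions,
// not taken from the vendor's scheduler.
final class IsolatedSyncScheduler {

    /** One scheduled sync, e.g. a Salesforce or Workday sync. */
    interface SyncJob {
        String name();
        void run() throws Exception;

        /** Convenience factory wrapping a Runnable as a named job. */
        static SyncJob of(String name, Runnable body) {
            return new SyncJob() {
                @Override public String name() { return name; }
                @Override public void run() { body.run(); }
            };
        }
    }

    /**
     * Runs every job in order. A failure in one job is recorded and the loop
     * continues, so later jobs are not skipped.
     * Returns failed job name -> error description, in execution order.
     */
    static Map<String, String> runAll(List<SyncJob> jobs) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (SyncJob job : jobs) {
            try {
                job.run();
            } catch (Exception e) {
                // Also catches unchecked exceptions such as NullPointerException.
                failures.put(job.name(), e.toString());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<SyncJob> jobs = Arrays.asList(
            // Simulates the failing Salesforce sync from the postmortem.
            SyncJob.of("salesforce", () -> { throw new NullPointerException("missing config"); }),
            SyncJob.of("workday", () -> System.out.println("workday sync ran")),
            SyncJob.of("sharepoint", () -> System.out.println("sharepoint sync ran")));

        Map<String, String> failures = runAll(jobs);
        // The failure map is the natural hook for a feature-level alert.
        System.out.println("failed syncs: " + failures.keySet());
    }
}
```

Without the `try`/`catch` inside the loop, the exception from the first job would propagate out of `runAll` and the remaining jobs would never start, which matches the skipped-sync behavior described in the root cause. Returning the failures instead of swallowing them keeps the errors visible for monitoring.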