Mindtickle experienced a minor incident on January 2, 2025, lasting approximately 4 hours and 25 minutes. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Jan 15, 2025, 01:05 PM UTC
On January 2, 2025, between 8:50 AM and 1:15 PM PST, one of the nodes in our database cluster became unhealthy, increasing the load on the remaining nodes. As a result, sequential unlocking requests became stuck in a queue and were processed with a delay.
- postmortem Jan 15, 2025, 01:05 PM UTC
**Incident Summary:** On January 2, 2025, at 8:50 AM PST, one of the nodes in our database cluster became unhealthy, increasing the load on the remaining nodes. As a result, sequential unlocking requests became stuck in a queue and were processed with a delay.

**Impact:** Users experienced delays in sequential unlocking operations for approximately 4 hours and 25 minutes. No data loss or corruption was observed, but service performance was degraded during this period.

**Timeline:**

* \[02-Jan-2025, 08:50 AM PST\]: Issue detected with one of the database cluster nodes, which had become unhealthy.
* \[02-Jan-2025, 09:20 AM PST\]: Sequential unlocking requests began experiencing delays.
* \[02-Jan-2025, 01:15 PM PST\]: The unhealthy node issue was resolved, and the backlog of queries began processing.
* \[02-Jan-2025, 01:15 PM PST\]: Backlog fully cleared, and normal operations resumed.

**Resolution:** The unhealthy node was identified and restored to a healthy state, allowing the system to process the backlog of delayed queries. Once the backlog was cleared, sequential unlocking operations returned to normal.

**Next Steps:** To prevent recurrence, we will take the following actions:

1. Implement additional monitoring and alerting to detect similar issues earlier.
2. Review and optimize our handling of queued requests to minimize delays under high load.

We sincerely apologize for the inconvenience and appreciate your understanding as we work to improve the resilience of our systems.