Mindtickle incident

Delay in Sequential Unlocking Operation

Mindtickle experienced a minor incident on January 2, 2025, lasting approximately 4 hours and 25 minutes according to the postmortem. The incident has been resolved; the full update timeline is below.

Started: Jan 02, 2025, 04:50 PM UTC
Resolved: Jan 02, 2025, 04:50 PM UTC
Duration: —
Detected by Pingoru: Jan 02, 2025, 04:50 PM UTC

Update timeline

resolved Jan 15, 2025, 01:05 PM UTC

On January 2, 2025, between 8:50 AM PST and 1:15 PM PST, one of the nodes in our database cluster became unhealthy, causing an increased load on the remaining nodes. This led to sequential unlocking requests getting stuck in a queue and being processed with a delay.

postmortem Jan 15, 2025, 01:05 PM UTC

**Incident Summary:** On January 2, 2025, at 8:50 AM PST, one of the nodes in our database cluster became unhealthy, causing an increased load on the remaining nodes. This led to sequential unlocking requests getting stuck in a queue and being processed with a delay.

**Impact:** Users experienced delays in sequential unlocking operations for approximately 4 hours and 25 minutes. No data loss or corruption was observed, but service performance was degraded during this period.

**Timeline:**

* \[02-Jan-2025, 08:50 AM PST\]: Issue detected with one of the database cluster nodes becoming unhealthy.
* \[02-Jan-2025, 09:20 AM PST\]: Sequential unlocking requests began experiencing delays.
* \[02-Jan-2025, 01:15 PM PST\]: The unhealthy node issue was resolved, and the backlog of queries began processing.
* \[02-Jan-2025, 01:15 PM PST\]: Backlog fully cleared, and normal operations resumed.

**Resolution:** The unhealthy node was identified and restored to a healthy state, allowing the system to process the backlog of delayed queries. Once the backlog was cleared, sequential unlocking operations returned to normal functionality.

**Next Steps:** To prevent recurrence, the following actions will be taken:

1. Implement additional monitoring and alerting mechanisms to detect similar issues early.
2. Review and optimize our handling of queued requests to minimize delays during high-load scenarios.

We sincerely apologize for the inconvenience caused and appreciate your understanding as we work to improve the resilience of our systems.
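
The postmortem gives its times in PST, while the header fields above are in UTC. As a quick cross-check, the following minimal Python sketch recomputes the impact window from the first and last timeline entries and converts the start time to UTC. The helper name `parse_pst` and the fixed UTC-8 offset are assumptions made for illustration; they are not part of Mindtickle's tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PST is treated as a fixed UTC-8 offset (no daylight saving in January).
PST = timezone(timedelta(hours=-8), "PST")


def parse_pst(stamp: str) -> datetime:
    """Parse a timeline stamp such as '02-Jan-2025, 08:50 AM' as a PST datetime."""
    return datetime.strptime(stamp, "%d-%b-%Y, %I:%M %p").replace(tzinfo=PST)


# First and last entries of the postmortem timeline.
node_unhealthy = parse_pst("02-Jan-2025, 08:50 AM")
backlog_cleared = parse_pst("02-Jan-2025, 01:15 PM")

# Length of the impact window and the start time expressed in UTC.
impact = backlog_cleared - node_unhealthy
start_utc = node_unhealthy.astimezone(timezone.utc)

print(impact)  # 4:25:00
print(start_utc.strftime("%b %d, %Y, %I:%M %p UTC"))  # Jan 02, 2025, 04:50 PM UTC
```

The result matches the "approximately 4 hours and 25 minutes" stated under Impact, and the converted start time agrees with the "Started" field in the header.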