Mindtickle incident

Users Unable to Access Series & Modules on the Mindtickle Platform

Notice Resolved

Mindtickle experienced a notice-level incident on June 5, 2025. The incident has been resolved; the full update timeline is below.

Started
Jun 05, 2025, 09:12 AM UTC
Resolved
Jun 05, 2025, 08:36 AM UTC
Duration
Detected by Pingoru
Jun 05, 2025, 09:12 AM UTC

Update timeline

  1. resolved Jun 05, 2025, 09:12 AM UTC

    On June 5, 2025, between 1:36 AM PT and 2:06 AM PT, users were unable to access Series and Modules on the Mindtickle platform. The issue was resolved at 2:06 AM PT, and full access to both Series and Modules has been restored. We are actively monitoring the system to ensure continued stability. A detailed Root Cause Analysis (RCA) will be shared shortly.

  2. postmortem Jun 13, 2025, 09:30 AM UTC

    **Incident Summary**

    On June 5, 2025, a temporary issue was observed where users were unable to view the Series Listing Page across the platform. The root cause was traced to a misconfiguration in the infrastructure setup of a new caching service (Amazon DAX) introduced to improve system performance. The configuration error blocked communication between application services and the DAX cluster, leading to service disruptions. The issue was promptly identified and resolved by rolling back the change.

    **Impact Area**

    The following functionality was impacted during the incident:

    * Series Listing Page (site-wide impact)

    **Incident Timeline**

    * June 5, 2025, 1:37 AM PT: Users began experiencing issues with the Series Listing Page.
    * June 5, 2025, 1:56 AM PT: The engineering team detected the issue and initiated an investigation.
    * June 5, 2025, 2:08 AM PT: Rollback completed and services were restored to a stable state.

    **Root Cause Analysis**

    The issue was caused by a misconfiguration in the security settings of the newly introduced Amazon DAX cluster in the production environment. The security group associated with the cluster did not allow required inbound traffic from the application services. This caused service pods to fail and led to errors on the Series Listing Page.

    **Next Steps and Preventive Actions**

    * **Improved Health Checks**: Applications will include readiness and liveness probes for external dependencies like DAX to detect failures earlier.
    * **Enhanced Monitoring and Alerts**: DAX metrics will be integrated into the monitoring framework to ensure real-time visibility and proactive alerts.

    We apologize for the inconvenience caused by this incident.
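As an illustration of the "Improved Health Checks" action item, a dependency-aware readiness check could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Mindtickle's implementation: the function names are hypothetical, and the check only confirms that a TCP connection to the dependency can be opened, which is the kind of failure a blocked security group rule would typically produce (a connection timeout).

```python
import socket


def dependency_reachable(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s.

    A security group that drops inbound traffic usually shows up here as a
    timeout; a closed port shows up as a refused connection. Both are
    subclasses of OSError, so both are reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def readiness(dependencies: dict) -> tuple:
    """Check every dependency and return (all_ok, per_dependency_results).

    `dependencies` maps a label to a (host, port) pair. The labels, hosts,
    and ports are supplied by the caller; none are taken from the incident
    report.
    """
    results = {
        name: dependency_reachable(host, port)
        for name, (host, port) in dependencies.items()
    }
    return all(results.values()), results
```

A readiness endpoint could call something like `readiness({"dax": ("dax.example.internal", 8111)})` and report "not ready" when the first element is `False`. The hostname is a placeholder, and port 8111 is assumed here as the DAX default for unencrypted clusters; the actual endpoint and port depend on the cluster configuration.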