AI-Media experienced a minor incident on October 6, 2025, lasting —. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Oct 06, 2025, 10:30 PM UTC
We investigated a brief service disruption that affected LEXI and Translate jobs on September 29th (21:56-22:00 UTC). The disruption lasted approximately 4 minutes and 19 seconds and was related to an automatic update of an AWS service performed during that time. Users experienced different impacts depending on when their jobs were initiated: New LEXI/Translate jobs started during this window were unable to connect to the AWS service during startup, resulting in startup failures. These jobs self-resolved once the disruption ended. LEXI/Translate jobs already running before the disruption experienced significant caption delays. Captions produced during the disruption were output in a large batch after service restoration, with read-speed functionality automatically increasing output speed to compensate before returning to normal real-time operation.
- postmortem Oct 07, 2025, 11:23 PM UTC
**LEXI Disruption Postmortem:** September 29, 2025, 21:56:04 - 22:00:23 UTC \(4m 19s\) **Summary:** A 4-minute and 19-second service disruption affected LEXI and Translate jobs due to an automatic update of a critical backend service. New jobs failed to start, while running jobs experienced significant caption delays that self-resolved after the disruption ended. **Impact** * **New Jobs:** Unable to connect to backend service during startup, resulting in startup failures. Jobs self-resolved when the disruption ended. * **Running Jobs:** Experienced significant caption delays during the disruption window. Delayed captions were output in batch after service restoration, with read-speed functionality automatically compensating before returning to normal operation. **Root Cause:** An automatic update of a critical backend service during the maintenance window caused the temporary unavailability of a dependency required for LEXI/Translate job initialization and real-time caption processing. **Timeline** * 21:56:04 UTC - Disruption begins with backend service update * 21:56:04 - 22:00:23 UTC - New jobs fail to start; running jobs experience delays * 22:00:23 UTC - Backend service update completes; service functionality restored * Post-incident - Investigation and remediation planning initiated **Resolution** Service was automatically restored once the backend service update was completed. No manual intervention was required. **Action Items Completed** * Modified service update window to minimize user impact * Improved fallback handling mechanisms for backend service dependencies **Action Items In Progress** * Implementing additional redundancy measures to strengthen service resilience during dependency updates