ServiceChannel incident
Code Release Causes US Environment Outage
ServiceChannel posted a major incident on May 15, 2025, covering an outage that occurred on May 8, 2025. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved May 15, 2025, 09:59 PM UTC
During the scheduled US production code release on May 8, 2025, ServiceChannel encountered technical issues that impacted service availability on our platform. Users experienced login difficulties from 2:29 AM to 3:07 AM EDT, while critical dashboard functionality was unavailable from 2:29 AM to 4:12 AM EDT.
- postmortem May 15, 2025, 10:00 PM UTC
**Incident Report: Code Release Causes US Environment Outage**

**Date of Incident:** 05/08/2025
**Time/Date Incident Started:** 05/08/2025, 2:29 AM EDT
**Time/Date Stability Restored:** 05/08/2025, 4:07 AM EDT
**Time/Date Incident Resolved:** 05/08/2025, 4:12 AM EDT
**Users Impacted:** Many
**Frequency:** Continuous
**Impact:** Major

**Incident description:**
During the scheduled US production code release on May 8, 2025, ServiceChannel encountered technical issues that impacted service availability on our platform. Users experienced login difficulties from 2:29 AM to 3:07 AM EDT, while critical dashboard functionality was unavailable from 2:29 AM to 4:12 AM EDT.

**Root Cause Analysis:**

**Login Module Issue:** As part of ongoing deployment process enhancements, a configuration adjustment was made that worked correctly in our testing environments but behaved differently in production. The issue was identified and resolved through our standard troubleshooting procedures.

**Dashboard Issue:** A configuration setting that was properly configured in our development environments had not been fully synchronized to the production environment. This discrepancy wasn't detected until the new code attempted to access the setting during the deployment.
Full platform functionality was confirmed restored by 4:12 AM EDT.

**Actions Taken:**

* The SRE team began investigating immediately after alerts starting at 2:29 AM EDT indicated issues with two critical systems: dashboard and login
* The CI/CD team successfully rolled back the login module to the prior version, restoring user access by 3:07 AM EDT
* The dashboard continued to experience issues after login was restored, so investigation continued
* Dashboard functionality was restored by ensuring all required configuration settings were properly applied to production

**Mitigation Measures:**

* Reviewed and updated existing deployment procedures to include improved configuration validation and improved rollback protocols to prevent similar configuration-related issues in the future
* Implemented process improvements for immediate communication with support teams following any service disruptions to ensure proper customer follow-up and transparency
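
The configuration validation mentioned under Mitigation Measures could take the form of a pre-release check that compares the settings a release needs against what the target environment actually defines. The sketch below is illustrative only and is not ServiceChannel's actual tooling: the function names, setting keys, and values are all hypothetical, and it assumes each environment's configuration can be loaded as a simple key/value mapping.

```python
from typing import Mapping, Set


def missing_settings(required: Set[str], target: Mapping[str, str]) -> Set[str]:
    """Return required keys that are absent or empty in the target environment."""
    return {key for key in required if not target.get(key)}


def drifted_settings(reference: Mapping[str, str], target: Mapping[str, str]) -> Set[str]:
    """Return keys defined in the reference environment but absent from the target."""
    return set(reference) - set(target)


# Hypothetical setting names and values, for illustration only.
development = {"DASHBOARD_FEATURE_ENDPOINT": "https://dev.example.invalid", "LOGIN_TIMEOUT_SECONDS": "30"}
production = {"LOGIN_TIMEOUT_SECONDS": "30"}
required_by_release = {"DASHBOARD_FEATURE_ENDPOINT", "LOGIN_TIMEOUT_SECONDS"}

# A release pipeline could refuse to proceed when either set is non-empty.
blocking = missing_settings(required_by_release, production)
drift = drifted_settings(development, production)
print("Missing in production:", sorted(blocking))
print("Defined in development only:", sorted(drift))
```

Run before the deployment window, a check like this would surface an unsynchronized setting as a blocked release rather than as a runtime failure when the new code first reads it.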