ServiceChannel incident
US Production App Rollback Incident Report
ServiceChannel experienced a notice-level incident on September 11, 2023. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Sep 11, 2023, 06:34 PM UTC
The production release of the US application code was rolled back after smoke testing and synthetic monitors detected errors on the ServiceChannel platform.
- postmortem Sep 11, 2023, 06:34 PM UTC
**US Production App Rollback Incident Report**

**Date of Incident:** 08/09/2023
**Time/Date Incident Started:** 08/09/2023, 10:00 pm EDT
**Time/Date Stability Restored:** 08/10/2023, 12:00 am EDT
**Time/Date Incident Resolved:** 08/10/2023, 12:00 am EDT
**Users Impacted:** All
**Frequency:** Continuous
**Impact:** Major

**Incident description:** On 8/9/23, the production release of the US application code was rolled back after smoke testing and synthetic monitors detected errors on the ServiceChannel platform.

**Root Cause Analysis:** The investigation traced the issue to a recent update to the platform session cookie. After this update, the Component module malfunctioned because it specified an incorrect Redis store for session data.

**Actions Taken:**
1. The team rolled back the application services code to the previous functional version. After the rollback, smoke testing and synthetic monitors confirmed that the web platform was stable.
2. To address the underlying problem, the Redis connection strings for the component modules were updated. The US Production release was re-deployed on 8/10/23 at 10 PM EDT with the correct configuration applied.

**Mitigation Measures:** To prevent similar incidents, the following mitigation measures will be implemented:
1. Ensuring Environment Consistency: Production and non-production configurations will be brought into closer alignment.
2. Governance of Production Changes: Any change that, due to scale considerations, can only be applied to the Production environment will require explicit approval from senior management before implementation.
3. Monitoring Production-Only Variables: We will implement automated monitoring to regularly check for the presence of "Production Only" configuration values. This practice will provide an additional layer of oversight and help prevent inadvertent changes.
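
The monitoring described in the mitigation measures could look roughly like the sketch below. It is illustrative only and does not reflect ServiceChannel's actual tooling: the configuration keys, Redis URLs, and function names are all hypothetical, and it assumes each environment's configuration can be loaded as a flat key/value mapping. It also assumes, for illustration, that a module holding session data should point at the same Redis store as the platform session.

```python
from typing import List, Mapping, Set


def production_only_keys(prod: Mapping[str, str], nonprod: Mapping[str, str]) -> List[str]:
    """Return keys that exist in production but not in non-production."""
    return sorted(set(prod) - set(nonprod))


def unapproved_production_only(
    prod: Mapping[str, str], nonprod: Mapping[str, str], approved: Set[str]
) -> List[str]:
    """Return production-only keys that are missing from the approved list."""
    return [key for key in production_only_keys(prod, nonprod) if key not in approved]


def modules_off_session_store(
    config: Mapping[str, str], session_key: str, module_keys: List[str]
) -> List[str]:
    """Return module keys whose Redis URL differs from the session store URL."""
    expected = config[session_key]
    return [key for key in module_keys if config.get(key) != expected]


# Hypothetical configuration snapshots; every key and URL here is invented.
PROD = {
    "SESSION_REDIS_URL": "redis://prod-sessions.example.internal:6379/0",
    "COMPONENT_REDIS_URL": "redis://prod-cache.example.internal:6379/0",
    "SCALE_OUT_SHARDS": "16",  # exists only in production
}
NONPROD = {
    "SESSION_REDIS_URL": "redis://stage-sessions.example.internal:6379/0",
    "COMPONENT_REDIS_URL": "redis://stage-sessions.example.internal:6379/0",
}

if __name__ == "__main__":
    print("Production-only keys:", production_only_keys(PROD, NONPROD))
    print("Unapproved:", unapproved_production_only(PROD, NONPROD, approved=set()))
    print(
        "Modules off the session store:",
        modules_off_session_store(PROD, "SESSION_REDIS_URL", ["COMPONENT_REDIS_URL"]),
    )
```

A check like this could run on a schedule or before each release. Any unapproved production-only key would then be routed to the approval step described in the second mitigation measure.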