ServiceChannel incident

US Production App Rollback Incident Report

Notice Resolved

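ServiceChannel posted a notice-level incident on September 11, 2023. According to the postmortem, the underlying event ran from 10:00 pm EDT on August 9, 2023 to 12:00 am EDT on August 10, 2023, about two hours. The incident has been resolved; the full update timeline is below.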

Started
Sep 11, 2023, 06:34 PM UTC
Resolved
Aug 10, 2023, 02:00 AM UTC
Duration
Detected by Pingoru
Sep 11, 2023, 06:34 PM UTC

Update timeline

  1. resolved Sep 11, 2023, 06:34 PM UTC

    The production release of the US application code was rolled back after smoke testing and synthetic monitors detected errors on the ServiceChannel platform.

  2. postmortem Sep 11, 2023, 06:34 PM UTC

    **US Production App Rollback Incident Report**

    **Date of Incident:** 08/09/2023
    **Time/Date Incident Started:** 08/09/2023, 10:00 pm EDT
    **Time/Date Stability Restored:** 08/10/2023, 12:00 am EDT
    **Time/Date Incident Resolved:** 08/10/2023, 12:00 am EDT
    **Users Impacted:** All
    **Frequency:** Continuous
    **Impact:** Major

    **Incident description:** On 8/9/23, the production release of the US application code was rolled back after smoke testing and synthetic monitors detected errors on the ServiceChannel platform.

    **Root Cause Analysis:** The investigation traced the issue to a recent update to the platform session cookie. The update caused the Component module to malfunction because the module specified an incorrect Redis store for session data.

    **Actions Taken:**

    1. In response to the incident, the team promptly rolled back the application services code to the previous functional version. After the rollback, the stability of the web platform was confirmed through both smoke testing and synthetic monitors.
    2. To address the underlying problem, the Redis connection strings for the component modules were updated. The US Production release was re-deployed on 8/10/23 at 10 PM EDT with the correct configuration applied.

    **Mitigation Measures:** To prevent similar incidents in the future, the following mitigation measures will be implemented:

    1. Ensuring Environment Consistency: A concerted effort will be made to better align production and non-production configurations.
    2. Governance of Production Changes: To maintain greater control over potentially disruptive production changes, any change that, due to scale considerations, can only be applied in the Production environment will require explicit approval from senior management before implementation.
    3. Monitoring Production-Only Variables: We will implement automated monitoring to regularly check for the presence of "Production Only" configuration values. This practice will provide an additional layer of oversight and help prevent inadvertent changes.
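
The first and third mitigation measures (aligning production with non-production configuration, and monitoring "Production Only" values) could be approximated by an automated drift check along the following lines. This is only a minimal sketch and not ServiceChannel's tooling: the function name `find_config_drift`, the flat key/value layout, and every configuration key and value shown are hypothetical and chosen purely for illustration.

```python
from typing import Any, Mapping

_MISSING = object()  # sentinel so an absent key never compares equal to a real value


def find_config_drift(
    prod: Mapping[str, Any],
    nonprod: Mapping[str, Any],
    production_only: set[str],
) -> dict[str, list[str]]:
    """Compare two flat config mappings and report drift.

    Keys named in `production_only` are allowed to differ between
    environments, but they must be present in production.
    """
    report: dict[str, list[str]] = {
        "unexpected_difference": [],
        "missing_production_only": [],
    }
    # Any key outside the production-only allowlist should match exactly.
    for key in sorted(set(prod) | set(nonprod)):
        if key in production_only:
            continue
        if prod.get(key, _MISSING) != nonprod.get(key, _MISSING):
            report["unexpected_difference"].append(key)
    # Every declared production-only key should actually exist in production.
    for key in sorted(production_only):
        if key not in prod:
            report["missing_production_only"].append(key)
    return report


# Hypothetical configuration values, for illustration only.
prod = {
    "SESSION_STORE": "session-cache",
    "COMPONENT_SESSION_STORE": "general-cache",  # points at a different store
    "REDIS_POOL_SIZE": 200,
}
nonprod = {
    "SESSION_STORE": "session-cache",
    "COMPONENT_SESSION_STORE": "session-cache",
    "REDIS_POOL_SIZE": 20,
}
production_only = {"REDIS_POOL_SIZE", "CDN_SIGNING_KEY_ID"}

report = find_config_drift(prod, nonprod, production_only)
print(report)
```

In this sketch the pool size is allowed to differ because it is declared production-only, while the mismatched component store is flagged. A check like this would presumably run on a schedule or as a pre-release gate; how it would be wired into the release process is not described in the report.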