Gainsight incident

CS - EU: Investigating Elevated Error Rates

Major Resolved

Gainsight experienced a major incident on January 3, 2024, affecting the Gainsight CS EU Application and lasting 39m. The incident has been resolved; the full update timeline is below.

Started
Jan 03, 2024, 02:34 PM UTC
Resolved
Jan 03, 2024, 03:14 PM UTC
Duration
39m
Detected by Pingoru
Jan 03, 2024, 02:34 PM UTC

Affected components

Gainsight CS EU Application

Update timeline

  1. investigating Jan 03, 2024, 02:34 PM UTC

    We are investigating a sudden increase in error rates which may lead to degraded performance or service interruption.

  2. monitoring Jan 03, 2024, 02:50 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Jan 03, 2024, 03:14 PM UTC

    This incident has been resolved.

  4. postmortem Feb 23, 2024, 05:43 AM UTC

    **Incident:** A limited number of customers experienced degraded performance in CS-EU Rules on January 3, 2024. The issue may also have intermittently affected the ability to log in to the application.

    **Root Cause:** This incident was the result of an elevated number of API requests coming from a single microservice. The unexpected increase led to a build-up of connections, which degraded performance on a subset of API servers. Rate limiting functionality was not configured as expected in this case.

    **Recovery Action:** Once the affected systems and related traffic were identified, isolating and restarting the affected API services resolved the issue immediately.

    **Preventive Measures:** We have corrected the rate limiter functionality for the microservice that caused this issue.
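
    For readers unfamiliar with the mechanism named in the root cause, the following is a minimal sketch of per-caller rate limiting using a token bucket. It is an illustration only and not Gainsight's implementation: the class names `TokenBucket` and `PerCallerLimiter`, the caller identifiers, and the rate and capacity values are all assumptions made for the example.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerCallerLimiter:
    """Keep one bucket per calling service (hypothetical caller IDs)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, caller_id: str) -> bool:
        bucket = self.buckets.get(caller_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[caller_id] = bucket
        return bucket.allow()
```

    In a sketch like this, a single microservice that suddenly sends far more requests than its bucket allows is rejected early, while other callers keep their own budgets. A limiter that is missing or misconfigured for one caller lets that caller's traffic through unchecked, which matches the kind of connection build-up described above.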