Gainsight incident

US1 - Connector Delays

Gainsight experienced a minor incident on June 13, 2023 affecting the US1 Data Ingestion Queue, lasting 6h 39m. The incident has been resolved; the full update timeline is below.

Started
Jun 13, 2023, 05:29 PM UTC
Resolved
Jun 14, 2023, 12:09 AM UTC
Duration
6h 39m
Detected by Pingoru
Jun 13, 2023, 05:29 PM UTC

Affected components

US1 Data Ingestion Queue

Update timeline

  1. investigating Jun 13, 2023, 05:29 PM UTC

    Beginning around 12:00 PM UTC today, we detected a delay in Connectors traffic and adjusted accordingly. Because queue delays persist, we are investigating further and will provide updates as more information becomes available.

  2. identified Jun 13, 2023, 06:43 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Jun 13, 2023, 08:04 PM UTC

    A fix has been implemented and we are monitoring the results. The Connectors queue was blocked for analysis and troubleshooting during this incident. It has since been unblocked, and any duplicate sync jobs were aborted with no data impact. Please expect delays while the queue clears.

  4. resolved Jun 14, 2023, 12:09 AM UTC

    This incident has been resolved. A subset of customers experienced connector queue delays during the incident window. We will add root cause analysis (RCA) details as they become available.

  5. postmortem Aug 11, 2023, 04:19 AM UTC

    **Incident:** Beginning around 12:00 UTC on June 13, engineers were alerted to elevated queue levels for Connector services in CS-US1.

    **Root Cause:** A leader node was found to have higher-than-usual disk activity, which prevented optimal job execution for Connector services.

    **Recovery Action:** Engineers scaled up the number of Connector instances as a temporary correction. They also skipped long-running and duplicate jobs to speed up recovery.

    **Preventive Measures:** System configuration adjustments have been made to prevent these issues from recurring.