Tenfold incident

Degraded performance issue.

Minor · Resolved

Tenfold experienced a minor incident on January 11, 2024 affecting Dashboard and Chrome Extension, lasting 1h 26m. The incident has been resolved; the full update timeline is below.

Started
Jan 11, 2024, 08:41 PM UTC
Resolved
Jan 11, 2024, 10:07 PM UTC
Duration
1h 26m
Detected by Pingoru
Jan 11, 2024, 08:41 PM UTC

Affected components

Dashboard, Chrome Extension

Update timeline

  1. investigating Jan 11, 2024, 08:41 PM UTC

    We are currently investigating reported performance issues.

  2. monitoring Jan 11, 2024, 09:10 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Jan 11, 2024, 10:07 PM UTC

    This incident has been resolved.

  4. postmortem Jan 18, 2024, 11:10 PM UTC

    **LivePerson Incident #SEV-106 - Root Cause Analysis**

    **Date:** January 11, 2024
    **Severity:** SEV1
    **Start time:** 01:17 PM CT
    **End time:** 02:43 PM CT
    **Duration:** 1 hour, 26 minutes

    **Summary**

    On January 11, 2024, at 1:17 PM CT, LivePerson’s Tenfold Cloud Operations team observed increasing operation latency on the Tenfold Platform. LivePerson immediately assembled a war room and began an investigation upon notification of the issue. During the investigation, engineering teams observed that the incident was affecting the data streaming subsystem, which caused high latency for all users of the Tenfold Platform, including voice agents using the Tenfold Application and admin users of the Tenfold Dashboard.

    Engineering teams attributed the incident to new staging data streaming components that were prematurely brought into service. As part of a major data streaming upgrade planned for the Tenfold Platform, new components were being built up in preparation for the upgrade. These new components triggered the delays and latency observed by users. Once the cause was identified, engineering teams immediately decommissioned the new components and allowed the Tenfold Platform to return to normal operating conditions. At 2:43 PM CT, the incident was resolved when latency metrics returned to normal.

    **Customer Impact**

    During the January 11, 2024 incident, all customers of LivePerson’s Tenfold solution were affected by long delays in operations, with some operations timing out.

    **Post-Incident Analysis**

    Prior to the incident, the data streaming system was scheduled for a major upgrade after the holiday freeze period. In preparation for this upgrade, the infrastructure teams had been building new, upgraded data streaming components for a blue-green style upgrade. It was identified that some of the new components, which were undergoing preliminary testing, had been incorrectly configured to connect to the production cluster. This triggered a rebalance process, which introduced high levels of latency. The mitigation was to immediately shut down the new data streaming components and allow the platform to return to its normal operating state.

    **Corrective Actions**

    During the January 11, 2024 incident, LivePerson’s engineering teams mitigated the issue by removing the offending components from service. As preventative measures against similar incidents, LivePerson is implementing the following long-term corrective actions:

    * Team members involved in infrastructure upgrades will be educated on configuration and standard process for upgrades of this type. (**Completed on January 12, 2024**)
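
    The failure mode described in the post-incident analysis (a component under test configured to join the production cluster) is the kind of mistake a pre-connection check can catch. The following is a minimal sketch of such a guard, not LivePerson's actual tooling: the postmortem does not name the streaming technology or any configuration format, so `StreamClientConfig`, `CLUSTER_ENV_BY_SUFFIX`, the `example.internal` hostnames, and `assert_safe_to_connect` are all hypothetical names chosen for illustration.

    ```python
    from dataclasses import dataclass
    from typing import Tuple


    class EnvironmentMismatchError(RuntimeError):
        """Raised when a component is pointed at a cluster in another environment."""


    @dataclass(frozen=True)
    class StreamClientConfig:
        # Hypothetical config shape; the real system's format is not in the postmortem.
        component_env: str                 # environment the component belongs to
        bootstrap_servers: Tuple[str, ...]  # "host:port" addresses it will connect to


    # Assumed naming convention: the environment is encoded in the hostname suffix.
    CLUSTER_ENV_BY_SUFFIX = {
        ".prod.example.internal": "production",
        ".staging.example.internal": "staging",
    }


    def cluster_env(address: str) -> str:
        """Return the environment implied by a 'host:port' address, or 'unknown'."""
        host = address.rsplit(":", 1)[0]
        for suffix, env in CLUSTER_ENV_BY_SUFFIX.items():
            if host.endswith(suffix):
                return env
        return "unknown"


    def assert_safe_to_connect(config: StreamClientConfig) -> None:
        """Refuse to proceed if any target cluster is outside the component's environment."""
        for address in config.bootstrap_servers:
            env = cluster_env(address)
            if env != config.component_env:
                raise EnvironmentMismatchError(
                    f"{config.component_env} component may not join "
                    f"{env} cluster at {address}"
                )
    ```

    Under these assumptions, a staging component configured with `broker-1.prod.example.internal:9092` would raise `EnvironmentMismatchError` before opening a connection, so it could never join the production cluster and trigger a rebalance. Addresses that match no known suffix are rejected as well, which errs on the side of blocking a connection over allowing an unverified one.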