Dixa incident

Conversations and Search: Degraded performance

Minor · Resolved

Dixa experienced a minor incident on November 11, 2024 affecting Search and Dashboard, lasting 1h 40m. The incident has been resolved; the full update timeline is below.

Started
Nov 11, 2024, 10:34 AM UTC
Resolved
Nov 11, 2024, 12:14 PM UTC
Duration
1h 40m
Detected by Pingoru
Nov 11, 2024, 10:34 AM UTC

Affected components

Search, Dashboard

Update timeline

  1. investigating Nov 11, 2024, 10:34 AM UTC

    We have received reports of instability in the platform. We are investigating the issue. Updates will follow.

  2. identified Nov 11, 2024, 10:45 AM UTC

    We have identified an issue affecting the dashboard and search functionalities. These services may be slower than expected or intermittently unresponsive. Our team is actively working on resolving the root cause. Please note that the offers service is unaffected and continues to operate normally. We will provide an update as soon as further information is available.

  3. monitoring Nov 11, 2024, 10:57 AM UTC

    A fix has been implemented to resolve the issue impacting the dashboard and search functionalities. We are currently monitoring the system to ensure stability and confirm that all services are fully operational. We will provide a final update once we verify that the issue is completely resolved.

  4. resolved Nov 11, 2024, 12:14 PM UTC

    The issue affecting the conversation overview and search functionalities has been fully resolved. All services are now operating as expected, and we have confirmed system stability. We are sorry for any inconvenience this has caused and thank you for your patience.

  5. postmortem Nov 19, 2024, 08:43 AM UTC

    ## Summary

    Dixa’s Search engine was under high pressure due to an untimely cluster data-rebalancing action. As a result, requests routed to one particularly heavily impacted node responded slowly or timed out, and the Conversations overview and Search pages failed to load for some users.

    ## Root Cause

    Around 10:35 AM (CET), reports of high latency and instability started reaching Dixa Support. Engineers immediately began investigating and quickly identified that one node in our Search engine was overloaded. The overload was caused by the engine automatically starting to move data between nodes to improve the overall data distribution.

    ## Action Items

    No immediate action was required, as the data rebalancing completed shortly after the root cause had been identified. To prevent untimely actions like this going forward, we will reassess our engine alerts and configurations to better control how and when these often-automated cluster actions occur. We are also rebuilding certain Search components, paying extra attention to scalability and performance constraints.
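The action items mention controlling how and when automated cluster actions such as data rebalancing may run. One way to picture such a control is a guard that permits a rebalance only inside a low-traffic window and only when no node is already under heavy load. The Python below is a minimal sketch of that idea. Everything in it is hypothetical: the report names neither the search engine nor its settings, and the 01:00 to 05:00 UTC window, the 0.75 load ceiling, and the function names `in_window` and `may_rebalance` are illustrative assumptions.

```python
from datetime import datetime, time, timezone

# Hypothetical policy values; the report does not describe Dixa's actual configuration.
MAINTENANCE_START = time(1, 0)  # assumed start of a low-traffic window, 01:00 UTC
MAINTENANCE_END = time(5, 0)    # assumed end of that window, 05:00 UTC
MAX_NODE_LOAD = 0.75            # assumed per-node load ceiling on a 0.0 to 1.0 scale


def in_window(now: datetime, start: time = MAINTENANCE_START, end: time = MAINTENANCE_END) -> bool:
    """Return True if `now` (converted to UTC) falls inside the [start, end) window."""
    t = now.astimezone(timezone.utc).time()
    if start <= end:
        return start <= t < end
    # The window wraps past midnight, e.g. 22:00 to 02:00.
    return t >= start or t < end


def may_rebalance(now: datetime, node_loads: dict[str, float]) -> bool:
    """Allow an automated rebalance only inside the window and when no node is already hot."""
    return in_window(now) and max(node_loads.values(), default=0.0) < MAX_NODE_LOAD


if __name__ == "__main__":
    # The incident began at 10:34 UTC, outside the assumed window, so the guard refuses.
    incident_start = datetime(2024, 11, 11, 10, 34, tzinfo=timezone.utc)
    print(may_rebalance(incident_start, {"node-1": 0.4, "node-2": 0.95}))
```

Under this assumed policy, a rebalance that would have started mid-morning is deferred, and one requested while a node is already near saturation is refused even inside the window. A real deployment would express the same intent through whatever allocation or rebalancing settings its search engine actually exposes.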