Convercent incident

Convercent platform Service Outage

Severity: Critical. Status: Resolved.

Convercent experienced a critical incident on January 18, 2024, affecting the US Trial, EU Trial, EU Production, and US Production components and lasting 1h 33m. The incident has been resolved; the full update timeline is below.

Started: Jan 18, 2024, 11:22 AM UTC
Resolved: Jan 18, 2024, 12:55 PM UTC
Duration: 1h 33m
Detected by Pingoru: Jan 18, 2024, 11:22 AM UTC

Affected components

US Trial, EU Trial, EU Production, US Production

Update timeline

  1. investigating Jan 18, 2024, 11:22 AM UTC

    We’ve identified an issue where users are experiencing login failures with the error "invalid user name password". The issue affects all users accessing the service through the EU and US Production and Trial environments. We’re investigating and will provide an update as soon as possible. If you are experiencing issues, please contact our support teams quoting IM-387.

  2. investigating Jan 18, 2024, 12:21 PM UTC

    Our engineering teams continue to investigate, and further updates will be provided as soon as possible. If you are experiencing issues, please contact our support teams quoting IM-387.

  3. resolved Jan 18, 2024, 12:55 PM UTC

    The issue has been resolved. If you are still experiencing issues, please contact our support teams quoting IM-387. Root cause analysis investigations have been initiated and an RCA will be provided.

  4. postmortem Feb 12, 2024, 03:52 PM UTC

    # Event Description

    On Thursday, 18th January 2024, between 10:30 UTC and 12:30 UTC, users within the EU Production and Trial environments encountered difficulties accessing the Convercent platform. The reported issues included instances of "Invalid username or password" errors and unknown account errors upon login.

    # Findings and Root Cause

    Upon investigation, our engineering teams determined that an unhealthy web server was the root cause of the disruption. In response, the affected web server was promptly restarted to restore normal user access.

    ## How could this incident have been avoided?

    The introduction of proactive monitoring systems is crucial for early detection of anomalies, allowing us to identify and resolve similar instances before they escalate.

    ## How could we have detected the issue sooner?

    To prevent the recurrence of similar incidents, we recommend the implementation of proactive monitoring. This approach can detect and address potential issues before they impact user accessibility.

    ## Is there a contingency or plan to control future incidents of this kind?

    For long-term incident prevention, we have incorporated proactive monitoring into our systems. This proactive approach ensures timely alerts and enables us to address potential disruptions swiftly.

    # Corrective Actions

    To address the immediate issue, the unhealthy web server was restarted. Additionally, as a proactive measure, we have implemented a robust proactive monitoring system to alert us to potential similar incidents in the future.
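
The postmortem does not describe how the proactive monitoring works. As an illustration only, the sketch below shows one common pattern: probe a server repeatedly and raise an alert once a run of consecutive failures reaches a threshold. The names `HealthMonitor` and `run_probes`, the threshold of 3, and the probe callable are all assumptions made for this example and do not reflect Convercent's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HealthMonitor:
    """Tracks consecutive probe failures for one server (hypothetical helper)."""

    failure_threshold: int = 3  # assumed value; tune per environment
    consecutive_failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True when an alert should fire."""
        if healthy:
            # Any success resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire exactly once, at the moment the threshold is first reached.
        return self.consecutive_failures == self.failure_threshold


def run_probes(
    probe: Callable[[], bool], monitor: HealthMonitor, attempts: int
) -> List[int]:
    """Call `probe` `attempts` times; return the attempt indices that alerted."""
    alerts = []
    for i in range(attempts):
        try:
            healthy = probe()
        except Exception:
            # A probe that raises (timeout, connection error) counts as unhealthy.
            healthy = False
        if monitor.record(healthy):
            alerts.append(i)
    return alerts
```

In practice the probe would be something like a synthetic login or an HTTP health-endpoint request against each web server, run on a schedule. Requiring several consecutive failures before alerting is a design choice that trades a little detection delay for fewer false alarms from transient errors.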