WalkMe experienced a critical incident on July 4, 2022, affecting Designer (Editor) and lasting 8 hours. The incident has been resolved; the full update timeline is below.
Affected components
- Designer (Editor)
Update timeline
- investigating Jul 05, 2022, 12:48 AM UTC
The WalkMe US Editor is currently down for all users. At this time, it is not possible to log in to the Editor.
- resolved Jul 05, 2022, 09:37 AM UTC
The outage was caused by a connection issue, which has been fixed by our R&D team.
- postmortem Jul 05, 2022, 04:45 PM UTC
## **Postmortem & Root Cause Analysis**

On July 4 at 21:00 UTC, a rare technical failure in a microservice disrupted the WalkMe Editor's loading mechanism. The failure did not trigger the automatic alerting that would normally have detected the issue and escalated it to WalkMe development teams for faster resolution.

The database continually tracks issues, logs them, and escalates them as required. Configured thresholds determine how many of these issues the database can log, and the error logs are normally cleared manually after a period of time. For this microservice, the threshold had been set at a level that was no longer appropriate. When the microservice received too many requests, the database automatically blocked hosts/IPs, which is what made the WalkMe Editor unusable.

To resolve the issue permanently, we cleared the error records for this microservice and increased the threshold for the number of errors that can be logged in the future. We also introduced a new alert that notifies development teams when the error log needs to be cleared manually, before the threshold is reached. Finally, we reviewed the thresholds across all microservices to ensure that alerting to the development teams is activated appropriately for any similar issue in the future.
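The postmortem does not name the database or the exact settings involved, so the following is only a minimal Python sketch of the mechanism as described: errors are counted per host, a host is blocked once its count reaches a threshold, records stay until they are cleared manually, and an early alert (like the new one WalkMe added) fires before the threshold is reached. `HostErrorTracker`, `block_threshold`, `alert_ratio`, and the 80% warning level are hypothetical illustrations, not WalkMe's actual implementation or configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HostErrorTracker:
    """Hypothetical model of per-host error logging with a blocking threshold.

    Once a host's error count reaches `block_threshold`, its requests are
    refused until the records are cleared manually. An alert is raised
    earlier, when the count reaches `alert_ratio` of the threshold.
    """

    block_threshold: int
    alert_ratio: float = 0.8
    errors: dict[str, int] = field(default_factory=dict)
    alerts: list[str] = field(default_factory=list)

    def is_blocked(self, host: str) -> bool:
        return self.errors.get(host, 0) >= self.block_threshold

    def record_error(self, host: str) -> None:
        count = self.errors.get(host, 0) + 1
        self.errors[host] = count
        # Alert once, when the count first hits the early-warning level,
        # so the log can be cleared before the host is blocked.
        alert_level = max(1, int(self.block_threshold * self.alert_ratio))
        if count == alert_level:
            self.alerts.append(
                f"{host}: {count}/{self.block_threshold} errors logged; clear the error log"
            )

    def handle_request(self, host: str) -> bool:
        """Return True if the request is accepted, False if the host is blocked."""
        return not self.is_blocked(host)

    def clear_errors(self, host: str | None = None) -> None:
        """Manual cleanup step: clear one host's records, or all records."""
        if host is None:
            self.errors.clear()
        else:
            self.errors.pop(host, None)


if __name__ == "__main__":
    tracker = HostErrorTracker(block_threshold=10)
    for _ in range(10):
        tracker.record_error("10.0.0.5")
    print(tracker.handle_request("10.0.0.5"))  # False: host is blocked
    print(len(tracker.alerts))  # 1: the alert fired at 8 errors
    tracker.clear_errors("10.0.0.5")
    print(tracker.handle_request("10.0.0.5"))  # True: block lifted
```

In this model, raising `block_threshold` only postpones the block if records are never cleared; the early alert is what gives a team time to clear the log before requests start being refused, which is consistent with WalkMe pairing the higher threshold with a new alert.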