Dead Man's Snitch incident
Elevated check-in error rate and timeouts
Dead Man's Snitch experienced a minor incident on June 22, 2022, affecting Snitch Check-in Processing and lasting 51 minutes. The incident has been resolved; the full update timeline is below.
Affected components
- Snitch Check-in Processing
Update timeline
- investigating Jun 22, 2022, 01:34 PM UTC
We're seeing an increase in timeouts for our check-in service. We've removed a pair of misbehaving nodes from our load balancer and we're investigating the root cause. Any check-in that received a 408 Request Timeout error was not processed.
- monitoring Jun 22, 2022, 02:07 PM UTC
We've replaced the affected nodes, and error rates have returned to normal levels.
- resolved Jun 22, 2022, 02:25 PM UTC
Error rates have returned to normal and all systems are green. We've identified a possible root cause: a bug in the version of the message bus client we were using that could cause a deadlock in some cases. We've upgraded to the latest version of the client and rolled it out to production.