Dead Man's Snitch experienced a major incident on January 4, 2021, affecting Snitch Check-in Processing and lasting 3h 40m. The incident has been resolved; the full update timeline is below.
Affected components
- Snitch Check-in Processing
Update timeline
- investigating Jan 04, 2021, 12:14 PM UTC
We're investigating an issue with check-in processing starting around 11:45 UTC.
- monitoring Jan 04, 2021, 12:28 PM UTC
We've restarted the affected service and confirmed that it's processing correctly. It has now caught up on the backlog of pending check-ins. We're continuing to investigate the root cause.
- monitoring Jan 04, 2021, 12:28 PM UTC
We are continuing to monitor for any further issues.
- resolved Jan 04, 2021, 03:54 PM UTC
The root cause has been tracked down to a timeout error during check-in processing that wasn't handled correctly and put the process into a bad state. We're working on a fix for the issue and expect to deploy it shortly. Check-in processing stopped at 11:47 UTC, but we weren't made aware of the issue until 11:57 UTC. In reviewing our metrics and alerting, we've identified a better metric to alert on and will be working it into an update to our internal monitoring and alerting systems.
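For readers who want a concrete picture of this failure mode, here is a minimal sketch in Python. It is not Dead Man's Snitch's actual code: `CheckInWorker`, `handler`, and `is_stalled` are hypothetical names, and the "seconds since last successful check-in" staleness metric is an assumption chosen for illustration, since the update above does not say which metric was selected. The sketch shows a processing loop that records a timed-out check-in and keeps going, plus a staleness check that an alert could be built on.

```python
import time
from typing import Callable, Iterable, List


class CheckInWorker:
    """Hypothetical check-in processor, for illustration only."""

    def __init__(self, handler: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic):
        self.handler = handler          # processes one check-in; may raise TimeoutError
        self.clock = clock              # injectable so the staleness check is testable
        self.last_success = clock()     # treat startup as the initial baseline
        self.failed: List[str] = []     # check-ins that timed out, kept for retry

    def process(self, check_ins: Iterable[str]) -> int:
        """Process each check-in; a timeout on one item does not stop the loop."""
        processed = 0
        for check_in in check_ins:
            try:
                self.handler(check_in)
            except TimeoutError:
                # Record the failure and move on to the next check-in
                # rather than leaving the worker in a bad state.
                self.failed.append(check_in)
                continue
            self.last_success = self.clock()
            processed += 1
        return processed

    def seconds_since_last_success(self) -> float:
        return self.clock() - self.last_success


def is_stalled(worker: CheckInWorker, threshold_seconds: float) -> bool:
    """Assumed alert condition: no successful check-in within the threshold."""
    return worker.seconds_since_last_success() > threshold_seconds
```

An alert on a condition like `is_stalled(worker, 60)` fires when processing stops for any reason, which is one way to shorten a detection gap like the ten minutes described above. Whether it fits depends on normal check-in volume, since a quiet period would look the same as a stall.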