Dead Man's Snitch incident
False alerts and dashboard 503 errors
Dead Man's Snitch experienced a major incident on September 27, 2021 affecting Snitch Check-in Processing and Management Portal and 1 more component, lasting 4h 50m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Sep 27, 2021, 08:04 AM UTC
We're currently investigating issues affecting check-in processing and dashboard availability. We believe these are related to two major incidents affecting our hosting provider (Heroku). https://status.heroku.com/incidents/2361 https://status.heroku.com/incidents/2362
- identified Sep 27, 2021, 08:48 AM UTC
We've temporarily disabled alerting as we investigate a way to work around the upstream issues.
- identified Sep 27, 2021, 09:44 AM UTC
We've worked around the issues with check-in processing and are currently working through the backlog of pending check-ins in the queue. It doesn't appear our check-in receiver was impacted by the outage, just the workers that process the check-ins.
- monitoring Sep 27, 2021, 09:57 AM UTC
Our check-in workers have caught up on all pending check-ins, and alerts should be accurate going forward. Our main goal has been to get alerting and check-in processing back online. Heroku continues to experience issues with dynos and routing requests. We've worked around the dyno issues by temporarily moving check-in processing to hosts on EC2. We are monitoring check-in processing and Heroku's status and will post an update once we consider the issue fully resolved.
- monitoring Sep 27, 2021, 12:10 PM UTC
Routing issues with the API and Dashboard appear to be mostly resolved. We are migrating some check-in processing back to Heroku and will continue to monitor the situation.
- resolved Sep 27, 2021, 12:55 PM UTC
All systems are green. We've migrated all processing back to Heroku now that they have resolved their upstream issue. Our processing system should have recovered more quickly than it did, and we're investigating a possible fix for that.