Logit.io incident

Dashboard.logit.io unavailable for some customers

Logit.io experienced a major incident on October 13, 2021, affecting Dashboard and lasting 1h 33m. The incident has been resolved; the full update timeline is below.

Started: Oct 13, 2021, 08:10 AM UTC
Resolved: Oct 13, 2021, 09:44 AM UTC
Duration: 1h 33m
Detected by Pingoru: Oct 13, 2021, 08:10 AM UTC

Affected components

Dashboard

Update timeline

investigating Oct 13, 2021, 07:48 AM UTC

We have been notified that dashboard.logit.io is unavailable for some customers and our engineers are actively investigating. We will update in 30 minutes.
identified Oct 13, 2021, 08:18 AM UTC

We have a network outage with our provider, and we are working with them to fix their networking issue. We will update in 30 minutes.
identified Oct 13, 2021, 08:19 AM UTC

The issue is with our hosting provider's underlying networking; our engineers are working with them to resolve it. We will provide an update in 30 minutes.
monitoring Oct 13, 2021, 08:36 AM UTC

Our engineers have reported that all services are available. Our engineers are now performing health checks across all Stacks.
identified Oct 13, 2021, 08:44 AM UTC

The issue with our hosting provider's underlying networking persists; our engineers are working with them to resolve it. We will provide an update in 30 minutes.
identified Oct 13, 2021, 08:58 AM UTC

Our engineers have reported that we now have connectivity and are continuing to check that all services are fully operational.
monitoring Oct 13, 2021, 09:11 AM UTC

Our engineers have reported that all services are available. Our engineers are now performing health checks across all Stacks.
resolved Oct 13, 2021, 09:44 AM UTC

This incident has been resolved.
postmortem Oct 13, 2021, 02:19 PM UTC

On the morning of October 13th at 07:30 AM UTC, [Logit.io](http://Logit.io) on-call engineers were automatically alerted to a new issue affecting the stability of the platform. Our engineers responded quickly to identify the root cause and by 07:50 AM UTC had confirmed with our incident response team that the issue was related to an intervention at the underlying infrastructure provider, which led to disturbances across the entire network. The intervention was aimed at reinforcing anti-DDoS protections. Our teams worked closely with the infrastructure provider's teams, who isolated the equipment at 08:15 AM UTC, restoring normal service. Our engineers then performed a series of health checks across the infrastructure and individual stacks to confirm that the incident was resolved. We sincerely apologise to all of our customers affected by this incident, and we are committed to being as transparent as possible about its causes and consequences.