Logit.io experienced a critical incident on March 10, 2021 affecting Dashboard and Visualisation Hosts and 1 more component, lasting 17h 9m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Mar 10, 2021, 12:22 AM UTC
We are currently investigating this issue.
- identified Mar 10, 2021, 12:26 AM UTC
We are working with our hosting provider to restore access to services. We will update in 30 minutes.
- identified Mar 10, 2021, 12:37 AM UTC
We are continuing to work on a fix for this issue. We will update in 30 minutes.
- identified Mar 10, 2021, 01:03 AM UTC
We are working with our hosting provider to restore access to services. We will update in 30 minutes.
- identified Mar 10, 2021, 01:28 AM UTC
We are working with our hosting provider to restore access to services. We will update in 60 minutes.
- identified Mar 10, 2021, 02:49 AM UTC
We are working with our hosting provider to restore access to services. We will update in 60 minutes.
- identified Mar 10, 2021, 03:31 AM UTC
There has been a major fire at one of our data centers affecting some core services. Note: this does not impact Logstash and Elasticsearch log ingestion, which remains unaffected. We have invoked our DR/BCP plan to migrate and restore the affected services to a different data center. We will update in 2 hours.
- identified Mar 10, 2021, 06:13 AM UTC
Our engineers have restored the platform dashboard (https://dashboard.logit.io) and other core services. We will provide another update in 2 hours.
- identified Mar 10, 2021, 08:13 AM UTC
Our engineers are continuing to restore all services and are progressing well with implementing our DR plan. Note: this does not impact Logstash and Elasticsearch log ingestion, which remains unaffected. We will update in 1 hour.
- identified Mar 10, 2021, 09:25 AM UTC
Our engineers are in the process of recreating all Kibana instances and are progressing well with implementing our DR plan. Note: this does not impact Logstash and Elasticsearch log ingestion, which remains unaffected. We will update in 1 hour.
- identified Mar 10, 2021, 10:35 AM UTC
The majority of affected Kibana instances are now back online. Our engineers are continuing to work to restore other core services. We will update in 1 hour.
- identified Mar 10, 2021, 11:31 AM UTC
All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the team. Our engineers are continuing to work to restore other core services. We will update in 1 hour.
- identified Mar 10, 2021, 01:00 PM UTC
All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the support team. Our engineers are continuing to work to restore other core services, including the shared API. We will update in 1 hour.
- identified Mar 10, 2021, 02:30 PM UTC
All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the support team. Our engineers are bringing the API and other core services back online now. We will update in 1 hour.
- identified Mar 10, 2021, 03:50 PM UTC
The ingestion API is now back online. If you are still having issues with the API, please reach out to the support team. Our engineers are working to bring the alerting infrastructure back online. We will update in 1 hour.
- monitoring Mar 10, 2021, 04:56 PM UTC
We have recovered from backups all of the major core services that were lost in the EU data center fire. We will continue to monitor the platform over the coming hours to ensure stability, but we believe all services are fully recovered. If you have any questions or need support, please reach out to us.
- resolved Mar 10, 2021, 05:46 PM UTC
This incident has been resolved.
- postmortem Mar 24, 2021, 02:34 PM UTC
[Click here to view postmortem](https://logit.io/blog/post/logit-io-platform-outage-incident-report)