Fleetio incident

Database connectivity issue

Fleetio experienced a critical incident on March 10, 2023, affecting the Fleetio Web Application & API and lasting 1h 6m. The incident has been resolved; the full update timeline is below.

Started: Mar 10, 2023, 07:08 PM UTC
Resolved: Mar 10, 2023, 08:15 PM UTC
Duration: 1h 6m
Detected by Pingoru: Mar 10, 2023, 07:08 PM UTC

Affected components

Fleetio Web Application & API

Update timeline

investigating Mar 10, 2023, 07:08 PM UTC

Fleetio is currently experiencing an outage that may be preventing users from accessing the system. We are investigating and will follow up with another update soon.
monitoring Mar 10, 2023, 07:46 PM UTC

We have identified the cause of the problem and have implemented a fix. We are continuing to monitor the recovery.
resolved Mar 10, 2023, 08:15 PM UTC

This incident has been resolved. We aim to provide a full breakdown explaining this downtime and what we're doing to prevent similar events in the future in the coming days.
postmortem Mar 17, 2023, 03:34 PM UTC

**Overview:** On March 10th, 2023 at 12:55 PM CST, Fleetio experienced a complete outage of our systems. This outage lasted for approximately 48 minutes and impacted all users and services, including the browser app, iOS and Android apps, and the API. The root cause of the issue was a failed deployment initiated by our team, which resulted in an interruption to our database services.

**Root Cause:** The root cause of the outage was identified as a failed deployment in our Amazon Web Services (AWS) instance that was intended to improve internal system configuration. An unexpected error occurred during the deployment process, causing all database connections to fail. As a result, our production systems were unable to connect to these databases, and our users were met with error messages when attempting to access Fleetio.

**Impact:** The outage had a significant impact on our users and services. During the event, users were completely unable to access Fleetio applications.

**Resolution:** To resolve the issue, our team worked to identify the errors that led to the deployment failure. We found the root cause and rolled back the changes. We then performed a comprehensive review of our deployment processes to identify areas for improvement and to prevent similar incidents from occurring in the future.

**Conclusion:** As always, we understand that Fleetio is mission-critical for our customers, and that any disruption has a real-world impact on their operations. We apologize for any inconvenience caused by this outage and appreciate your patience and understanding as we worked to resolve the issue. We remain committed to providing a reliable and resilient system for our users and will continue to prioritize the ongoing improvement of our processes and systems.