Eleos Technologies incident
Reduction in Errors Logged to Error console
Eleos Technologies experienced a major incident on May 2, 2024 affecting API and App Manager and 1 more component, lasting 2h 5m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating May 02, 2024, 08:31 PM UTC
We are reducing the number of errors being logged to error console to 50% of their normal volumes during this time period to mitigate potential performance issues.
- investigating May 02, 2024, 08:36 PM UTC
We are seeing the same issues as earlier for Drive Axle, Document Hub, App Manager, and Platform document processing.
- investigating May 02, 2024, 08:38 PM UTC
We are currently not logging errors to error console at this time.
- investigating May 02, 2024, 08:48 PM UTC
We are still investigating the cause of this issue. Drive Axle, the Document Hub, and App Manager login are not working.
- monitoring May 02, 2024, 09:10 PM UTC
We are continuing to investigate this issue. At this time, it appears that error rates are approaching normal, however we are continuing to monitor the situation. We believe that this issue is related to difficulties involving a third party partner, and we're working to mitigate the effect these issues are having on Drive Axle, the Document Hub, and App Platform.
- monitoring May 02, 2024, 09:46 PM UTC
Our third party partner has indicated they have applied mitigations on their end, and our error rates are are normal. We are continuing to monitor the situation. Error console is now logging at it's normal rate.
- monitoring May 02, 2024, 10:29 PM UTC
We are still currently monitoring at this time and error rates are normal. During 18:15 UTC to 20:45 UTC, a small percentage of workflow actions that used telematics data and a percentage of loads requests for users with the `manage_shipments` flag set on their authentication requests failed. Intermittently failed requests during this time period would have been retried.
- resolved May 02, 2024, 10:36 PM UTC
We are marking this incident as resolved for now, but we're continuing to monitor system performance.
- postmortem May 03, 2024, 09:06 PM UTC
We have identified the root cause of the Eleos Platform outage that occurred on May 2, 2024 and have prepared an emergency mitigation that can be applied in case of another occurrence. This outage resulted from unexpected system behavior during a partner’s separate outage. The partner has indicated a full resolution as of May 2, 2024 at 22:00 UTC, so at this time, we do not expect an imminent recurrence. However, we’re working to implement a complete fix to the root cause of the unexpected behavior. We’ll share a more detailed post-mortem in the coming week.