Eleos Technologies incident

Elevated Error Rates for Drive Axle, Document Hub, and Platform Document Delivery

Major, Resolved

Eleos Technologies experienced a major incident on May 2, 2024, affecting App Manager, Mobile Apps, and 1 more component, lasting 1h 5m. The incident has been resolved; the full update timeline is below.

Started: May 02, 2024, 07:16 PM UTC
Resolved: May 02, 2024, 08:22 PM UTC
Duration: 1h 5m
Detected by Pingoru: May 02, 2024, 07:16 PM UTC

Affected components

App Manager
Mobile Apps
Document Hub and Drive Axle
Document Delivery

Update timeline

  1. investigating May 02, 2024, 07:16 PM UTC

    We are currently investigating elevated error rates for Drive Axle and the Document Hub. Scanning and retrieval of sent documents are affected for Eleos Platform customers as well. Scanned documents will not be lost during this time and will be retried.

  2. investigating May 02, 2024, 07:26 PM UTC

    Logging into App Manager is also affected during this time.

  3. investigating May 02, 2024, 07:48 PM UTC

    We are actively investigating these issues. During this time period, logging into App Manager is also affected along with logging into the Document Hub. Drive Axle users are experiencing difficulties logging in.

  4. investigating May 02, 2024, 07:54 PM UTC

    Error rates have dropped dramatically. We are still investigating the cause.

  5. monitoring May 02, 2024, 08:02 PM UTC

    We're currently monitoring the system as the error rates have gone back to normal.

  6. resolved May 02, 2024, 08:22 PM UTC

    Error rates have returned to normal. We apologize for the interruption of service.

  7. postmortem May 10, 2024, 06:34 PM UTC

    There were two Eleos Platform outages on May 2, from 18:15 UTC to 19:46 UTC and from 20:25 UTC to 20:45 UTC, for a total of 1 hour and 51 minutes. During these outages, users could not log into Drive Axle, Document Hub, or App Manager. In addition:

    * Workflow actions and messages that included telematics data submitted by drivers were delayed until after the outage. Workflow actions submitted during these times fell back to an offline state if offline workflows were configured. The apps then synchronized actions and messages after the outage.
    * Drivers with the `manage_shipments` flag enabled may have failed to retrieve updated load data.
    * Drivers would have experienced delays when uploading scanned documents. If drivers logged out while documents were still queued for upload, those documents were lost.
    * Drivers would have experienced delays when retrieving their list of previously scanned documents.
    * Users who were already logged into App Manager would have experienced difficulties editing document types and editing forms that have document types.

    Due to a simultaneous outage of a telematics partner, Platform features that relied on that partner's services, such as telematics-enabled messages and workflows, would have fallen back to their offline functionality if configured.

    Users could not log into Drive Axle, Document Hub, and App Manager because certain authentication calls inadvertently depended on telematics integration logic. Because of the simultaneous partner outage, these authentication calls timed out, causing resource exhaustion that cascaded to other, non-authentication requests. These requests should be independent, and we are making changes to decouple them.

    We are deeply sorry for the interruptions, delays, and distraction this incident caused for you and your drivers. Compounding that, we did not communicate the existence of a known incident promptly. We are reviewing and adjusting our on-call procedures and training to correct this.
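
The failure mode described in the postmortem (a login path that waits on an optional partner integration until shared resources run out) can be illustrated with a small sketch. This is not Eleos's code or its actual fix. It is a generic Python illustration of one common way to decouple such calls: run the partner lookup in a small dedicated worker pool, bound the wait with a timeout, and let login succeed without the telematics data. All names (`login`, `fetch_telematics_profile`, `_telematics_pool`) and the timeout values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical bulkhead: a small pool reserved for partner calls, so slow
# telematics lookups cannot occupy the workers that serve other requests.
_telematics_pool = ThreadPoolExecutor(max_workers=2)


def fetch_telematics_profile(user_id, delay=0.0):
    """Stand-in for a partner API call; `delay` simulates partner latency."""
    time.sleep(delay)
    return {"user_id": user_id, "vehicle": "truck-1"}


def login(user_id, telematics_delay=0.0, timeout=0.2):
    """Authenticate first, then treat telematics data as optional enrichment."""
    # Assumption: credentials were already verified; only the enrichment
    # step is shown here.
    session = {"user_id": user_id, "authenticated": True, "telematics": None}
    future = _telematics_pool.submit(
        fetch_telematics_profile, user_id, telematics_delay
    )
    try:
        # Wait a bounded time only; never block login on the partner.
        session["telematics"] = future.result(timeout=timeout)
    except FutureTimeout:
        # Best effort: cancels the call only if it has not started running.
        future.cancel()
    return session


if __name__ == "__main__":
    print(login("driver-1"))                                      # partner healthy
    print(login("driver-2", telematics_delay=0.5, timeout=0.05))  # partner slow
```

In the second call the session is still authenticated and simply carries no telematics data. A production version would also set a timeout on the partner request itself, since cancelling a future does not interrupt a call that is already running.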