Onfleet incident

Batch Task Creation Jobs Failing

Major · Resolved

Onfleet experienced a major incident on February 27, 2025 affecting Dashboard and API, lasting 3h 32m. The incident has been resolved; the full update timeline is below.

Started
Feb 27, 2025, 09:41 PM UTC
Resolved
Feb 28, 2025, 01:13 AM UTC
Duration
3h 32m
Detected by Pingoru
Feb 27, 2025, 09:41 PM UTC

Affected components

Dashboard, API

Update timeline

  1. investigating Feb 27, 2025, 09:41 PM UTC

    At this time the internal process for managing task creation queues for the asynchronous batch task creation endpoint is experiencing issues. This is being investigated with high urgency.

  2. monitoring Feb 28, 2025, 12:38 AM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Feb 28, 2025, 01:13 AM UTC

    This incident has been resolved.

  4. postmortem Mar 17, 2025, 03:18 PM UTC

    Between 11:45 a.m. and 12:20 p.m. PST, a batch request began to loop unexpectedly, with its repeated runs interfering with each other and causing container contention. Because of how batch creations are processed, this prevented other jobs from being processed until they were cleared manually. Once the job was manually stopped, the system returned to normal batch processing. The team will implement enhanced monitoring to avoid this scenario in the future.
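
The failure mode described in the postmortem, one looping job blocking every job queued behind it, can be illustrated with a small sketch. This is not Onfleet's implementation, which the postmortem does not describe. It is a generic guard under stated assumptions: the `Job` class, the `wants_requeue` flag, the `drain` function, and the `MAX_ATTEMPTS` cap are all hypothetical names chosen for illustration. The idea is that a job which keeps resubmitting itself goes to the back of the queue and is quarantined after a fixed number of attempts, so other jobs still get processed.

```python
from collections import deque
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # hypothetical cap; not a documented Onfleet setting


@dataclass
class Job:
    job_id: str
    wants_requeue: bool = False  # simulates a job that keeps resubmitting itself
    attempts: int = 0


def drain(queue, max_attempts=MAX_ATTEMPTS):
    """Process jobs in FIFO order, quarantining any job that reaches the attempt cap."""
    completed, quarantined = [], []
    while queue:
        job = queue.popleft()
        job.attempts += 1
        if job.wants_requeue:
            if job.attempts >= max_attempts:
                # Stop the loop and set the job aside for manual review.
                quarantined.append(job.job_id)
            else:
                # Requeue at the back so jobs behind it are not starved.
                queue.append(job)
        else:
            completed.append(job.job_id)
    return completed, quarantined


def run_demo():
    # One looping job queued ahead of two ordinary jobs.
    queue = deque([Job("looping", wants_requeue=True), Job("a"), Job("b")])
    return drain(queue)
```

Calling `run_demo()` completes jobs `a` and `b` and quarantines `looping` after its third attempt. A count of quarantined jobs is also a natural signal to alert on, which is one possible form the "enhanced monitoring" mentioned above could take.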