Onfleet incident

High overall system latency

Major · Resolved

Onfleet experienced a major incident on July 14, 2025 affecting the Dashboard, API, and eight other components, lasting 3h 6m. The incident has been resolved; the full update timeline is below.

Started
Jul 14, 2025, 07:00 PM UTC
Resolved
Jul 14, 2025, 10:06 PM UTC
Duration
3h 6m
Detected by Pingoru
Jul 14, 2025, 07:00 PM UTC

Affected components

Dashboard, API, Maps, iOS, Android, Locations streaming, Locations storage, Analytics, Search, ETA

Update timeline

  1. Investigating · Jul 14, 2025, 07:00 PM UTC

    We are experiencing an issue affecting overall system performance. All system functions remain available, with degraded performance. Our team is actively working to mitigate this issue.

  2. Monitoring · Jul 14, 2025, 09:07 PM UTC

    System latency is normalizing after we put mitigations in place. Our infrastructure and database teams are continuing to monitor the situation.

  3. Resolved · Jul 14, 2025, 10:06 PM UTC

    System responsiveness is returning to normal. Our engineering teams will continue to investigate the root causes to prevent future latency spikes.

  4. Postmortem · Jul 17, 2025, 07:10 PM UTC

    We have just concluded a major effort to diagnose and address the periodic performance decreases that have recently affected the Onfleet system. We believe the bottlenecks have been identified and the root causes addressed, and we are observing improvements in performance-related metrics across the board. This effort included:

    - Database performance improvements through node redistribution, query optimization, and index adjustments
    - Refactoring of the primary dashboard-oriented data loading endpoints
    - Improvements to our webhook message processing system
    - Development of a modern incremental fetching scheme to decrease dashboard lock time (sketched below)
    - Significant adjustments to our internal server architecture provisioning

    In aggregate, these changes not only allow the system to cope with higher load levels but also deliver an improved performance baseline. Our team will continue to improve our instrumentation so that we can proactively identify choke points. Additionally, the team will keep prioritizing performance improvements, particularly for customers with high task usage.
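As a rough illustration of the incremental fetching idea mentioned in the postmortem: instead of loading all dashboard data in one large, long-running request (during which server-side locks may be held), the client pulls small cursor-delimited pages so that each request, and any lock it takes, stays short. This is a minimal sketch only; the `/tasks` endpoint, the `cursor` and `limit` parameters, and the response shape are assumptions for illustration, not Onfleet's actual API.

```typescript
// Hypothetical shapes; not Onfleet's real schema.
interface Task {
  id: string;
  state: string;
}

interface TaskPage {
  tasks: Task[];
  nextCursor: string | null; // opaque cursor; null when no pages remain
}

// Fetch tasks in small pages so each request (and any server-side read
// lock it takes) is short-lived, instead of one large blocking query.
async function fetchTasksIncrementally(baseUrl: string): Promise<Task[]> {
  const all: Task[] = [];
  let cursor: string | null = null;

  do {
    const url = new URL(`${baseUrl}/tasks`); // assumed endpoint
    url.searchParams.set("limit", "100");    // small page size keeps requests cheap
    if (cursor !== null) url.searchParams.set("cursor", cursor);

    const res = await fetch(url);
    if (!res.ok) throw new Error(`fetch failed: ${res.status}`);

    const page = (await res.json()) as TaskPage;
    all.push(...page.tasks);
    cursor = page.nextCursor; // resume from where the last page ended
  } while (cursor !== null);

  return all;
}
```

Because each iteration resumes from the opaque `nextCursor` returned by the previous page, the server never has to hold state or locks across the whole scan, which is what makes this pattern effective at reducing lock time.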