Datto incident

Datto RMM - Zinfandel - Job and Audit Execution Delays

Datto experienced a minor incident on March 6, 2026 affecting Zinfandel (US West), lasting 1h 19m. The incident has been resolved; the full update timeline is below.

Started: Mar 06, 2026, 08:45 PM UTC
Resolved: Mar 06, 2026, 10:04 PM UTC
Duration: 1h 19m
Detected by Pingoru: Mar 06, 2026, 08:45 PM UTC

Affected components

Zinfandel (US West)

Update timeline

investigating Mar 06, 2026, 08:45 PM UTC

We are aware of a problem where Jobs and Audits are experiencing delays on the Zinfandel Platform. The Kaseya R&D Team is investigating the issue. Subscribe to the Kaseya Status Page for up-to-date information at https://status.kaseya.com/

monitoring Mar 06, 2026, 09:04 PM UTC

A fix has been implemented and we are monitoring the results.

resolved Mar 06, 2026, 10:04 PM UTC

This incident has been resolved.

postmortem Mar 18, 2026, 10:18 AM UTC

**Summary**

Around **2026-03-06 9:40 AM EST**, partners on the Zinfandel platform started experiencing delays in both Quick and Scheduled Job execution. The issue was initially mitigated by 12:00 PM EST; however, the steps taken to restore service inadvertently caused the problem to reoccur later that afternoon at approximately 2:13 PM EST. The R&D and Operations teams fully resolved the issue by 2:47 PM EST.

**Root Cause and Resolution**

The initial incident was triggered by an unusually large-scale alert resolution operation, which created a significant backlog of processing tasks within the database. The high volume of queued work caused processing times to exceed the allowable execution window. This resulted in repeated retries, which continually saturated the database and prevented other operations from running normally.

To alleviate the load, the task scheduling service was scaled down, and the queuing services were recycled, which reduced database pressure and restored normal operation by approximately 12:00 PM EST.

However, at around 2:13 PM EST, these earlier mitigation steps produced an unintended side effect: they constrained the throughput of the service responsible for processing device audits. This limitation caused additional downstream delays in Job execution across the platform. The service was subsequently scaled back up to full capacity, and all services were confirmed healthy by **2:47 PM EST**.

**Preventative Measures**

To reduce the likelihood and impact of similar incidents in the future, the following steps are being taken:

* **Resolution of Related Product Issues:** The R&D team has identified a backend software defect that contributed to the incident. A fix is scheduled for the **14.9 release**.
* **Enhanced Monitoring, Alerting, and Response:** The Kaseya R&D team is reviewing additional monitoring capabilities to provide deeper insight into application performance at the component level for key services.
* **Improved Incident Management and Response:** Global Kaseya teams will continue to receive training and coaching on Incident Management playbooks to ensure that all internal stakeholders are promptly informed and take coordinated action when events occur.
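
The retry behavior described under Root Cause and Resolution (tasks exceeding their execution window, then being retried repeatedly against an already saturated database) is a common failure pattern. The sketch below illustrates one general way to bound it: a capped number of attempts with exponential backoff and jitter. It is not the fix planned for the 14.9 release, which the postmortem does not describe; `TaskTimeout`, `run_with_bounded_retries`, and all parameter values are hypothetical names and numbers chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TaskTimeout(Exception):
    """Hypothetical error: a task exceeded its allowed execution window."""


def run_with_bounded_retries(
    task: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on TaskTimeout with capped exponential backoff.

    Assumed policy (not taken from the postmortem): give up after
    `max_attempts` tries so a slow backend is not hit indefinitely.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TaskTimeout:
            if attempt == max_attempts:
                raise  # stop retrying and surface the failure
            # The delay ceiling doubles per attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries out so they do not arrive in bursts.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production the default `time.sleep` would apply.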