Arista CloudVision Incident

Platform Disruption

Major · Resolved

Arista CloudVision experienced a major incident on January 31, 2026, affecting the Core Platform and lasting 1h 26m. The incident has been resolved; the full update timeline is below.

Started: Jan 31, 2026, 12:16 AM UTC
Resolved: Jan 31, 2026, 01:42 AM UTC
Duration: 1h 26m
Detected by Pingoru: Jan 31, 2026, 12:16 AM UTC

Affected components

Core Platform

Update timeline

  1. Identified · Jan 31, 2026, 12:16 AM UTC

    We are investigating a platform degradation in the US region.

  2. Identified · Jan 31, 2026, 01:01 AM UTC

    The disruption is ongoing. It stems from issues with our underlying provider, and we are working to restore service as quickly as possible. Thank you for your patience.

  3. Identified · Jan 31, 2026, 01:04 AM UTC

    Data ingestion into the platform remains uninterrupted, but UI access and interactions are currently disrupted.

  4. Monitoring · Jan 31, 2026, 01:33 AM UTC

    A fix has been implemented and we are monitoring the results.

  5. Resolved · Jan 31, 2026, 01:42 AM UTC

    This incident has been resolved.

  6. Postmortem · Feb 13, 2026, 09:01 PM UTC

    **Incident Report: Platform Outage - January 31st, 2026**

    On January 31, 2026, CloudVision as-a-Service experienced a platform-wide incident within the cv-prod-us-central1-a region that lasted a total of 1 hour and 18 minutes. During the first hour of this window, the platform remained operational despite intermittent connectivity disruptions that impacted less than 1% of total traffic. In the final 18 minutes, traffic loss increased to approximately 2%, and users likely experienced a gradual loss of connectivity and platform access. This included disruptions to device streaming: devices may have begun coalescing data so they could resume streaming once the platform stabilized, which is the standard failure-mode behavior designed to preserve data integrity during network instability. This incident was not the result of a security breach, unauthorized intrusion, or any malicious activity.

    **Root Cause**

    The root cause of this disruption was identified as a catastrophic loss of networking within our third-party cloud infrastructure provider.

    **Our Response**

    Our team detected the outage within five minutes of the initial failure and remained in constant communication with the provider throughout triage, recovery, and remediation. During the incident, we made the strategic decision not to initiate a full-site disaster recovery: our technical assessment concluded that a full failover would have been significantly more disruptive to the user experience and would ultimately have prolonged the incident.

    We understand that any service interruption is an unacceptable outcome for our users. We are working on internal initiatives to make our architecture more resilient to these specific failure modes while partnering with our cloud provider to improve the stability of the underlying infrastructure.

    We appreciate your patience and your continued use of CloudVision as-a-Service.
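The "coalesce and resume" streaming behavior described in the postmortem is a common pattern for device telemetry agents: while the collector is unreachable, the device buffers state locally, keeping only the newest sample per telemetry path, and replays that compacted state on reconnect. The sketch below is a minimal illustration of the pattern only; the `CoalescingStreamer` class, the `send` callback, and the per-path coalescing policy are all hypothetical assumptions, not Arista's actual implementation.

```python
import time
from typing import Callable, Dict, Tuple


class CoalescingStreamer:
    """Hypothetical device-side streamer illustrating coalesce-and-resume."""

    def __init__(self, send: Callable[[str, float, float], None]) -> None:
        self._send = send                                   # transport callback: (path, value, timestamp)
        self._pending: Dict[str, Tuple[float, float]] = {}  # path -> (latest value, timestamp)
        self._connected = True

    def publish(self, path: str, value: float) -> None:
        ts = time.time()
        if self._connected:
            try:
                self._send(path, value, ts)
                return
            except ConnectionError:
                self._connected = False  # enter failure mode; start coalescing
        # Coalesce: a newer sample for the same path overwrites the older one,
        # so the buffer stays bounded and the latest state is preserved.
        self._pending[path] = (value, ts)

    def on_reconnect(self) -> None:
        # Replay the compacted state once the platform has stabilized,
        # then return to normal streaming.
        for path, (value, ts) in sorted(self._pending.items()):
            self._send(path, value, ts)
        self._pending.clear()
        self._connected = True
```

The trade-off worth noting is that coalescing sacrifices intermediate samples in exchange for bounded memory: however long the outage lasts, the most recent value per path survives and is delivered once connectivity returns, which matches the data-integrity goal the postmortem describes.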