Cognite incident

Kubernetes pods failing in AZ-EASTUS-1

Minor · Resolved
Started
Apr 25, 2026, 09:25 AM UTC
Resolved
Apr 25, 2026, 09:25 AM UTC
Duration
Detected by Pingoru
Apr 25, 2026, 09:25 AM UTC
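
The Duration field above is empty, and Started and Resolved show the same timestamp. The update timeline on this page gives a first update at Apr 24, 2026, 02:17 PM UTC and a resolution at Apr 25, 2026, 09:25 AM UTC. As a rough sketch of the arithmetic, assuming the first timeline update marks the start of the incident (the page does not confirm this), the elapsed time can be worked out as follows. The helper name `parse_utc` is illustrative only, and the `%b` month parsing assumes an English locale.

```python
from datetime import datetime, timezone

# Timestamp layout used on this page, e.g. "Apr 24, 2026, 02:17 PM UTC".
FMT = "%b %d, %Y, %I:%M %p"


def parse_utc(stamp: str) -> datetime:
    """Parse a page timestamp; the trailing ' UTC' label is dropped and UTC is attached explicitly."""
    return datetime.strptime(stamp.removesuffix(" UTC"), FMT).replace(tzinfo=timezone.utc)


# Assumption: the first timeline update is treated as the incident start.
first_update = parse_utc("Apr 24, 2026, 02:17 PM UTC")
resolved = parse_utc("Apr 25, 2026, 09:25 AM UTC")

elapsed = resolved - first_update
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours}h {minutes}m")  # 19h 8m
```

This is a lower bound on the real duration, since the pod restarts presumably began some time before the first public update.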

Affected components

Cognite Data Fusion API

Update timeline

  1. investigating Apr 24, 2026, 02:17 PM UTC

    Status: Investigating
    Cognite is investigating Kubernetes pod restarts in az-eastus-1. This may be affecting time series, search, and contextualization capabilities. Updates will be provided as more information becomes available.
    Affected components: Cognite Data Fusion API (Degraded performance)

  2. investigating Apr 24, 2026, 03:15 PM UTC

    Status: Investigating
    We are seeing widespread DNS resolution failures in the az-eastus-1 cluster, affecting multiple pods and disrupting the timeseries service.
    Affected components: Cognite Data Fusion API (Degraded performance)

  3. identified Apr 24, 2026, 03:40 PM UTC

    Status: Identified
    We have confirmed that all customers on az-eastus-1 are affected due to ongoing Azure platform issues in the East US region, now officially acknowledged by Microsoft. These issues impact provisioning, scaling, and connectivity for workloads, resulting in widespread DNS resolution failures and a large number of pods across multiple namespaces in CrashLoopBackOff.
    Customer impact includes service disruptions for timeseries and data modeling services, intermittent errors accessing InField, and degraded availability for multiple workloads and endpoints. Example error patterns include Postgres connection timeouts and DNS lookup failures.
    Affected components: Cognite Data Fusion API (Degraded performance)

  4. monitoring Apr 24, 2026, 04:20 PM UTC

    Status: Monitoring
    Cognite has observed that all services have recovered or are in a good state of recovery. Customers may still see some slowness in apps recovering from the DNS outage.
    Affected components: Cognite Data Fusion API (Degraded performance)

  5. monitoring Apr 24, 2026, 11:26 PM UTC

    Status: Monitoring
    While our previous update indicated a good state of recovery, we have started to observe service interruptions again across the environment. Customers may continue to experience slowness or intermittent failures in applications as we navigate the final stages of stabilization following the DNS outage. Our engineering team is closely monitoring these new developments and working to ensure a consistent recovery. We will provide further updates as the situation stabilizes.
    Affected components: Cognite Data Fusion API (Degraded performance)

  6. monitoring Apr 25, 2026, 12:43 AM UTC

    Status: Monitoring
    CDF has recovered again after a recurrence of the DNS issue. Microsoft has confirmed that the rollback of the suspected root cause is now complete, and we will continue monitoring for a few more hours to verify that CDF has fully recovered.
    Affected components: Cognite Data Fusion API (Degraded performance)

  7. resolved Apr 25, 2026, 09:25 AM UTC

    Status: Resolved
    Microsoft has confirmed the underlying issue has been mitigated and CDF has been running without issues for several hours.
    Affected components: Cognite Data Fusion API (Operational)
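
The timeline above describes intermittent errors, DNS lookup failures, and connection timeouts that came and went over several hours. A common client-side way to ride out failures of this kind is to retry with exponential backoff and jitter. The sketch below is a generic illustration, not guidance from Cognite or Microsoft: `call_with_backoff`, its parameters, and the choice of which exceptions count as transient are all assumptions made for the example.

```python
import random
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption for this sketch: DNS lookup failures, timeouts, and dropped
# connections are treated as transient; anything else is raised immediately.
TRANSIENT = (socket.gaierror, TimeoutError, ConnectionError)


def call_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TRANSIENT:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error to the caller
            # Cap the exponential delay, then pick a random wait below the cap
            # so many clients recovering at once do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

A caller would wrap whatever request is failing, for example `call_with_backoff(lambda: socket.getaddrinfo(host, 443))` with a `host` of their own. The jitter matters during an outage like this one, where many clients start retrying at the same moment once DNS begins to recover.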
