Pandium incident

Degraded Performance

Major, Resolved

Pandium experienced a major incident on July 14, 2024 affecting API and Runs, lasting 14h 36m. The incident has been resolved; the full update timeline is below.

Started: Jul 14, 2024, 07:06 AM UTC
Resolved: Jul 14, 2024, 09:42 PM UTC
Duration: 14h 36m
Detected by Pingoru: Jul 14, 2024, 07:06 AM UTC

Affected components

API, Runs

Update timeline

  1. investigating Jul 14, 2024, 07:06 AM UTC

    The underlying control plane is under high load. Investigating. Runs and the API are running slowly.

  2. identified Jul 14, 2024, 03:53 PM UTC

    The issue is with the control plane of the underlying hosting provider. We are working with them on mitigation.

  3. identified Jul 14, 2024, 04:35 PM UTC

    The underlying issue has been identified. The etcd cluster of the control plane is having a fit. Working on mitigation.

  4. monitoring Jul 14, 2024, 05:37 PM UTC

    A fix has been implemented. We are now monitoring recovery.

  5. resolved Jul 14, 2024, 09:42 PM UTC

    Scheduled runs have caught up, and we have not experienced any more issues with the control plane.