Pandium incident

Runs degraded

Major · Resolved

Pandium experienced a major incident on September 21, 2024 affecting Admin Dashboard and Runs, lasting 12h 32m. The incident has been resolved; the full update timeline is below.

Started: Sep 21, 2024, 01:02 PM UTC
Resolved: Sep 22, 2024, 01:35 AM UTC
Duration: 12h 32m
Detected by Pingoru: Sep 21, 2024, 01:02 PM UTC

Affected components

Admin Dashboard, Runs

Update timeline

  1. investigating Sep 21, 2024, 01:02 PM UTC

    Some runs are firing slowly due to an intermittent issue with token management. We are investigating.

  2. investigating Sep 21, 2024, 01:02 PM UTC

    We are continuing to investigate this issue.

  3. investigating Sep 21, 2024, 01:35 PM UTC

    We have identified the issue and are investigating a fix.

  4. identified Sep 21, 2024, 02:05 PM UTC

    The issue has been identified and a fix has been released.

  5. monitoring Sep 21, 2024, 02:09 PM UTC

    The released fix was effective and we are monitoring platform recovery.

  6. monitoring Sep 21, 2024, 02:37 PM UTC

    We are continuing to monitor for any further issues.

  7. investigating Sep 21, 2024, 03:45 PM UTC

    After recovery, we are experiencing a different issue with our underlying platform. We are investigating.

  8. monitoring Sep 21, 2024, 05:06 PM UTC

    A fix has been implemented and we are monitoring recovery.

  9. identified Sep 21, 2024, 06:36 PM UTC

    We have identified an issue with our underlying cloud hosting provider and are working with them to implement a fix. They have escalated the issue, and we will provide an update as soon as possible.

  10. identified Sep 21, 2024, 08:03 PM UTC

    We are continuing to work on a fix for this issue.

  11. identified Sep 21, 2024, 11:35 PM UTC

    The Pandium Integration Hub is fully operational; however, jobs are still running with delays. We will provide an update as soon as possible.

  12. monitoring Sep 22, 2024, 01:09 AM UTC

    A fix has been implemented and runs are recovering. We are closely monitoring and will update when fully resolved.

  13. resolved Sep 22, 2024, 01:35 AM UTC

    The underlying service issue causing instability has been resolved and runs are firing. We are continuing to monitor for the next few hours and will open a new incident if necessary.