Porter incident

Offline: Porter Dashboard

Major Resolved

Porter experienced a major incident on October 2, 2024, affecting Porter UI, Porter API, Porter Infrastructure Manager, and Porter DNS, lasting 4m. The incident has been resolved; the full update timeline is below.

Started
Oct 02, 2024, 12:02 PM UTC
Resolved
Oct 02, 2024, 12:07 PM UTC
Duration
4m
Detected by Pingoru
Oct 02, 2024, 12:02 PM UTC

Affected components

Porter UI, Porter API, Porter Infrastructure Manager, Porter DNS

Update timeline

  1. investigating Oct 02, 2024, 12:02 PM UTC

    The Porter dashboard and API are currently offline - we're investigating the issue and rolling out a fix.

  2. investigating Oct 02, 2024, 12:03 PM UTC

    A fix is being deployed and the dashboard + API will be back up in a few minutes.

  3. monitoring Oct 02, 2024, 12:04 PM UTC

    The fix has been deployed. The dashboard and API are back up, and we're monitoring all systems.

  4. resolved Oct 02, 2024, 12:07 PM UTC

    This incident has now been fully resolved. A postmortem will follow shortly.

  5. postmortem Oct 02, 2024, 12:08 PM UTC

    This post-mortem covers an outage on the Porter platform on the 2nd of October 2024, which rendered the platform inaccessible for users. The incident did not affect user workloads; it only affected our users' ability to access the Porter dashboard and run builds.

    ### Root Cause

    In line with our SOC 2 compliance policy, we planned to roll out some new HTTP security headers for our platform to strengthen our general security posture. At approximately 11:59am UTC, a few changes were pushed out to the dashboard, adding support for content security policy, permissions policy and access control headers. Once the change was pushed out, a degradation in site accessibility was immediately noted, and an incident was declared at 12:02pm UTC.

    ### Mitigation

    While it was clear that the new headers were breaking user access, it wasn't immediately clear which header was the cause. We attempted to tweak our content security policy and access control headers, and realised that we needed more time to understand which combination of header values worked best for the platform. Since user access was affected, we elected to simply revert all the changes. This revert was issued at 12:03pm UTC.

    ### Monitoring

    Once the revert was issued, we saw the platform come back online and were able to access the dashboard by 12:04pm UTC. We decided to wait a few minutes to ensure the platform was fully back up before declaring the incident resolved.
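
For readers unfamiliar with the header families named under Root Cause, the sketch below shows roughly what such response headers can look like. It is an illustration only, not Porter's configuration: the function name `security_headers`, the origin `https://dashboard.example.com`, and every policy value are hypothetical. The header names themselves are standard. The `report_only` flag switches to the standard `Content-Security-Policy-Report-Only` header, which is one common way to observe a policy's effect before enforcing it.

```python
def security_headers(allowed_origin: str, report_only: bool = True) -> dict[str, str]:
    """Build an illustrative set of security response headers.

    All policy values below are hypothetical placeholders.
    """
    # Hypothetical policy: only allow resources from the page's own origin.
    csp_value = "default-src 'self'; script-src 'self'; connect-src 'self'"

    # The Report-Only variant makes browsers report violations
    # (for example in the developer console) without blocking anything.
    csp_name = (
        "Content-Security-Policy-Report-Only"
        if report_only
        else "Content-Security-Policy"
    )

    return {
        csp_name: csp_value,
        # Hypothetical policy: disable a few browser features outright.
        "Permissions-Policy": "camera=(), microphone=(), geolocation=()",
        # Access control (CORS): which origin may read responses.
        "Access-Control-Allow-Origin": allowed_origin,
    }


if __name__ == "__main__":
    for name, value in security_headers("https://dashboard.example.com").items():
        print(f"{name}: {value}")
```

Because each header is a separate entry, a setup along these lines can also be enabled one header at a time, which makes it easier to tell which header is responsible if access breaks.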