Cloud.gov incident

Increased 504 responses to customer applications

Notice Resolved

Cloud.gov experienced a notice-level incident on October 2, 2023, affecting Applications and API and lasting 2d 17h. The incident has been resolved; the full update timeline is below.

Started
Oct 02, 2023, 07:22 PM UTC
Resolved
Oct 05, 2023, 12:48 PM UTC
Duration
2d 17h
Detected by Pingoru
Oct 02, 2023, 07:22 PM UTC

Affected components

Applications, API

Update timeline

  1. identified Oct 02, 2023, 07:22 PM UTC

    Twice today, the cloud.gov platform experienced high volumes of internal traffic, which caused some customers to receive HTTP 504 responses while accessing their applications. These were brief periods, around 4 minutes each, and the platform recovered automatically. At no time did customers' applications stop or go down on the platform. The cloud.gov support team has identified the issue, is monitoring for further events, and is working on deploying a solution to production to mitigate them. At this time the platform is fully available.

  2. identified Oct 03, 2023, 11:49 AM UTC

    7:45 EDT update - the cloud.gov support team is aware of another traffic spike yesterday evening and is working on a solution to the issue. That solution is currently being deployed and tested in lower environments, and the team expects to deploy the fix to production later today.

  3. monitoring Oct 03, 2023, 04:33 PM UTC

    The cloud.gov support team has deployed a fix to production and will be monitoring the system for the rest of the day.

  4. monitoring Oct 03, 2023, 08:17 PM UTC

    16:15 EDT update - the cloud.gov team is still seeing some spikes in traffic after the production roll-out. The team is working on an additional fix that will be deployed once it passes testing in lower environments. We will update this incident once this additional fix is deployed to production.

  5. monitoring Oct 04, 2023, 11:59 AM UTC

    8 AM EDT update - the cloud.gov support team has deployed the additional fix to production and is now monitoring the system.

  6. resolved Oct 05, 2023, 12:48 PM UTC

    Since the second production fix was deployed yesterday, the platform has been stable and working as expected. We are closing this incident.