StatusPage incident

Intermittent errors while accessing public Statuspages

Major · Resolved

StatusPage experienced a major incident on October 28, 2023, affecting HTTP Pages, HTTPS Pages, and Public API, lasting 24 minutes. The incident has been resolved; the full update timeline is below.

Started: Oct 28, 2023, 07:36 AM UTC
Resolved: Oct 28, 2023, 08:01 AM UTC
Duration: 24m
Detected by Pingoru: Oct 28, 2023, 07:36 AM UTC

Affected components

HTTP Pages, HTTPS Pages, Public API

Update timeline

  1. investigating Oct 28, 2023, 07:36 AM UTC

    We are currently seeing intermittent errors in viewing public Statuspages. We are investigating this problem and will provide updates shortly.

  2. monitoring Oct 28, 2023, 07:55 AM UTC

    Update: We have fixed the issue and are actively monitoring.

  3. resolved Oct 28, 2023, 08:01 AM UTC

    The issue is now resolved and everything is back to a normal working state.

  4. postmortem Nov 06, 2023, 09:25 AM UTC

    ### **SUMMARY**

    From 06:00 UTC to 07:45 UTC on October 28, 2023, Atlassian customers using Statuspage had intermittent issues with all Statuspage functionality. The event occurred due to a database performance issue during a [scheduled database maintenance](https://metastatuspage.com/incidents/s21b66328h9j). This impacted customers in all regions. The incident was detected within one minute by monitoring the upgrade process and mitigated by rolling back to a known good snapshot which put Statuspage systems into a known good state. The total time to resolution was about one hour and 45 minutes.

    ### **IMPACT**

    The overall impact was between 06:00 UTC and 07:45 UTC October 28, 2023. This incident affected Statuspage customers from all regions and caused intermittent backend errors on all Statuspage activity including viewing pages, adding subscribers, and creating/updating events. We performed a rollback operation during recovery to return to a known good state.

    ### **ROOT CAUSE**

    The issue was caused by database performance issues after a routine database maintenance and upgrade. As a result, our backends returned intermittent errors to several user requests.

    ### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

    We take the utmost care to provide a highly reliable service. We will pursue several preventive measures to ensure that this situation does not occur in the future, including:

    * Fixing the cause of the performance issues before future upgrades; and
    * Improving our testing process for database upgrades to catch potential performance issues.

    We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

    Thanks,
    Atlassian Customer Support