Chili Piper incident

Timeout and Loading Issue

Major, Resolved

Chili Piper experienced a major incident on October 19, 2023, affecting Dashboard, Instant Booker, Booking Links, and the Instant Booker Chrome Extension, lasting 1h 31m. The incident has been resolved; the full update timeline is below.

Started
Oct 19, 2023, 07:29 PM UTC
Resolved
Oct 19, 2023, 09:01 PM UTC
Duration
1h 31m
Detected by Pingoru
Oct 19, 2023, 07:29 PM UTC

Affected components

Dashboard, Instant Booker, Booking Links, Instant Booker Chrome Extension

Update timeline

  1. investigating Oct 19, 2023, 07:29 PM UTC

    We are currently investigating an issue impacting our database, which is preventing Chili Piper assets from loading. Instant Booker and Chili Piper's dashboard are both impacted. Some booking links appear to load, but performance is extremely degraded and pages may load only after a long delay. Our Site Reliability infrastructure team is actively investigating this with priority, and we will provide an update once this issue is resolved.

  2. identified Oct 19, 2023, 07:59 PM UTC

    We have begun to restore services and have reduced the load on the impacted databases. Database instances are being restarted to help mitigate the issue. Load times are improving; however, users may still experience long delays or see an "Error: Server Error" 502 message, with degraded performance across all Chili Piper pages, links, Instant Booker, and Concierge. We expect more services to gradually become healthy and loading speed to continue to improve. Because load times are still elevated, we are still considering this a partial outage.

  3. monitoring Oct 19, 2023, 08:14 PM UTC

    Services have been restored, and several sub-services were restarted in order to further support the recovery of all instances. We are seeing load times gradually return to normal with only a minor performance impact reported. We will continue to monitor the recovery closely to ensure our services remain stable.

  4. resolved Oct 19, 2023, 09:01 PM UTC

    We are marking this issue as resolved since incoming data shows services have recovered, error rates have dropped, and loading times have stabilized. Our team will continue to monitor services closely to ensure they remain stable moving forward, and actions have already been taken to ensure this stability is maintained.