Happeo incident

We are experiencing an incident

Major, Resolved

Happeo experienced a major incident on April 24, 2020 affecting Web Application and Channels, lasting 40m. The incident has been resolved; the full update timeline is below.

Started: Apr 24, 2020, 01:19 PM UTC
Resolved: Apr 24, 2020, 01:59 PM UTC
Duration: 40m
Detected by Pingoru: Apr 24, 2020, 01:19 PM UTC

Affected components

Web Application, Channels

Update timeline

  1. identified Apr 24, 2020, 01:19 PM UTC

    We have identified where the issue is and are fixing the root cause.

  2. monitoring Apr 24, 2020, 01:27 PM UTC

    We've fixed the issue and are monitoring to see that everything is running smoothly.

  3. resolved Apr 24, 2020, 01:59 PM UTC

    This incident has been resolved. The incident caused Channels not to respond for 12 minutes, which caused Channels and some related parts not to work. Other parts of Happeo were working as expected. We apologise for this incident.

  4. postmortem Apr 24, 2020, 07:38 PM UTC

    **Start time:** 2020-04-24 16:13:33.152 EEST

    **Got notified:** 2020-04-24 16:14 EEST

    **Resolution time:** 2020-04-24 16:26:56.314 EEST

    **Problem:** The Channel service, which serves channel-related information, was unable to respond to user requests. This made some parts of Happeo unavailable. The platform as a whole remained accessible, but some related functions, such as channel widgets in Pages, were unable to populate.

    **Affected:** Users trying to access Channels.

    **Root cause:** Our channel service lost its database connection, which required reloading certain resources. This required multiple reboot cycles, which caused the outage and extended it to 13 minutes, 23 seconds, and 162 milliseconds. We have already mitigated the issue to prevent this from happening in the future.
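
The outage length quoted in the postmortem can be checked from the start and resolution timestamps. The following is a minimal sketch, assuming EEST is a fixed UTC+3 offset; the helper name `parse_eest` and the format string are illustrative choices, not part of any Happeo tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EEST (Eastern European Summer Time) is modeled as a fixed UTC+3 offset.
EEST = timezone(timedelta(hours=3), "EEST")
FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_eest(stamp: str) -> datetime:
    """Parse a postmortem timestamp (hypothetical helper) as an EEST-aware datetime."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=EEST)


start = parse_eest("2020-04-24 16:13:33.152")
resolved = parse_eest("2020-04-24 16:26:56.314")
outage = resolved - start

# Split the timedelta into whole minutes, whole seconds, and milliseconds.
minutes, remainder = divmod(outage, timedelta(minutes=1))
seconds, remainder = divmod(remainder, timedelta(seconds=1))
milliseconds = remainder // timedelta(milliseconds=1)

print(f"Outage: {minutes} min {seconds} s {milliseconds} ms")
print("Start in UTC:", start.astimezone(timezone.utc).isoformat())
```

Converting to UTC also makes it easier to line the postmortem timestamps up against the UTC times in the update timeline.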