Digital Pigeon incident

Issue preventing access to application

Critical Resolved

Digital Pigeon experienced a critical incident on November 1, 2021 affecting Application and API Servers, lasting 2h 53m. The incident has been resolved; the full update timeline is below.

Started
Nov 01, 2021, 05:54 AM UTC
Resolved
Nov 01, 2021, 08:48 AM UTC
Duration
2h 53m
Detected by Pingoru
Nov 01, 2021, 05:54 AM UTC

Affected components

Application and API Servers

Update timeline

  1. investigating Nov 01, 2021, 05:54 AM UTC

    We are currently investigating this issue.

  2. monitoring Nov 01, 2021, 07:44 AM UTC

    We have identified and rectified a server issue that caused slow response times and, in some cases, loss of access to the application. Access has been restored, and we are closely monitoring the servers.

  3. resolved Nov 01, 2021, 08:48 AM UTC

    The incident is resolved. We will provide a triage report soon.

  4. postmortem Nov 02, 2021, 10:25 PM UTC

    An incident at AWS resulted in the majority of the application servers being taken out of action. The remaining application servers took up the load and were able to respond to most requests, with elevated response times, but a configuration issue meant it took 45 minutes before the replacement application servers were available. We will be deploying infrastructure fixes this weekend to prevent this scenario from recurring.