AskNicely experienced a major incident on October 5, 2018 affecting AskNicely Application, lasting 7h 48m. The incident has been resolved; the full update timeline is below.
Affected components
AskNicely Application
Update timeline
- investigating Oct 05, 2018, 03:33 PM UTC
We are currently investigating a 502 error.
- monitoring Oct 05, 2018, 04:13 PM UTC
We've rolled out changes to try to resolve the issues accessing AskNicely, and are monitoring the current status.
- identified Oct 05, 2018, 07:19 PM UTC
We have seen performance issues that are causing some 502 and 504 errors. We are working hard to determine where these are occurring and will update this as we continue to investigate the root cause. All alerting systems are operating as expected, and we are now going through our platform monitoring tools.
- identified Oct 05, 2018, 07:34 PM UTC
We have identified the source of the problem that has been causing exceptionally high load.
- monitoring Oct 05, 2018, 07:36 PM UTC
We are now monitoring the situation, and all our monitoring tools report that the system is operating within expected parameters.
- monitoring Oct 05, 2018, 11:19 PM UTC
We are continuing to monitor for any further issues.
- resolved Oct 05, 2018, 11:22 PM UTC
We have now resolved this incident and identified the cause. The engineering team are now doing a postmortem of the event to prevent this happening in the future.
- postmortem Oct 06, 2018, 07:54 AM UTC
We’ve identified endpoints that were not properly rate limited and that, when receiving a high volume of traffic, were causing infrastructure issues. We’re working on rolling out better rate-limiting coverage to prevent further outages.
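For readers unfamiliar with per-endpoint rate limiting, the following is a minimal, hypothetical sketch of one common technique, a token bucket kept per client. It is not AskNicely's implementation: the class names `TokenBucket` and `RateLimiter`, the per-client key, and the capacity and refill values are illustrative assumptions only.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: allows bursts of up to `capacity` requests,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so an idle client can burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the caller should reject the request


class RateLimiter:
    """Keeps one bucket per client key (assumed here to be an API key or IP)."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets = {}

    def allow(self, key: str) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, self.clock)
            self.buckets[key] = bucket
        return bucket.allow()
```

A request handler would call `allow(key)` before doing any expensive work and respond with HTTP 429 (Too Many Requests) when it returns `False`, so a traffic spike on one endpoint is shed early instead of exhausting shared infrastructure. In a multi-server deployment the bucket state would typically live in a shared store rather than in process memory.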