InEvent experienced a major incident on August 26, 2020, affecting the API and lasting 6h 27m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Aug 26, 2020, 12:40 PM UTC
We are currently investigating this issue.
- investigating Aug 26, 2020, 12:43 PM UTC
We are continuing to investigate this issue.
- monitoring Aug 26, 2020, 02:59 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Aug 26, 2020, 06:33 PM UTC
This incident has been resolved.
- postmortem Aug 26, 2020, 07:15 PM UTC
Our servers use an asynchronous, event-driven approach, rather than threads, to handle requests. With this modular event-driven architecture, we can use an overflow bucket that forwards requests as soon as they arrive. Once these requests arrive, we perform some safety checks and then redirect them to the appropriate logical application. This can be done in multiple ways. What we realized during today's operations was that the servers were not able to pipe this many requests to the application layer quickly enough. The socket could not send the requests to the database, process the output, and return a response to the user fast enough. To fix this issue, we are improving our load balancer, changing our socket configuration, and changing the database processing mechanism so that queries can be processed without additional allocation overhead.
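To illustrate the general pattern described above, here is a minimal sketch in Python using asyncio. It is not InEvent's actual implementation. The handler names, the `is_safe` check, the `serve` function, and the `workers` and `max_pending` parameters are all assumptions made for this example. It shows requests entering a bounded queue, passing a safety check, and being dispatched to a logical application without one thread per request.

```python
import asyncio

# Hypothetical application handlers; names are illustrative only.
async def handle_events(payload: dict) -> dict:
    await asyncio.sleep(0)  # stands in for a database round trip
    return {"app": "events", "echo": payload}

async def handle_people(payload: dict) -> dict:
    await asyncio.sleep(0)  # stands in for a database round trip
    return {"app": "people", "echo": payload}

ROUTES = {"events": handle_events, "people": handle_people}

def is_safe(request: dict) -> bool:
    """Assumed safety check: known application and a dict payload."""
    return request.get("app") in ROUTES and isinstance(request.get("payload"), dict)

async def worker(queue: asyncio.Queue, results: list) -> None:
    """Take requests off the queue, validate them, and dispatch them."""
    while True:
        request = await queue.get()
        try:
            if is_safe(request):
                handler = ROUTES[request["app"]]
                results.append(await handler(request["payload"]))
            else:
                results.append({"error": "rejected"})
        finally:
            queue.task_done()

async def serve(requests: list, workers: int = 2, max_pending: int = 4) -> list:
    """Feed requests through a bounded queue drained by a few workers."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
    results: list = []
    tasks = [asyncio.create_task(worker(queue, results)) for _ in range(workers)]
    for request in requests:
        # put() waits while the queue is full, so producers slow down
        # instead of piling up unbounded work.
        await queue.put(request)
    await queue.join()  # wait until every queued request is processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results

if __name__ == "__main__":
    sample = [
        {"app": "events", "payload": {"id": 1}},
        {"app": "unknown", "payload": {}},
        {"app": "people", "payload": {"id": 2}},
    ]
    print(asyncio.run(serve(sample)))
```

In this sketch, the bounded queue is what keeps a burst of requests from overwhelming the application layer: when workers cannot keep up, new requests wait at `queue.put` rather than accumulating without limit. The order of results can vary with scheduling.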