Wisembly experienced a notice incident on January 21, 2016, lasting 13d 21h. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Jan 21, 2016, 01:47 PM UTC
We experienced a service disruption from 2:25pm to 2:37pm (12 minutes). We suspect that an internal logging system overloaded the network stack, making our application unreachable during this time. We will post more information here as we investigate further. Sorry for the inconvenience.
- identified Jan 22, 2016, 02:57 PM UTC
Our log management system, rsyslog, appears to have filled its entire buffer while struggling to send logs to our various log visualisation tools. This froze the network stack of our front-end servers, making our service unreachable until the buffer pool emptied. We are now closely monitoring rsyslog and are considering upgrading it to a newer, more robust version or switching to another way of collecting and sending our logs that does not block.
- monitoring Feb 03, 2016, 10:36 AM UTC
We have deployed a new configuration for our logging system and will check whether it improves performance and limits the message queue size. If so, we will roll out this new configuration to our production environments in a few days.
- resolved Feb 04, 2016, 11:16 AM UTC
This incident has been resolved.
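The fix described in the updates above, a bounded message queue and sending logs without blocking the application, can be illustrated with a minimal Python sketch. This is not Wisembly's actual rsyslog configuration: `NonBlockingLogShipper`, `send`, and `max_queue_size` are hypothetical names, and dropping new lines when the queue is full is an assumed policy chosen for the illustration.

```python
import queue
import threading


class NonBlockingLogShipper:
    """Hypothetical sketch: buffer log lines in a bounded queue and drop new
    lines when the queue is full, so the caller never blocks on a slow or
    unreachable log destination."""

    def __init__(self, send, max_queue_size=1000):
        # `send` is a hypothetical callable that forwards one line to a
        # remote log tool; it may be slow or raise on failure.
        self._send = send
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._lock = threading.Lock()
        self.dropped = 0  # lines discarded because the queue was full
        self.failed = 0   # lines the destination rejected
        self._worker = threading.Thread(target=self._drain, daemon=True)

    def start(self):
        self._worker.start()

    def log(self, line):
        """Enqueue a line without waiting; return False if it was dropped."""
        try:
            self._queue.put_nowait(line)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def flush(self):
        """Wait until every queued line has been handed to `send`."""
        self._queue.join()

    def _drain(self):
        # Background worker: the only place that talks to the destination.
        while True:
            line = self._queue.get()
            try:
                self._send(line)
            except Exception:
                with self._lock:
                    self.failed += 1
            finally:
                self._queue.task_done()


if __name__ == "__main__":
    # Worker not started: simulates a destination that never drains.
    stalled = NonBlockingLogShipper(send=print, max_queue_size=3)
    results = [stalled.log(f"line {i}") for i in range(5)]
    print(results, stalled.dropped)  # [True, True, True, False, False] 2
```

Dropping lines trades log completeness for availability: when the destination stalls, the application keeps running and only the overflow is lost. A real deployment might instead spill the overflow to disk.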