Easee experienced a major incident on July 17, 2024 affecting AMQP, lasting 4h 15m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Jul 17, 2024, 07:38 AM UTC
The issues from the previous incident appear to be persisting. This is reducing the flow of messages through the network, which is impacting the performance of our operators' applications and services. We are working to address this and will likely restart the service. We apologize for any inconvenience this may cause and appreciate your patience. Thank you for your understanding.
- identified Jul 17, 2024, 08:22 AM UTC
Two issues were identified: 1. Our message broker is not releasing allocated memory in a predictable way. We will allocate more resources for now while we work out an overall strategy/policy to mitigate this. 2. We have a large number of internal connections pushing messages to our message broker. Reducing these connections could provide memory relief on the AMQP cluster, but could also increase latency. We will monitor the effect of any tuning we perform here. We are adjusting the service limits and increasing the compute resources. While we apply the mitigations, the performance of the service will be impacted, but we expect full recovery within the hour. Apologies again for any inconvenience.
- monitoring Jul 17, 2024, 08:44 AM UTC
Service restart completed and messages are flowing through the network.
- resolved Jul 17, 2024, 11:54 AM UTC
All servicing upgrades are complete. We are now on the latest versions of our message broker, and all metrics have been normal for the last 1.5 hours.