iAdvize incident
P1 - Conversation notifications might not be visible on websites
iAdvize experienced a major incident on September 2, 2024 affecting Chat and Call and 1 more component, lasting 1h 55m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Sep 02, 2024, 10:06 AM UTC
Please know that we are facing an issue: notifications might not appear on your website. We are working on it.
- investigating Sep 02, 2024, 10:08 AM UTC
We are continuing to investigate this issue.
- identified Sep 02, 2024, 10:26 AM UTC
We are still on it. We are performing actions to resolve the issue.
- monitoring Sep 02, 2024, 11:07 AM UTC
Please know that a fix is live. You should be able to see notifications on your website again. We are monitoring this.
- resolved Sep 02, 2024, 12:02 PM UTC
This incident has been resolved.
- postmortem Sep 06, 2024, 12:31 PM UTC
**Incident:**

On September 2nd, between 11:01 CEST and 13:04 CEST, we experienced an incident impacting the iAdvize engagement service (which handles targeting). During this timeframe, the display of notifications on our customers' websites and in mobile applications fluctuated between working intermittently and not appearing at all. As a result, starting a conversation from the Chat, Call, Video, and mobile application channels was degraded (86 min) or completely unavailable (37 min). Social channels were not impacted.

The engagement service became unavailable because:

* After a restart, our mirroring service moved to the same server instance as our engagement service
* Due to an unexpected resource usage spike on the mirroring service, the engagement service was left with insufficient resources to scale and run properly

**Resolution**

To solve this issue, we manually isolated our mirroring service on a different server instance, ensuring the engagement service had enough resources to run properly again.

**Actions for the future**

* (Done) Isolate our mirroring service away from other critical services
* (Done) Analyze the causes of the resource increase on our mirroring service, and implement optimizations to reduce its resource usage
* (Done) Improve alerting in case of network resource issues on server instances