Kustomer incident
[NATIVE WHATSAPP] WhatsApp messages delayed in sending [PROD1]
Kustomer experienced a minor incident on June 6, 2025 affecting Channel - WhatsApp, lasting 3h 20m. The incident has been resolved; the full update timeline is below.
Affected components
- Channel - WhatsApp
Update timeline
- investigating Jun 06, 2025, 05:41 PM UTC
Kustomer is aware of an event affecting Native WhatsApp that may cause delays in sending outbound messages. Our team is currently working to identify the cause of this issue in an effort to implement a resolution. Please expect additional updates within the next 30 minutes, and reach out to Kustomer Support at [email protected] for any further questions or updates.
- investigating Jun 06, 2025, 05:42 PM UTC
Kustomer is aware of an event affecting Native WhatsApp that may cause delays in sending outbound messages. Our team is currently working to identify the cause of this issue in an effort to implement a resolution. Please expect additional updates within the next 30 minutes, and reach out to Kustomer Support at [email protected] for any further questions or updates.
- investigating Jun 06, 2025, 06:05 PM UTC
Kustomer is aware of an event affecting Native WhatsApp that may cause delays in sending outbound messages. Our team is continuing to identify the cause of this issue in an effort to implement a resolution. Please expect additional updates within the next 30 minutes, and reach out to Kustomer Support at [email protected] for any further questions or updates.
- investigating Jun 06, 2025, 06:34 PM UTC
Kustomer is aware of an event affecting Native WhatsApp that may cause delays in sending outbound messages. Our team is still working to identify the cause of this issue in an effort to implement a resolution. Please expect additional updates within the next 30 minutes, and reach out to Kustomer Support at [email protected] for any further questions or updates.
- identified Jun 06, 2025, 07:04 PM UTC
Kustomer has identified an event affecting WhatsApp (PROD1) that may cause delays in sending outbound messages. Our team is currently working to implement a resolution. Please expect further updates within the next 30 minutes, and reach out to Kustomer Support at [email protected] if you have additional questions or concerns.
- identified Jun 06, 2025, 07:32 PM UTC
Kustomer has identified an event affecting WhatsApp (PROD1) that may cause delays in sending outbound messages. Our team is continuing to work on implementing a resolution. Please expect further updates within the next 30 minutes, and reach out to Kustomer Support at [email protected] if you have additional questions or concerns.
- identified Jun 06, 2025, 08:01 PM UTC
Kustomer has identified an event affecting WhatsApp (PROD1) that may cause delays in sending outbound messages. Our team continues to work on implementing a resolution. Please expect further updates within the next 30 minutes, and reach out to Kustomer Support at [email protected] if you have additional questions or concerns.
- monitoring Jun 06, 2025, 08:10 PM UTC
Kustomer has implemented an update to address an event affecting Native WhatsApp that may have delayed the delivery of outbound messages. Our team is currently monitoring this update to ensure the issue is fully resolved. Please expect further updates within the next 30 minutes, and reach out to Kustomer Support at [email protected] if you have additional questions or concerns.
- resolved Jun 06, 2025, 09:01 PM UTC
Kustomer has resolved an event affecting the Native WhatsApp channel in PROD1 that caused delays in sending outbound messages. After careful monitoring, our team has determined that all affected areas are now fully restored. Please reach out to Kustomer Support via chat or email if you have additional questions or concerns.
- postmortem Jun 13, 2025, 07:37 PM UTC
## **Summary**

On Friday, June 6th, WhatsApp drafts experienced a significant delay in sending. As a result, some messages were not delivered before their attached media items expired.

## **Root Cause**

Our WhatsApp service was overloaded by a significant surge in WhatsApp messages. The service was scaling, but it could not keep pace with the sudden demand, resulting in elevated latency and timeouts. The timeouts triggered retries within our service, which intensified the message load and occasionally generated duplicate messages: in some cases a request timed out even though the draft had been created successfully, so the same message was retried and sent again. Consequently, both the Drafts service and the WhatsApp service on prod1 experienced considerable spikes in memory and CPU usage.

In addition, WhatsApp returned errors for media items in some messages. Because of the increased latency, the media items in those messages had expired before the messages could be sent. These errors caused additional retries and further exacerbated the issue.

## **Timeline**

**Jun 6, 2025**

* **1:33 PM EDT** Incident created.
* **1:41 PM EDT** Began investigating recent releases in WhatsApp and other related services.
* **1:53 PM EDT** Discovered spikes in the WhatsApp service that were not related to a code change.
* **4:03 PM EDT** Deployed scaling changes to the WhatsApp service; the spikes settled down.
* **4:07 PM EDT** Created a change to reduce the WhatsApp rate limit in the Drafts service.
* **7:59 PM EDT** Deployed the rate limit change; traffic returned to healthy levels.

## **Lessons/Improvements**

* **Duplicate Drafts Investigation** - Understand why duplicate WhatsApp drafts were created during the incident.
  * Status: Done
* **Scaling Enhancements** - Increased scaling for the WhatsApp service to better handle message bursts.
  * Status: Done
* **Adjusted Rate Limit** - Decreased the WhatsApp rate limit in the Drafts service from 400/minute to 300/minute.
  * Status: Done
* **Media Expiration** - Investigate the expiration window for media items and determine whether it can be extended.
  * Status: In Progress
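The duplicate-draft mechanism and the rate-limit mitigation described in this postmortem can be illustrated with a short sketch. This is not Kustomer's implementation: `FixedWindowRateLimiter`, `DraftStore`, `create_with_retry`, and the idempotency-key scheme are all hypothetical, and a simple fixed-window limiter stands in for whatever limiter the Drafts service actually uses. Only the 300/minute figure comes from the postmortem. The sketch shows how a create that succeeds but "times out" from the caller's point of view produces a duplicate when each retry is treated as a new request, and how reusing one key across retries avoids it.

```python
import time
import uuid
from typing import Callable, Dict, Optional


class FixedWindowRateLimiter:
    """Hypothetical helper: allows at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def try_acquire(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over the limit: the caller should queue or back off


class DraftStore:
    """Hypothetical draft store keyed by an idempotency key."""

    def __init__(self) -> None:
        self.drafts: Dict[str, str] = {}

    def create(self, key: str, body: str) -> str:
        # Creating with an existing key is a no-op, so a retried request
        # with the same key cannot produce a second draft.
        self.drafts.setdefault(key, body)
        return key


def create_with_retry(store: DraftStore, body: str, key: Optional[str] = None,
                      simulated_timeouts: int = 1, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        # Without a caller-supplied key, every attempt gets a fresh one,
        # which reproduces the duplicate-draft behavior.
        attempt_key = key if key is not None else str(uuid.uuid4())
        draft_id = store.create(attempt_key, body)  # the write succeeds...
        if attempt < simulated_timeouts:
            continue  # ...but the caller sees a timeout and retries
        return draft_id
    raise TimeoutError("all attempts timed out")


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=300)  # 300/minute, as in the postmortem
    if limiter.try_acquire():
        naive = DraftStore()
        create_with_retry(naive, "Hello")
        print(len(naive.drafts))  # 2: the timed-out attempt left a duplicate

        safe = DraftStore()
        create_with_retry(safe, "Hello", key=str(uuid.uuid4()))
        print(len(safe.drafts))  # 1: the retry reused the existing draft
```

A fixed window is the simplest limiter to reason about; a token bucket would smooth out the bursts that a fixed window still allows at window boundaries. Lowering the limit reduces load on downstream services, but only idempotent retries address the duplicates themselves.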