Iterable incident

API failures across multiple clusters and degraded webApp performance

Iterable experienced a notice-level incident on July 19, 2024 affecting Email Sends and other components, lasting 3h 33m. The incident has been resolved; the full update timeline is below.

Started: Jul 19, 2024, 03:37 PM UTC
Resolved: Jul 19, 2024, 07:10 PM UTC
Duration: 3h 33m
Detected by Pingoru: Jul 19, 2024, 03:37 PM UTC

Affected components

Email Sends

Update timeline

investigating Jul 19, 2024, 03:37 PM UTC

Beginning around 6:20 AM PT, we were alerted to a spike in API errors across multiple endpoints affecting a number of specific customer clusters. Customers on all clusters numbered 100 and above may be experiencing a spike in 5xx API errors across all endpoints. This may affect areas of the app such as scheduled and triggered Journeys, scheduled and triggered campaign sends, custom events, user updates, and more. Customers may also be experiencing webApp performance degradation, including in segmentation, list uploads, and viewing campaign details. Our engineering team is continuing to identify the underlying cause and explore remediation options. The next update will be at 9 AM PT or sooner. If you have questions, please reach out to [email protected]
investigating Jul 19, 2024, 03:43 PM UTC

We are continuing to investigate this issue.
identified Jul 19, 2024, 04:16 PM UTC

Our engineers have identified the root cause of the issue and are actively deploying a fix. Our API endpoints have recovered, but while the fix is being deployed, customers may still experience slowness in scheduled and triggered Journeys, scheduled and triggered campaign sends, custom events, user updates, and more. Customers may also be experiencing webApp performance degradation, including in segmentation, list uploads, and viewing campaign details. To clarify, this issue specifically affects all clusters numbered 100 and above. Our next update will be at 10:00 AM PT or sooner.
monitoring Jul 19, 2024, 05:10 PM UTC

The webApp and API endpoints have fully recovered. However, a subset of customers may still be experiencing an ingestion lag, which is currently draining. These customers may still see delays in user updates, event calls, and event-triggered Journeys. We are continuing to monitor this and will provide our next update at 11 AM PT or sooner.
monitoring Jul 19, 2024, 05:38 PM UTC

We are continuing to monitor for any further issues.
monitoring Jul 19, 2024, 06:00 PM UTC

We have now fully caught up on the ingestion lag, and all services have returned to normal. We will continue to monitor performance, with our next update at 12 PM PT or sooner.
resolved Jul 19, 2024, 07:10 PM UTC

We have fully recovered from this incident and are marking it resolved. If you have any further questions, please reach out to [email protected]