Amplience incident
Unable to Publish or Schedule in Dynamic Content
Amplience experienced a major incident on August 19, 2024, that affected Dynamic Content and lasted 9h 56m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Aug 19, 2024, 08:12 PM UTC
Some customers are experiencing issues with publishing and scheduling in Dynamic Content. We are currently working to identify the problem.
- investigating Aug 19, 2024, 11:21 PM UTC
We have identified that there was a backlog of publishing jobs due to a stuck event listener. It appears that the content is publishing but there is a delay in the published tick showing in Dynamic Content. We expect the publish tick to catch up over the next few hours.
- monitoring Aug 19, 2024, 11:23 PM UTC
We have identified that there was a backlog of publishing jobs due to a stuck event listener. It appears that the content is publishing but there is a delay in the published tick showing in Dynamic Content. We expect the publish tick to catch up over the next few hours.
- resolved Aug 20, 2024, 06:08 AM UTC
The publishing status backlog is now back to the normal level and content items should now be publishing as expected.
- postmortem Aug 21, 2024, 02:25 PM UTC
**Incident Start Date:** 19/08/2024 17:59 GMT
**Incident End Date:** 20/08/2024 01:50 GMT
**Issue**
During the incident, some users were unable to schedule content for publishing. The UI indicated that the publishing process was hanging, preventing users from successfully completing their tasks.
**Root Cause**
The issue was traced back to an undetected faulty shard within our infrastructure. Any publishing job routed to this shard was not processed, leading to the observed hang in the UI.
**Corrective Actions**
To prevent this from happening in the future, we have implemented the following measures:
* **Additional Monitoring:** We have added new monitors that will alert us if any shard starts to fall behind in processing. This will enable us to detect and address similar issues more quickly.
* **Infrastructure Protocols:** We are adding additional protocols to better protect our infrastructure and ensure a more robust and resilient system.
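As an illustration of the kind of per-shard lag check described under Additional Monitoring, here is a minimal Python sketch. It is not Amplience's implementation: the function name `find_lagging_shards`, the shard IDs, the snapshot format, and the 300-second threshold are all hypothetical, chosen only to show the idea of flagging a shard whose oldest unprocessed job has waited too long.

```python
from typing import Dict, List, Optional


def find_lagging_shards(
    oldest_pending: Dict[str, Optional[float]],
    now: float,
    max_lag_seconds: float,
) -> List[str]:
    """Return the IDs of shards whose oldest unprocessed job has waited too long.

    oldest_pending maps a (hypothetical) shard ID to the enqueue time, in Unix
    seconds, of its oldest unprocessed job, or None when nothing is waiting.
    """
    lagging = []
    for shard_id, enqueued_at in oldest_pending.items():
        if enqueued_at is None:
            continue  # an idle shard has no backlog, so it is not behind
        if now - enqueued_at > max_lag_seconds:
            lagging.append(shard_id)
    return sorted(lagging)


if __name__ == "__main__":
    now = 1_000_000.0
    snapshot = {
        "shard-a": now - 5,     # oldest job waited 5 seconds: healthy
        "shard-b": now - 3600,  # oldest job waited an hour: likely stuck
        "shard-c": None,        # nothing queued
    }
    # Assumed threshold of 300 seconds; a real value would depend on normal load.
    print(find_lagging_shards(snapshot, now, max_lag_seconds=300))
```

Measuring the age of the oldest waiting job, rather than the time since the last completed job, avoids raising an alert for a shard that is simply idle.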