ACME Technologies experienced a critical incident on August 4, 2025, affecting ACME eCommerce (B2C), ACME Backoffice (B2B), and one additional component, lasting 28 minutes. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Aug 04, 2025, 08:07 PM UTC
ACME is experiencing a service disruption that is impacting the eCommerce (B2C) application. POS and Backoffice are operational. Our engineering team is investigating the issue and we will update this incident as soon as possible.
- investigating Aug 04, 2025, 08:21 PM UTC
Please note that some POS and Backoffice transaction attempts may also be failing due to this issue. We are continuing to investigate and will put a resolution in place as quickly as possible.
- resolved Aug 04, 2025, 08:36 PM UTC
We have resolved an issue causing 500 errors on our checkout calls. All functionality has been restored.
- postmortem Aug 05, 2025, 03:18 PM UTC
**Root Cause**
The outage was caused by a configuration issue in a third-party fraud protection service, which was blocking access to our load balancing servers.

**Mitigation Strategy**
We have taken measures to detect and alert on such failures quickly, and to temporarily remove such third-party components from our network to restore service.

---

_Here are the RCA and mitigation strategies provided by the third-party fraud protection service:_

**Root Cause**

* A configuration issue in the security group managed by our ingress controller prevented instances from being registered with the load balancer's target groups, resulting in 504s.
* Traffic ramped down gradually rather than stopping immediately, because some target groups were still receiving traffic. All pods remained healthy throughout the incident, which delayed our alerting.

**Mitigations**

* Updating our deployment topology so that issues in the official AWS-managed controller can no longer degrade customer environments.
* Adding new alerts that directly monitor target groups, so that we are notified before traffic is impacted.
* Developing an additional failover for cases where load balancers also fail to register targets with their target groups.
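To illustrate why healthy pods delayed alerting, and why monitoring target groups directly catches this failure mode earlier, here is a minimal Python sketch comparing a pod-health check with a target-registration check. It is an illustration under stated assumptions, not ACME's or the vendor's actual alerting: the `Sample` fields, the `target_group_alert` function, the 0.8 threshold, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample. All fields are hypothetical, for illustration only."""
    minute: int
    healthy_pods: int
    total_pods: int
    healthy_targets: int   # targets registered and healthy in the target group
    expected_targets: int  # targets that should be registered


def pod_alert(s: Sample) -> bool:
    # Fires only when the pods themselves report unhealthy.
    return s.healthy_pods < s.total_pods


def target_group_alert(s: Sample, min_ratio: float = 0.8) -> bool:
    # Fires when too few targets are registered/healthy, regardless of pod health.
    # The 0.8 threshold is an assumed value, not taken from the postmortem.
    if s.expected_targets == 0:
        return False
    return s.healthy_targets / s.expected_targets < min_ratio


def first_alert(samples, check):
    """Return the minute of the first sample for which `check` fires, or None."""
    for s in samples:
        if check(s):
            return s.minute
    return None


# Hypothetical scenario: pods stay healthy while registered targets drain away.
samples = [
    Sample(0, 10, 10, 10, 10),
    Sample(5, 10, 10, 9, 10),
    Sample(10, 10, 10, 7, 10),
    Sample(15, 10, 10, 4, 10),
    Sample(20, 10, 10, 1, 10),
]

print(first_alert(samples, pod_alert))           # None
print(first_alert(samples, target_group_alert))  # 10
```

In this made-up scenario the pod check never fires, because every pod stays healthy, while the target-group check fires as soon as fewer than 80% of expected targets remain registered. This mirrors the gradual ramp-down described above.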