MageMojo incident
Emergency maintenance in USEast and Frankfurt.
MageMojo experienced a minor incident on October 15, 2021 affecting Webscale STRATUS - Northern Virginia and Webscale STRATUS - Frankfurt, lasting 3h 12m. The incident has been resolved; the full update timeline is below.
Affected components
Webscale STRATUS - Northern Virginia
Webscale STRATUS - Frankfurt
Update timeline
- investigating Oct 15, 2021, 02:52 PM UTC
We are currently investigating reports of HTTP 500 and 502 errors.
- resolved Oct 15, 2021, 06:04 PM UTC
This incident has been resolved.
- postmortem Oct 15, 2021, 06:05 PM UTC
The issue was caused by a new build of the php-fpm containers. This build was intended only for the UAT environment but was inadvertently released to production. The changes were pushed at approximately 05:30 AM EDT (09:30 UTC), but they did not take effect until a customer’s php-fpm container was restarted, so the problem was not immediately evident. The new container build included support for Datadog, which, in addition to significantly increasing the CPU resources needed, caused random SIGSEGV errors in php-fpm that resulted in HTTP 502 errors on sites. Once the issue was identified, we rebuilt the affected containers and began restarting the php-fpm containers that contained the Datadog release. We will further investigate how these builds were inadvertently released to production instead of UAT.