Rebuy incident
Elevated 503 Errors Causing Rebuy Services Unavailable
Rebuy experienced a major incident on March 25, 2025 affecting Smart Cart, Checkout Extensions Widgets, and 1 more component. The incident lasted 1h 16m and has been resolved; the full update timeline is below.
Update timeline
- investigating Mar 25, 2025, 03:32 AM UTC
We are currently experiencing a high volume of 503 errors, which is impacting service availability. Our team is actively investigating and working to resolve the issue as quickly as possible. Thank you for your patience.
- identified Mar 25, 2025, 03:43 AM UTC
We have identified a potential root cause of the issue and are taking the necessary actions to resolve it.
- identified Mar 25, 2025, 04:04 AM UTC
We are continuing to work on resolving the issue. Thank you for your patience as we address this.
- identified Mar 25, 2025, 04:34 AM UTC
A fix has been implemented, and we are beginning to see a decrease in errors. More updates to follow.
- monitoring Mar 25, 2025, 04:35 AM UTC
A fix has been implemented and we are monitoring the results.
- monitoring Mar 25, 2025, 04:43 AM UTC
We are continuing to monitor for any further issues.
- resolved Mar 25, 2025, 04:49 AM UTC
The issue has been resolved. Our team will prepare a formal Root Cause Analysis and post it to this incident on the status page. Thank you for your patience throughout this process.
- postmortem Mar 25, 2025, 06:12 PM UTC
**Issue:** Customers experienced 503 Service Unavailable errors that impacted on-site functionality. The issue was identified through routine alert monitoring and in response to reports of degraded service.

**Root Cause:** The Ingress configuration was not properly applied after an update to the NGINX controller. Specifically, the upgrade to NGINX version 1.12.1, performed as part of a critical security vulnerability update, led to a misapplication of the configuration. The security evaluation process applied to the configuration snippets delayed their loading, so traffic was directed to default configurations, which caused the service disruptions.

**Actions Taken:**

* **Staging Environment Rollback:** The rollback plan was initiated first, downgrading NGINX from version 1.12.1 to 1.12.0 in the staging environment. This did not immediately restore full functionality, but it was a key first step in addressing the issue.
* **Configuration Investigation and Fix:** Further investigation revealed that the Ingress configurations were not properly loaded because of the security evaluation process and an additional contributing factor. Once these were identified, configuration changes were made to resolve the snippet evaluation issue. The controller configurations were then purged and redeployed, which corrected the “default” routing behavior and restored service.
* **Production Environment Fix:** The same resolution was applied to the production environment ([Rebuyengine.com](http://Rebuyengine.com)), which was fully restored to normal operation.

**Next Steps:** All necessary actions to address the vulnerability have been completed, and no further steps are required at this time. Moving forward, we’ve enhanced our testing process so that if this error is encountered again, we can take appropriate action before it affects Rebuy services.
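
As an illustration of the kind of pre-rollout check that the enhanced testing process could include, here is a minimal sketch in Python. It is not Rebuy's actual tooling: the function name `evaluate_smoke_check`, the `GateResult` type, and the 1% threshold are all hypothetical. It assumes that status codes have already been collected by probing staging routes after a controller upgrade, and it blocks promotion when too many probes return 503.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GateResult:
    """Outcome of a hypothetical post-upgrade smoke check."""
    total: int
    unavailable: int
    error_rate: float
    passed: bool


def evaluate_smoke_check(
    status_codes: Iterable[int],
    max_error_rate: float = 0.01,  # assumed threshold, not from the postmortem
) -> GateResult:
    """Decide whether an upgrade may proceed, given probe status codes."""
    codes = list(status_codes)
    if not codes:
        # No probes means no evidence that routing works, so fail closed.
        return GateResult(total=0, unavailable=0, error_rate=1.0, passed=False)

    # Count only 503 Service Unavailable, the symptom seen in this incident.
    unavailable = sum(1 for code in codes if code == 503)
    error_rate = unavailable / len(codes)
    return GateResult(
        total=len(codes),
        unavailable=unavailable,
        error_rate=error_rate,
        passed=error_rate <= max_error_rate,
    )


if __name__ == "__main__":
    # Example: 5 of 100 staging probes returned 503, so the gate fails.
    staging_probes = [200] * 95 + [503] * 5
    result = evaluate_smoke_check(staging_probes)
    print(result.passed, result.unavailable, result.total)
```

A gate like this would run against staging before the same change reaches production, so a configuration that silently falls back to default routing is caught while the impact is still contained.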