Togetherwork incident
[SEV-1] Togetherpay - Production Outage
Togetherwork experienced a critical incident on September 3, 2024, affecting Transaction Processing, Payment Tokenization, and 1 more component, lasting 7h 36m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Sep 03, 2024, 10:22 AM UTC
Today's production roll did not go according to plan and processing in production is down. We are working as quickly as possible to resolve the issue. Additional information will be provided when known, or when the situation is resolved.
- identified Sep 03, 2024, 11:14 AM UTC
Production is still down and teams are actively working to restore it. The issue has been identified and a fix is being implemented. We will provide another update when more is known or services are completely restored.
- identified Sep 03, 2024, 11:56 AM UTC
Transactions are processing normally. Teams are still working to fully fix the issue. We will provide another update when more is known or the issue is fully resolved.
- resolved Sep 03, 2024, 05:59 PM UTC
The Togetherpay production release was successfully rolled back. All systems are fully functioning as they were prior to this morning's roll. This incident is resolved.
- postmortem Sep 06, 2024, 07:16 PM UTC
Togetherwork identified the root cause of the failed 9/3 production deployment. It was primarily caused by:
1. GitLab downtime - the initial delay in the deployment was due to GitLab being down, which also caused subsequent slowness in the pipeline
2. Database migration issues - new column migrations were not applied correctly, leading to application errors and a failure to display merchants
3. Incomplete rollback - the rollback did not fully restore the previous state, causing further site downtime
Corrective actions that are being implemented include:
1. Improved GitLab monitoring
2. Database migration testing
3. Improved rollback procedures
4. Pipeline optimization
The 9/3 incident was resolved by fully reverting to a previous, stable branch. Between 1:00 a.m. and 1:37 p.m. Eastern, Togetherwork Products could have experienced intermittent processing issues. Two windows were identified as complete payment processing outages:
6:04 a.m. - 7:45 a.m. Eastern
12:03 p.m. - 1:37 p.m. Eastern
The re-deploy is scheduled for Wednesday, 9/11, between 1:00 a.m. and 4:00 a.m. Eastern.
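The postmortem gives the complete-outage windows in Eastern time, while the update timeline uses UTC. To line the two up, the conversion and total outage duration can be sketched as below. This is a minimal illustration and not part of any Togetherwork tooling. It assumes "eastern" means US Eastern Daylight Time (UTC-4), which was in effect on September 3, 2024. The names `OUTAGE_WINDOWS_EASTERN`, `to_utc`, `outage_windows_utc`, and `total_outage` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "eastern" in the postmortem means US Eastern Daylight Time
# (UTC-4), which was in effect on September 3, 2024.
EDT = timezone(timedelta(hours=-4), "EDT")

# Complete-outage windows exactly as stated in the postmortem (Eastern time).
OUTAGE_WINDOWS_EASTERN = [
    ("2024-09-03 06:04", "2024-09-03 07:45"),
    ("2024-09-03 12:03", "2024-09-03 13:37"),
]


def to_utc(local_text):
    """Parse a 'YYYY-MM-DD HH:MM' Eastern timestamp and convert it to UTC."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M").replace(tzinfo=EDT)
    return local.astimezone(timezone.utc)


def outage_windows_utc():
    """Return the outage windows as (start, end) pairs in UTC."""
    return [(to_utc(start), to_utc(end)) for start, end in OUTAGE_WINDOWS_EASTERN]


def total_outage():
    """Sum the durations of all complete-outage windows."""
    return sum((end - start for start, end in outage_windows_utc()), timedelta())


if __name__ == "__main__":
    for start, end in outage_windows_utc():
        print(f"{start:%H:%M} - {end:%H:%M} UTC ({end - start})")
    print("Total complete outage:", total_outage())
```

Under the UTC-4 assumption, the two windows correspond to 10:04 - 11:45 UTC and 16:03 - 17:37 UTC, for a combined 3 hours 15 minutes of complete payment processing outage.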