Ryft Pay incident

Elevated API errors

Ryft Pay experienced a major incident on November 24, 2025 that affected the Payments API and lasted 15 minutes. The incident has been resolved; the full update timeline is below.

Started: Nov 24, 2025, 05:40 PM UTC
Resolved: Nov 24, 2025, 05:55 PM UTC
Duration: 15m
Detected by Pingoru: Nov 24, 2025, 05:40 PM UTC

Affected components

Payments API

Update timeline

investigating Nov 24, 2025, 05:51 PM UTC

We are currently investigating this issue.

identified Nov 24, 2025, 05:51 PM UTC

The issue has been identified and a fix is being implemented.

monitoring Nov 24, 2025, 05:51 PM UTC

The elevated error rates, which lasted approximately 20 minutes, have now been resolved. We apologise for any inconvenience caused.

resolved Nov 24, 2025, 05:55 PM UTC

The incident has now been resolved.

postmortem Nov 25, 2025, 09:30 AM UTC

**Summary**

The incident was caused by a faulty configuration update during a deployment. This led to a period in which the deployment partially served traffic before being classified as unhealthy. The impacted API resources were as follows:

* `v1/payment-sessions`

**Timeline**

The erroneous deployment went live at 5:21pm UTC. Live traffic was switched to the new instances at 5:23pm. On-site developers noticed elevated errors originating from the new nodes at 5:25pm and initiated a rollback at 5:30pm. The rollback was completed at 5:49pm, and the errors introduced by the faulty deployment dropped immediately. The total impact time was approximately 25 minutes.

**What are we doing about it?**

* Developers have introduced additional measures to detect faulty configuration updates. These steps will prevent bad configuration from being deployed going forward.
* Improvements to our rollback policies will ensure a more timely rollback in the future.
* The team will adjust our rolling deployments so that live traffic continues to be served by the existing instances for longer before being switched over to the newly deployed instances. This gives a larger window in which bad updates can be detected and averted before they impact our customers.
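
To illustrate the last remediation item, here is a minimal Python sketch of a gate that holds a new deployment in an observation window before it receives live traffic. It is not Ryft's implementation: `BakePolicy`, `decide`, and the threshold values are hypothetical names and numbers chosen for this example, and it assumes error rates on the new instances are sampled at a fixed interval.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BakePolicy:
    # Hypothetical thresholds for illustration, not Ryft's actual values.
    min_samples: int = 5          # healthy samples required before promotion
    max_error_rate: float = 0.01  # highest tolerated error rate per sample


def decide(samples: Sequence[float], policy: BakePolicy) -> str:
    """Return "rollback", "wait", or "promote" for a new deployment.

    `samples` holds the error rate (0.0 to 1.0) observed on the new
    instances at each health-check interval, oldest first.
    """
    # Any sample above the threshold aborts the rollout immediately.
    if any(rate > policy.max_error_rate for rate in samples):
        return "rollback"
    # Healthy so far, but not yet observed for long enough.
    if len(samples) < policy.min_samples:
        return "wait"
    return "promote"


policy = BakePolicy()
print(decide([0.0, 0.002], policy))                    # wait
print(decide([0.0, 0.002, 0.35], policy))              # rollback
print(decide([0.0, 0.001, 0.0, 0.002, 0.0], policy))   # promote
```

Raising `min_samples` lengthens the window during which the existing instances keep serving live traffic, which is the trade-off the remediation item describes: slower rollouts in exchange for more time to catch a bad update.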