Salt Edge incident

Partial unavailability of the API

Major · Resolved

Salt Edge experienced a major incident on March 31, 2025, affecting the Account Information API and lasting 1d 3h. The incident has been resolved; the full update timeline is below.

Started: Mar 31, 2025, 02:43 PM UTC
Resolved: Apr 01, 2025, 05:52 PM UTC
Duration: 1d 3h
Detected by Pingoru: Mar 31, 2025, 02:43 PM UTC

Affected components

Account Information API

Update timeline

  1. identified Mar 31, 2025, 02:43 PM UTC

    We experienced a temporary issue affecting the availability of our services. Starting at 09:30 UTC, some customers may have encountered HTTP 500 Internal Server Error responses. The team is investigating the problem. We apologize for any inconvenience this may have caused and appreciate your understanding.

  2. identified Mar 31, 2025, 08:49 PM UTC

    Update: work in progress. The team is actively working on a fix. At this time, the API unavailability is affecting less than 0.01% of traffic. We apologize for any inconvenience this may have caused and appreciate your understanding.

  3. resolved Apr 01, 2025, 05:52 PM UTC

    We want to inform you that the issue affecting the availability of the API service was identified and resolved at 13:00 UTC on April 1st. Our team took the necessary actions to mitigate the impact and restore full functionality. All systems are now operating normally, and we continue monitoring the platform closely to ensure stability. We appreciate your patience and understanding.

  4. postmortem Apr 08, 2025, 12:08 PM UTC

    **Incident summary**: Between March 31, 2025, 09:30 UTC and April 1, 2025, 13:00 UTC, Salt Edge experienced a partial outage affecting the Account Information API and Payments Initiation API.

    **Root cause**: Due to high I/O load on the source database server, the AIS & PIS APIs experienced performance degradation. This led to slower operations and delayed responses for Users.

    **Resolution**: The server role was switched to a healthy replica, and the problematic server was removed from Production and replaced with new hardware.

    **Action Points**:

    * Implement I/O performance alerting
    * Plan and gradually implement high availability for the PostgreSQL cluster
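
The postmortem lists I/O performance alerting as an action point but does not say how it will be implemented. As a rough illustration only, the sketch below shows one common approach: fire an alert when the mean I/O wait percentage over a sliding window of samples exceeds a threshold. The class name `IoWaitAlert`, the 30% threshold, the window size, and the sample readings are all hypothetical and are not taken from Salt Edge's systems.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class IoWaitAlert:
    """Hypothetical sliding-window alert on I/O wait percentage.

    Fires once the window is full and the mean of the most recent
    `window` samples is strictly above `threshold_pct`.
    """

    threshold_pct: float = 30.0  # assumed threshold, not from the postmortem
    window: int = 5              # assumed number of samples to average
    samples: Deque[float] = field(default_factory=deque)

    def observe(self, iowait_pct: float) -> bool:
        """Record one sample and return True if the alert should fire."""
        self.samples.append(iowait_pct)
        # Keep only the most recent `window` samples.
        if len(self.samples) > self.window:
            self.samples.popleft()
        # Do not alert until there is a full window of data.
        if len(self.samples) < self.window:
            return False
        mean = sum(self.samples) / len(self.samples)
        return mean > self.threshold_pct


# Example with made-up readings (percent of CPU time spent waiting on I/O).
alert = IoWaitAlert(threshold_pct=30.0, window=3)
readings = [5.0, 12.0, 45.0, 50.0, 60.0]
fired = [alert.observe(r) for r in readings]
print(fired)  # [False, False, False, True, True]
```

Averaging over a window rather than alerting on a single sample is a design choice that trades a slower reaction for fewer false alarms on short I/O spikes. In practice the samples would come from a real metrics source, which this sketch deliberately leaves out.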