Alpaca incident

High DB connection usage causing intermittent API failures

Alpaca experienced a critical incident on October 7, 2025 affecting seven components, including the Account API and Transfers, lasting 13h 52m. The incident has been resolved; the full update timeline is below.

Started
Oct 07, 2025, 11:05 PM UTC
Resolved
Oct 08, 2025, 12:57 PM UTC
Duration
13h 52m
Detected by Pingoru
Oct 07, 2025, 11:05 PM UTC

Affected components

Account API, Transfers, Orders API, Positions API, Dashboard, Assets API, Trade Update Streaming

Update timeline

  1. investigating Oct 07, 2025, 11:05 PM UTC

    We are seeing high DB connection usage causing API calls to fail. We are investigating internally.

  2. investigating Oct 07, 2025, 11:20 PM UTC

    We are still working to identify the underlying cause of the high DB connection usage.

  3. investigating Oct 07, 2025, 11:22 PM UTC

    We are continuing to investigate this issue.

  4. investigating Oct 07, 2025, 11:26 PM UTC

    We are restarting the database; API calls are expected to fail during the restart.

  5. investigating Oct 07, 2025, 11:42 PM UTC

    The team is still working on the recovery.

  6. investigating Oct 07, 2025, 11:54 PM UTC

    The database has been restarted and the team is monitoring it.

  7. investigating Oct 08, 2025, 12:27 AM UTC

    After the restart, database connections are under control. We are monitoring the system.

  8. investigating Oct 08, 2025, 01:07 AM UTC

    We are seeing a spike in connections again. Teams are actively working on it.

  9. investigating Oct 08, 2025, 01:28 AM UTC

    The system is working as expected. We are monitoring performance.

  10. monitoring Oct 08, 2025, 01:37 AM UTC

    We will continue to monitor the system and take appropriate action.

  11. monitoring Oct 08, 2025, 01:37 AM UTC

    We are continuing to monitor for any further issues.

  12. monitoring Oct 08, 2025, 01:45 AM UTC

    We have identified and resolved the issue that was affecting our systems. Since implementing the fix, all systems have been operating normally with no recurrence of the problem. We have been closely monitoring our systems over the past hour, and all indicators show stable performance. Our monitoring infrastructure continues to track system health to detect any potential issues early. Our engineering team remains available to respond immediately if any concerns arise.

  13. resolved Oct 08, 2025, 12:57 PM UTC

    This incident has been resolved.

  14. postmortem Oct 08, 2025, 02:01 PM UTC

    High usage of database connections was identified around 6:00 PM ET. The team identified an application process creating contention due to a lock wait, which persisted even after the request timed out. Both the application and the database were restarted to recover. The issue recurred intermittently but was eventually resolved.
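
Diagnosing the failure mode

The postmortem points at a classic pattern: a session waiting on a lock keeps its database connection (and the lock queue behind it) alive even after the originating API request has timed out, so connection usage climbs until the API starts failing. A minimal sketch of how to surface this follows. Alpaca has not disclosed its database engine, so the sketch assumes PostgreSQL; pg_stat_activity, pg_blocking_pids(), and the three timeout settings are standard Postgres features, but the connection string and every threshold value below are illustrative assumptions, not Alpaca's configuration.

```python
import psycopg2

# Illustrative DSN only; not Alpaca's infrastructure.
conn = psycopg2.connect("dbname=app user=app host=db.internal")
conn.autocommit = True

with conn.cursor() as cur:
    # 1. Connection headroom: how close are we to max_connections?
    cur.execute("SELECT count(*) FROM pg_stat_activity")
    in_use = cur.fetchone()[0]
    cur.execute("SHOW max_connections")
    limit = int(cur.fetchone()[0])
    print(f"connections in use: {in_use}/{limit}")

    # 2. Sessions stuck waiting on a lock, plus the PIDs blocking them.
    #    A long-lived row here matches the postmortem's "lock wait that
    #    persisted even after the request timed out".
    cur.execute("""
        SELECT pid,
               pg_blocking_pids(pid) AS blocked_by,
               now() - query_start   AS waiting_for,
               left(query, 60)       AS query
          FROM pg_stat_activity
         WHERE wait_event_type = 'Lock'
         ORDER BY query_start
    """)
    for pid, blocked_by, waiting_for, query in cur.fetchall():
        print(pid, blocked_by, waiting_for, query)

    # 3. Session-level guardrails so a lock wait cannot outlive the
    #    request that caused it (timeout values are illustrative):
    cur.execute("SET lock_timeout = '2s'")
    cur.execute("SET statement_timeout = '10s'")
    cur.execute("SET idle_in_transaction_session_timeout = '30s'")

conn.close()
```

In practice the guardrails would usually be set at the role or pool level (for example, ALTER ROLE app SET statement_timeout = '10s') rather than per session, and any client-side HTTP timeout should be paired with a server-side one: a request that gives up on the client while its transaction keeps waiting on a lock is exactly the contention the postmortem describes.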