Alpaca incident
High DB connection usage creating intermittent API failure
Alpaca experienced a critical incident on October 7, 2025, affecting Account API, Transfers, and 1 more component, and lasting 13h 52m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Oct 07, 2025, 11:05 PM UTC
We are seeing high DB usage causing the API to fail. We are checking internally.
- investigating Oct 07, 2025, 11:20 PM UTC
We are still working on identifying the underlying cause of the high DB connection usage.
- investigating Oct 07, 2025, 11:22 PM UTC
We are continuing to investigate this issue.
- investigating Oct 07, 2025, 11:26 PM UTC
We are restarting the DB; API calls are expected to fail during the restart.
- investigating Oct 07, 2025, 11:42 PM UTC
The team is still working on the recovery.
- investigating Oct 07, 2025, 11:54 PM UTC
The database has been restarted and the team is monitoring it.
- investigating Oct 08, 2025, 12:27 AM UTC
After restarting, the database connections are under control. We are monitoring the system.
- investigating Oct 08, 2025, 01:07 AM UTC
We are seeing a spike in connections again. Teams are actively working on it.
- investigating Oct 08, 2025, 01:28 AM UTC
The system is working as expected. We are monitoring performance.
- monitoring Oct 08, 2025, 01:37 AM UTC
We will continue to monitor the system and take appropriate action.
- monitoring Oct 08, 2025, 01:37 AM UTC
We are continuing to monitor for any further issues.
- monitoring Oct 08, 2025, 01:45 AM UTC
We have identified and resolved the issue that was affecting our systems. Since implementing the fix, all systems have been operating normally with no recurrence of the problem. We have been closely monitoring our systems over the past hour, and all indicators show stable performance. Our monitoring infrastructure continues to track system health to detect any potential issues early. Our engineering team remains available to respond immediately if any concerns arise.
- resolved Oct 08, 2025, 12:57 PM UTC
This incident has been resolved.
- postmortem Oct 08, 2025, 02:01 PM UTC
High usage of database connections was identified around 6:00 PM ET. The team was able to identify an application process that was creating contention due to a lock wait, which persisted even after the request timed out. Both the application and the database were restarted to recover. The issue persisted intermittently but eventually resolved.
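The failure mode described above (a lock wait that outlives the request that started it) can be illustrated with a minimal sketch. The postmortem does not name the database engine, driver, or the fix that was applied, so everything here is an assumption: SQLite from the Python standard library stands in for the real database, and the `accounts` table, `update_with_bounded_lock_wait`, and `demo` are hypothetical names. The idea shown is only that the lock wait itself is bounded and the connection is always released, so a blocked write cannot hold a connection indefinitely.

```python
import os
import sqlite3
import tempfile


def update_with_bounded_lock_wait(db_path, lock_wait_seconds):
    """Try one write; give up once the lock wait exceeds lock_wait_seconds."""
    # `timeout` is sqlite3's busy timeout: how long to wait on a locked database.
    conn = sqlite3.connect(db_path, timeout=lock_wait_seconds)
    try:
        conn.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
        conn.commit()
        return "ok"
    except sqlite3.OperationalError as exc:
        # Only treat lock contention as a bounded-wait failure; re-raise the rest.
        if "locked" not in str(exc):
            raise
        conn.rollback()
        return "lock wait exceeded"
    finally:
        # Always release the connection, whether the write succeeded or not.
        conn.close()


def demo():
    """Hypothetical scenario: one session holds the write lock, another writes."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    setup.execute("INSERT INTO accounts VALUES (1, 100)")
    setup.commit()
    setup.close()

    # The contending process: takes the write lock and holds it.
    blocker = sqlite3.connect(db_path, isolation_level=None)
    blocker.execute("BEGIN IMMEDIATE")
    while_locked = update_with_bounded_lock_wait(db_path, 0.2)

    # Once the lock holder lets go, the same write goes through.
    blocker.execute("ROLLBACK")
    blocker.close()
    after_release = update_with_bounded_lock_wait(db_path, 0.2)
    return while_locked, after_release


if __name__ == "__main__":
    print(demo())
```

In this sketch, `demo()` should return `("lock wait exceeded", "ok")`: the first write gives up after roughly 0.2 seconds instead of waiting on the lock, and the second succeeds once the lock is released. On a server database the comparable control would be a lock wait limit set on the session or server (for example `lock_timeout` in PostgreSQL or `innodb_lock_wait_timeout` in MySQL); whether either applies to this incident is not stated in the report.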