Alpaca incident

OPRA Message Processing Issue

Alpaca experienced a notice-level incident on September 15, 2025, lasting 19h 49m. The incident has been resolved; the full update timeline is below.

Update timeline

investigating Sep 15, 2025, 03:57 PM UTC

We experienced an issue with processing OPRA messages between 9:30 AM and 9:42 AM ET.

resolved Sep 16, 2025, 11:46 AM UTC

The issue has been mitigated.

postmortem Sep 16, 2025, 11:46 AM UTC

We use Aeron transport to send the preprocessed Exegy data from GCP VMs to our Kubernetes cluster. The aeron-driver component, which is responsible for the UDP message transport, showed many NAKs (negative acknowledgements, which trigger retransmissions), and our application showed Aeron publication backpressure and therefore a lower number of processed messages.

This happens occasionally, usually after we restart the components, including aeron-driver. Restarting aeron-proxy resolves it, and the problem does not reappear until the next restart. The system can run for weeks without a problem. Note that we had scheduled maintenance this past weekend.

The situation is puzzling, because aeron-proxy communicates with aeron-driver only through memory-mapped files, so restarting it should not affect NAK behavior unless there is a bug in aeron-driver itself. We only experience this issue on the production system, so we suspect it is caused by differences between the production and staging cluster network setups, but this is only a guess. Unfortunately, we do not have time to debug the situation when it happens, because we need to restore the service immediately.
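
For illustration, the two symptoms described above (a high NAK count and publication backpressure) could be combined into a simple automated check. The following is a minimal sketch in Python, not the production tooling: the `CounterSnapshot` fields, the `transport_health` function, and both thresholds are hypothetical, and it assumes cumulative counters are sampled periodically from the transport and the publishing application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Hypothetical point-in-time reading of cumulative transport counters."""
    timestamp_s: float         # sample time in seconds
    naks_received: int         # cumulative NAKs seen by the sender
    offers_total: int          # cumulative publish attempts
    offers_backpressured: int  # cumulative attempts rejected by backpressure


def transport_health(prev, curr, max_nak_rate=50.0, max_backpressure_ratio=0.05):
    """Compare two snapshots and flag a degraded transport.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")

    # NAKs per second over the sampling interval.
    nak_rate = (curr.naks_received - prev.naks_received) / elapsed

    # Fraction of publish attempts rejected by backpressure in the interval.
    offers = curr.offers_total - prev.offers_total
    rejected = curr.offers_backpressured - prev.offers_backpressured
    backpressure_ratio = rejected / offers if offers > 0 else 0.0

    return {
        "nak_rate_per_s": nak_rate,
        "backpressure_ratio": backpressure_ratio,
        "degraded": nak_rate > max_nak_rate
        or backpressure_ratio > max_backpressure_ratio,
    }
```

Running a check like this shortly after each component restart would flag the condition before the market opens, and the snapshots themselves could be kept for later debugging once the service has been restored.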