Redox incident

Log processing is delayed

Minor Resolved

Redox experienced a minor incident on February 26, 2026 affecting Traffic Processing, lasting 14m. The incident has been resolved; the full update timeline is below.

Started
Feb 26, 2026, 05:33 PM UTC
Resolved
Feb 26, 2026, 05:47 PM UTC
Duration
14m
Detected by Pingoru
Feb 26, 2026, 05:33 PM UTC

Affected components

Traffic Processing

Update timeline

  1. identified Feb 26, 2026, 05:33 PM UTC

    We are aware of an issue with our message processing.

    What this means: Messages are not currently processing as expected, and logs are taking longer than usual to process. Please expect delays in message receipt until we have identified the issue and implemented a fix.

    What we will do: After a fix has been implemented, Redox will go through affected messages and determine what action can be taken to rectify the situation. If you have any additional questions, please reach out to us at [email protected]

  2. monitoring Feb 26, 2026, 05:39 PM UTC

    A fix has been implemented for the traffic processing issue and we are monitoring the results. Message processing latency is catching up. If you have any questions, please contact us at [email protected]

  3. resolved Feb 26, 2026, 05:47 PM UTC

    Messages are now processing normally. If you have any additional questions or are not seeing expected logs, please reach out to [email protected]

  4. postmortem Mar 10, 2026, 10:56 PM UTC

    **What Happened** On **February 26, 2026** at approximately **10:16 AM CT**, our production environment experienced elevated latency across all asynchronous traffic processing. The incident was triggered when core processing workers were unintentionally scaled down to a minimal level, below the capacity normally needed to handle day-to-day traffic. This significantly reduced our processing capacity. The impact varied across connections, with durations ranging from 20 to 42 minutes and maximum average latencies between 9.6 and 18.2 minutes.

    **How We Fixed It** Our team quickly identified the root cause and executed a script to scale all affected workers back to their proper levels. By **10:45 AM CT**, all async processing had returned to nominal latency levels.
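
The postmortem gives its times in CT, while the update timeline uses UTC. The sketch below converts the two postmortem timestamps to UTC so they can be compared with the timeline. It assumes "CT" means US Central Standard Time (UTC-6), which is in effect in late February; the helper name `ct_to_utc` is illustrative and not part of any Redox tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CT" in the postmortem is US Central Standard Time (UTC-6),
# which applies in late February (no daylight saving time).
CST = timezone(timedelta(hours=-6), name="CST")


def ct_to_utc(year, month, day, hour, minute):
    """Convert a Central Standard Time wall-clock reading to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=CST)
    return local.astimezone(timezone.utc)


# Timestamps quoted in the postmortem.
latency_start = ct_to_utc(2026, 2, 26, 10, 16)
latency_end = ct_to_utc(2026, 2, 26, 10, 45)

print(latency_start.strftime("%I:%M %p UTC"))  # 04:16 PM UTC
print(latency_end.strftime("%I:%M %p UTC"))    # 04:45 PM UTC

# Window between the two postmortem timestamps, in whole minutes.
window_minutes = int((latency_end - latency_start).total_seconds() // 60)
print(window_minutes, "minutes")               # 29 minutes
```

Under that assumption, the window described in the postmortem runs from 04:16 PM to 04:45 PM UTC, a span of 29 minutes.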