Redox incident

Log observability is delayed; some processing impacted


Redox experienced a minor incident on September 30, 2025 affecting Traffic Processing and Logs (view/search), lasting 1h 28m. The incident has been resolved; the full update timeline is below.

Started
Sep 30, 2025, 04:54 PM UTC
Resolved
Sep 30, 2025, 06:23 PM UTC
Duration
1h 28m
Detected by Pingoru
Sep 30, 2025, 04:54 PM UTC

Affected components

Traffic Processing, Logs (view/search)

Update timeline

  1. identified Sep 30, 2025, 04:54 PM UTC

    We are currently aware of an issue in which the last hour of logs is not displaying properly on the dashboard. Some processing for synchronous traffic may also be delayed. What this means: while asynchronous logs are still processing properly, the dashboard is not displaying them as expected, and search within the dashboard will not work as expected until a fix has been implemented. Additionally, synchronous logs may take longer to process than normal. If you have questions about specific logs, or about whether your connection has been affected, please contact us at [email protected].

  2. monitoring Sep 30, 2025, 06:01 PM UTC

    A fix has been implemented for the log visibility issues, and most logs should now be visible. We are actively monitoring the situation and ensuring that all affected customers' logs are visible again. The previously mentioned processing delays have been resolved. If you have questions or are still experiencing issues, please reach out to [email protected].

  3. resolved Sep 30, 2025, 06:23 PM UTC

    Logs should now be visible across the entire Redox platform. If you have any additional questions or are still experiencing issues, please reach out to [email protected].

  4. postmortem Oct 23, 2025, 12:42 AM UTC

    ## **Summary**

    At approximately 10:39 AM CT on September 29, 2025, we observed system-wide degradation creating latency for transaction processing and observability. Latency was resolved for most customers by about 1:30 PM CT on September 29, 2025. At 4:43 AM CT on September 30, 2025, we observed a similar system degradation. Latency was resolved for customers by 1:23 PM CT on September 30, 2025.

    ## **What Happened**

    * A confluence of several issues generated a number of errors in our system. These factors included a change in the way we manage database partition rotations, an upgrade of a core library, and the interaction of some services with Kubernetes.
    * These errors caused some services to crash. Although our infrastructure is fault-tolerant and can recover from such crashes, the volume of crashes put pressure on our infrastructure in a way that manifested as latency in transaction processing and a delay in transaction observability.

    ## **What We Are Doing About It**

    * We rolled back the upgrade made to the affected third-party library.
    * We are adding and improving alerting on affected services to provide clearer visibility to the teams that are responsible for those services.
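
    To illustrate the kind of alerting improvement described above, here is a minimal sketch of a Kubernetes crash-loop alert. This assumes a Prometheus-style monitoring stack with kube-state-metrics deployed; the group name, thresholds, and labels are hypothetical and not taken from Redox's actual configuration.

    ```yaml
    groups:
      - name: service-health  # hypothetical rule group name
        rules:
          - alert: PodCrashLooping
            # kube_pod_container_status_restarts_total is exported by kube-state-metrics;
            # fire when a container restarts more than 3 times in 15 minutes.
            expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
            for: 5m
            labels:
              severity: page
            annotations:
              summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is restarting repeatedly"
    ```

    Alerting on restart counts rather than raw error logs surfaces the crash pressure itself, which in this incident was what manifested downstream as processing latency.
    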