PubNub incident

Elevated Events & Actions Errors in FRA Region

Minor · Resolved

PubNub experienced a minor incident on October 6, 2025, affecting its European Points of Presence. The incident lasted 46 minutes and has been resolved; the full update timeline is below.

Started
Oct 06, 2025, 10:13 AM UTC
Resolved
Oct 06, 2025, 11:00 AM UTC
Duration
46m
Detected by Pingoru
Oct 06, 2025, 10:13 AM UTC

Affected components

European Points of Presence

Update timeline

  1. investigating Oct 06, 2025, 10:13 AM UTC

    At about 08:40 UTC, the Events & Actions publish operation began to experience elevated error rates. Our technical staff is actively investigating, and more information will be posted as it becomes available. If you are experiencing issues that you believe are related to this incident, please report the details to PubNub Support ([email protected]).

  2. identified Oct 06, 2025, 10:34 AM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Oct 06, 2025, 10:36 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Oct 06, 2025, 11:00 AM UTC

    With no further issues observed, the incident has been resolved. We will follow up soon with a root cause analysis. If you believe you experienced an impact related to this incident, please report it to PubNub Support at [email protected].

  5. postmortem Oct 06, 2025, 08:37 PM UTC

    ### Problem Description, Impact, and Resolution

    At 08:40 UTC on October 6, 2025, we observed elevated error rates in the Events & Actions service in our EU-Central (FRA) region, which led to delays in processing publish-triggered events. Some customers may have experienced slower-than-expected execution of their event workflows during this time.

    We identified a malformed payload that was causing backend consumers to fail when attempting to process the queue. We deployed an updated build with improved parsing logic, which cleared the blockage and restored normal service. The issue was fully resolved by 11:00 UTC on October 6, 2025.

    This issue occurred because our event processing service did not correctly handle a malformed message format, which caused the processing queue to stall. Additionally, the alerting system in place was not configured to detect this failure mode promptly, delaying our response.

    ### Mitigation Steps and Recommended Future Preventative Measures

    To reduce the risk of similar delays in the future, we are refining our alert thresholds and naming conventions to improve early detection and clarity during response. We are also reviewing validation logic to ensure malformed messages are consistently isolated before reaching backend queues.
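
The mitigation named in the postmortem, isolating malformed messages before they reach backend queues so that one bad payload cannot stall everything behind it, corresponds to a standard validate-and-dead-letter pattern. Below is a minimal Python sketch of that pattern under assumed conventions; the in-memory queues, the `validate_event` helper, and the payload schema are hypothetical illustrations, not PubNub's actual implementation.

```python
import json
from collections import deque

# Hypothetical in-memory queues standing in for the real message broker;
# names and payload shape are illustrative, not PubNub's implementation.
event_queue: deque = deque()
dead_letter_queue: deque = deque()

REQUIRED_FIELDS = {"channel", "event_type", "payload"}  # assumed schema


def validate_event(raw: bytes) -> dict | None:
    """Parse and validate a message; return None if it is malformed."""
    try:
        event = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None
    if not isinstance(event, dict) or not REQUIRED_FIELDS <= event.keys():
        return None
    return event


def process(event: dict) -> None:
    """Placeholder for the real publish-triggered action handler."""
    print(f"executing action for {event['event_type']} on {event['channel']}")


def consume() -> None:
    """Drain the queue, quarantining malformed messages instead of stalling.

    The key point from the postmortem: one bad payload must not block every
    message behind it, so parse/validation failures are routed to a
    dead-letter queue and consumption continues.
    """
    while event_queue:
        raw = event_queue.popleft()
        event = validate_event(raw)
        if event is None:
            dead_letter_queue.append(raw)  # keep for later inspection
            continue
        process(event)


# Example: a malformed payload no longer stalls the queue.
event_queue.extend([
    b'{"channel": "orders", "event_type": "publish", "payload": {}}',
    b'{not valid json',  # the kind of message that caused the stall
    b'{"channel": "alerts", "event_type": "publish", "payload": {}}',
])
consume()
print(f"quarantined: {len(dead_letter_queue)} message(s)")
```

Routing failures to a quarantine queue preserves the bad message for root-cause analysis while the consumer keeps making progress. It also speaks to the detection gap the postmortem mentions: alerting on dead-letter queue depth gives an early, unambiguous signal for exactly this failure mode.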