PubNub incident

Subscribe is experiencing elevated latencies in the EU PoP

Notice · Resolved

PubNub experienced a notice-level incident on March 22, 2023, affecting the Publish/Subscribe Service and lasting 28 minutes. The incident has been resolved; the full update timeline is below.

Started
Mar 22, 2023, 03:15 PM UTC
Resolved
Mar 22, 2023, 03:43 PM UTC
Duration
28m
Detected by Pingoru
Mar 22, 2023, 03:15 PM UTC

Affected components

Publish/Subscribe Service

Update timeline

  1. investigating Mar 22, 2023, 03:15 PM UTC

    Customers using the Subscribe service in our EU Central PoP may be experiencing errors, timeouts, and delays in message delivery.

  2. monitoring Mar 22, 2023, 03:22 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. monitoring Mar 22, 2023, 03:23 PM UTC

    We are continuing to monitor for any further issues.

  4. resolved Mar 22, 2023, 03:43 PM UTC

    Between 12:25 and 14:45 UTC, customers using the Subscribe service in our EU Central PoP may have experienced some errors, timeouts, and delays in message delivery.

  5. postmortem Mar 27, 2023, 06:15 PM UTC

    ### **Problem Description, Impact, and Resolution**

    On Tuesday, March 21, 2023, at 09:08 UTC, we observed errors, timeouts, and delays in message delivery in our EU Central PoP. We rolled back the responsible configuration changes, and the issue was resolved at 09:38 UTC.

    The issue was caused by a configuration change that allowed our subscribe service to make better use of existing resources. Unfortunately, this caused us to hit a limit on the number of open connections, which delayed the creation of new connections. This, in turn, delayed subscribe call connections and message delivery. This was the same issue that occurred on March 16th. Unfortunately, the customer impact of the subscribe API is particularly difficult to measure.

    ### **Mitigation Steps and Recommended Future Preventative Measures**

    To prevent a similar issue from occurring in the future, we have identified a metric that approximates customer impact, and we will monitor it closely going forward, including during any further configuration changes.
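The postmortem's causal chain (an open-connection limit is reached, so new connections are delayed, so subscribe calls and message delivery are delayed) can be illustrated with a toy queueing model. This is a hypothetical sketch only, not PubNub's code or architecture: the function name `connection_waits`, the cap `max_open`, and the request timings are invented for illustration. It assumes each subscribe call holds one connection for a fixed time and that calls are served first-come, first-served, and it computes how long each call waits for a free connection slot.

```python
import heapq
from typing import List, Tuple


def connection_waits(requests: List[Tuple[int, int]], max_open: int) -> List[int]:
    """Hypothetical model: how long each request waits for a connection slot.

    requests: (arrival_ms, hold_ms) pairs, sorted by arrival time.
    max_open: assumed cap on concurrently open connections (illustrative only).
    """
    open_until: List[int] = []  # min-heap of times at which open connections close
    waits: List[int] = []
    for arrival, hold in requests:
        # Free every connection that has already closed when this request arrives.
        while open_until and open_until[0] <= arrival:
            heapq.heappop(open_until)
        if len(open_until) < max_open:
            start = arrival  # below the cap: connect immediately
        else:
            # At the cap: wait for the earliest open connection to close.
            start = heapq.heappop(open_until)
        heapq.heappush(open_until, start + hold)
        waits.append(start - arrival)
    return waits


# Ten requests arriving 100 ms apart, each holding a connection for 1000 ms.
reqs = [(i * 100, 1000) for i in range(10)]
print(connection_waits(reqs, max_open=20))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(connection_waits(reqs, max_open=4))   # [0, 0, 0, 0, 600, 600, 600, 600, 1200, 1200]
```

With a generous cap no request waits, but once the cap is hit each new connection is pushed back until an existing one closes, and the delay compounds for later arrivals. This mirrors, in simplified form, the delayed connection creation the postmortem describes.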