PubNub incident

Subscribers experiencing errors in all regions

Notice · Resolved

PubNub experienced a notice-level incident on May 18, 2023, affecting the Publish/Subscribe Service and the North America, European, Asia Pacific, and Southern Asia Points of Presence. The incident lasted 1h 36m and has been resolved; the full update timeline is below.

Started
May 18, 2023, 04:48 PM UTC
Resolved
May 18, 2023, 06:24 PM UTC
Duration
1h 36m
Detected by Pingoru
May 18, 2023, 04:48 PM UTC

Affected components

Publish/Subscribe Service, North America Points of Presence, European Points of Presence, Asia Pacific Points of Presence, Southern Asia Points of Presence

Update timeline

  1. investigating May 18, 2023, 04:48 PM UTC

    Subscribers experienced sporadic errors on May 18 between 2:35 PM and 4:11 PM UTC. We are investigating the cause.

  2. identified May 18, 2023, 05:17 PM UTC

    We have identified the issue and are continuing to investigate. All systems are operational.

  3. monitoring May 18, 2023, 05:38 PM UTC

    All systems have been operating normally for more than an hour. We are now monitoring the situation while we investigate the root cause.

  4. resolved May 18, 2023, 06:24 PM UTC

    We are marking this issue as resolved and will follow up with a postmortem soon. We apologize for any impact this may have had on your service. Please contact PubNub Support if you wish to discuss the impact on your service.

  5. postmortem May 23, 2023, 03:20 PM UTC

    ### Problem Description, Impact, and Resolution

    At 14:35 UTC on May 18, 2023, we observed errors being served to subscribers globally. We noted a large, unusual traffic pattern that was putting memory pressure on parts of our infrastructure faster than our normal autoscaling could handle. The issue occurred because the system was not prepared to scale quickly enough for the combination of factors unique to this traffic. We resolved the issue by manually adding capacity to cover the newly observed pattern, and the issue was resolved at 16:15 UTC the same day.

    ### Mitigation Steps and Recommended Future Preventative Measures

    To prevent a similar issue from occurring in the future, we are adding new monitoring and alerting that can detect this scenario. We are also tuning the scaling factors in our systems so that our autoscaling reacts more appropriately to this kind of traffic.