PubNub experienced a notice-level incident on May 18, 2023 affecting the Publish/Subscribe Service, North America Points of Presence, and one additional component, lasting 1h 36m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating May 18, 2023, 04:48 PM UTC
Subscribers experienced sporadic errors on May 18 between 2:35 PM and 4:11 PM UTC. We are investigating the cause.
- identified May 18, 2023, 05:17 PM UTC
We have identified the issue and are continuing to investigate. All systems are operational.
- monitoring May 18, 2023, 05:38 PM UTC
All systems have been operating normally for more than an hour. We are monitoring the situation and continuing to investigate the root cause.
- resolved May 18, 2023, 06:24 PM UTC
We are marking this issue as resolved and will follow up with a post-mortem soon. We apologize for any impact this may have had on your service. If you wish to discuss the impact on your service, please contact PubNub Support.
- postmortem May 23, 2023, 03:20 PM UTC
### **Problem Description, Impact, and Resolution**

At 14:35 UTC on May 18, 2023, we observed errors being served to subscribers globally. We noted a large, unusual traffic pattern that was putting memory pressure on parts of our infrastructure faster than our normal autoscaling could respond. We resolved the issue by manually adding capacity to accommodate the newly observed pattern, and the issue was resolved at 16:15 UTC the same day.

The issue occurred because the system was not prepared to scale quickly enough in response to the combination of factors unique to this traffic.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue in the future, we are adding new monitoring and alerting that can detect this scenario, and we are tuning the scaling factors in our systems so that autoscaling reacts more appropriately to it.