PubNub incident

Elevated latencies and errors for multiple services in US-West and US-East

PubNub experienced a minor incident on October 20, 2025, affecting Publish/Subscribe Service, North America Points of Presence, and 1 more component. It lasted 1h 47m. The incident has been resolved; the full update timeline is below.

Started: Oct 20, 2025, 07:32 AM UTC
Resolved: Oct 20, 2025, 09:19 AM UTC
Duration: 1h 47m
Detected by Pingoru: Oct 20, 2025, 07:32 AM UTC

Affected components

Publish/Subscribe Service
North America Points of Presence
Storage and Playback Service
Stream Controller Service
Presence Service
Access Manager Service
DNS Service
Mobile Push Gateway
App Context Service

Update timeline

investigating Oct 20, 2025, 07:32 AM UTC

At approximately 07:02 UTC, PubNub services began experiencing elevated latencies and server errors in the US-West and US-East regions. PubNub Technical Staff is currently investigating, and more updates will follow once available.

identified Oct 20, 2025, 07:58 AM UTC

The issue has been identified and a fix is being implemented.

monitoring Oct 20, 2025, 08:45 AM UTC

A fix has been implemented and we are monitoring the results.

resolved Oct 20, 2025, 09:19 AM UTC

With no further issues observed for the past 30 minutes, the incident has been resolved. We will follow up soon with a root cause analysis. If you believe you experienced an impact related to this incident, please report it to PubNub Support at [email protected].

postmortem Oct 27, 2025, 06:21 PM UTC

### **Problem Description, Impact, and Resolution**

On October 20th, 2025 at 07:06 UTC, our monitoring systems alerted us to elevated error levels across multiple PubNub services in the IAD region (US-East). Some customers may have experienced increased error rates and latency, as well as intermittent issues with Presence service availability across IAD (US-East), SJC (US-West), and HND (AP-Northeast). We quickly determined the issue was caused by a broader infrastructure outage affecting our cloud provider (AWS) in the IAD region.

We initiated regional failover procedures and re-routed new connections to alternate regions. However, due to undefined steps in some of our failover processes and delays accessing some tools due to the provider issue, existing connections for some services remained degraded for longer than expected. To restore full service, we manually reset established connections, re-routed Presence traffic to Frankfurt (EU-Central), and brought on additional infrastructure in other regions to absorb traffic. Errors were mitigated by 09:20 UTC.

Later in the day, additional regional load in US-West triggered a new wave of service degradation. We responded by isolating the US-East region again and scaling up balancer capacity in US-West. PubNub services were stabilized by 13:20 UTC, and remained in a monitoring state while our infrastructure provider worked to fully resolve the underlying issue.

By 22:35 UTC, our provider reported full restoration of service. After validating stability in US-East, we completed rebalancing traffic by 23:48 UTC, and declared the incident resolved.

### **Mitigation Steps and Recommended Future Preventative Measures**

While this incident was caused by an external infrastructure outage, we’ve identified several opportunities to strengthen our internal readiness and response procedures.

We are consolidating and centralizing our regional failover procedures to ensure they are immediately accessible and complete for all production services. Any gaps in our process documentation for newer services will be addressed to ensure readiness before they are fully adopted into production.

Additionally, we are reviewing and resolving issues with internal tooling, including inventory and DNS resolution problems, which made mitigation more difficult during the incident. These improvements will ensure faster and more consistent responses to future infrastructure-level disruptions, and reduce potential impact on customer traffic across regions.
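
The postmortem notes that re-routing helped new connections while established ones stayed degraded until they were manually reset. The toy model below is only an illustrative sketch of that general behaviour, not PubNub's implementation: the `Router` class, its `fallback` mapping, and the `reset_region` helper are all hypothetical names invented for this example, and only the region codes IAD and SJC come from the report.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical toy router; illustrates failover for new vs. existing connections."""

    healthy: dict[str, bool]          # region -> is it currently healthy?
    fallback: dict[str, str]          # region -> assumed alternate region
    connections: dict[str, str] = field(default_factory=dict)  # client -> pinned region

    def route(self, preferred: str) -> str:
        """Choose a region for a NEW connection only."""
        if self.healthy.get(preferred, False):
            return preferred
        return self.fallback[preferred]

    def connect(self, client: str, preferred: str) -> str:
        """Open a connection and pin the client to the chosen region."""
        region = self.route(preferred)
        self.connections[client] = region
        return region

    def reset_region(self, region: str) -> list[str]:
        """Drop every connection pinned to `region`; return the affected clients."""
        dropped = sorted(c for c, r in self.connections.items() if r == region)
        for client in dropped:
            del self.connections[client]
        return dropped


router = Router(
    healthy={"IAD": True, "SJC": True},
    fallback={"IAD": "SJC", "SJC": "IAD"},
)
router.connect("client-a", "IAD")      # pinned to IAD while it is healthy
router.healthy["IAD"] = False          # regional outage begins
router.connect("client-b", "IAD")      # new connection is re-routed to SJC

# client-a is still pinned to the degraded region until connections are reset.
for client in router.reset_region("IAD"):
    router.connect(client, "IAD")      # reconnect now lands in SJC
```

In this sketch, marking a region unhealthy changes only where future connections go. Clients already pinned to the failed region move only after an explicit reset forces them to reconnect, which matches the sequence the postmortem describes.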