PubNub experienced a minor incident on June 27, 2025 affecting European Points of Presence, with elevated latencies and errors from approximately 08:15 UTC until resolution at 11:53 UTC. The incident has been resolved; the full update timeline is below.
Affected components
- Pub/Sub
- History
- Presence
Update timeline
- investigating Jun 27, 2025, 09:21 AM UTC
At approximately 08:15 UTC, PubNub services began experiencing elevated latencies and server errors in the Europe region. PubNub Technical Staff is currently investigating, and more updates will follow once available.
- investigating Jun 27, 2025, 10:06 AM UTC
The PubNub Technical Staff continues to investigate. More updates will follow once available.
- identified Jun 27, 2025, 10:48 AM UTC
The issue has been identified and a fix is being implemented.
- monitoring Jun 27, 2025, 10:59 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Jun 27, 2025, 11:53 AM UTC
With no further issues observed, the incident has been resolved. We will follow up soon with a root cause analysis. If you believe you experienced an impact related to this incident, please report it to PubNub Support at [email protected].
- postmortem Jul 03, 2025, 06:47 PM UTC
Beginning on Friday, June 27, 2025 at 08:15 UTC, there were occasional, intermittent increases in latency and errors in three of our services: Pub/Sub, History, and Presence. The root cause discussed in this analysis was identified and corrected on Monday, June 30.

### **Problem Description, Impact, and Resolution**

Recently, to ensure PubNub had access to more cloud server capacity across our many regions, we introduced new instance types, giving PubNub's services a more heterogeneous fleet to run on. Over time, PubNub has accumulated many OS/kernel-level configurations that optimize the performance of each server. On the new instance types, however, one of these explicitly specified settings, which controls limits on network connectivity, was being silently overridden by our upstream load balancers, so the new instances kept reaching their connectivity limits. Unfortunately, the errors we initially encountered pointed us in incorrect directions, and the investigation took longer than we normally strive for. Once we identified the root cause, we mitigated the issue by configuring the affected services to run on other instance types and launching more capacity.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent recurrence, we modified the new instance types to emit metrics on these OS thresholds and limits, enabling us to detect when the limits are approached or exceeded, regardless of instance type. This allows us to scale proactively and route traffic appropriately by instance type, making our deployment configuration more dynamic across a heterogeneous fleet.

Again, we apologize for the incident outlined above and are committed to maintaining transparency when issues affect our customers. Should you have any questions regarding this analysis, please reach out to our support team at [[email protected]](mailto:[email protected]).
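To make the failure mode from the problem description concrete, here is a minimal sketch of a silent-override check. The report does not name the setting involved, so `net.core.somaxconn`, the port range, and the intended values below are hypothetical stand-ins for "an explicitly specified network connectivity limit"; the sketch simply compares the values we believe we configured against the live kernel values.

```python
#!/usr/bin/env python3
"""Illustrative sketch: detect a kernel network limit that was set at
provision time but silently changed by another layer of the stack."""

from pathlib import Path

# Hypothetical examples: the values we believe we configured.
INTENDED = {
    "net.core.somaxconn": 65535,                   # listen backlog ceiling
    "net.ipv4.ip_local_port_range": "1024 65535",  # ephemeral port range
}


def effective(key: str) -> str:
    """Read the live kernel value from /proc/sys (sysctl's backing files)."""
    return Path("/proc/sys", key.replace(".", "/")).read_text().strip()


def main() -> None:
    for key, want in INTENDED.items():
        got = effective(key)
        # Normalize whitespace so "1024\t65535" compares equal to "1024 65535".
        if " ".join(got.split()) != " ".join(str(want).split()):
            print(f"OVERRIDDEN: {key} is {got!r}, expected {want!r}")
        else:
            print(f"ok: {key} = {got}")


if __name__ == "__main__":
    main()
```

Run periodically, a check like this surfaces the mismatch between intended and effective limits well before traffic actually hits them.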
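Similarly, a minimal sketch of the preventative measure, assuming a Prometheus-style exporter: the report says only that the new instance types now emit metrics on OS thresholds and limits, so the metric names, the specific limits sampled, and the scrape port here are our assumptions.

```python
#!/usr/bin/env python3
"""Illustrative sketch: export OS connectivity limits alongside current
usage so alerts can fire on usage/limit ratios on any instance type."""

import time
from pathlib import Path

from prometheus_client import Gauge, start_http_server

# Hypothetical gauges: ceilings and current usage, exported side by side.
FD_ALLOCATED = Gauge("node_fd_allocated", "File descriptors currently allocated")
FD_MAXIMUM = Gauge("node_fd_maximum", "Kernel file descriptor limit (fs.file-max)")
SOMAXCONN = Gauge("node_somaxconn", "Listen backlog ceiling (net.core.somaxconn)")


def sample() -> None:
    # /proc/sys/fs/file-nr is three fields: "allocated  unused  maximum".
    allocated, _unused, maximum = Path("/proc/sys/fs/file-nr").read_text().split()
    FD_ALLOCATED.set(int(allocated))
    FD_MAXIMUM.set(int(maximum))
    SOMAXCONN.set(int(Path("/proc/sys/net/core/somaxconn").read_text()))


if __name__ == "__main__":
    start_http_server(9400)  # hypothetical scrape port
    while True:
        sample()
        time.sleep(15)
```

Because the limit itself is exported, not just the usage, the same alert rule (for example, usage above 80% of the limit) works unchanged across a heterogeneous fleet.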