PubNub incident

Delayed Push Notifications in Frankfurt PoP

Notice Resolved

PubNub experienced a notice-level incident on June 7, 2024, affecting Mobile Push Gateway and lasting 1h 10m. The incident has been resolved; the full update timeline is below.

Started
Jun 07, 2024, 07:25 PM UTC
Resolved
Jun 07, 2024, 08:35 PM UTC
Duration
1h 10m
Detected by Pingoru
Jun 07, 2024, 07:25 PM UTC

Affected components

Mobile Push Gateway

Update timeline

  1. investigating Jun 07, 2024, 07:25 PM UTC

    We have discovered an issue where push notifications were not being delivered in our Frankfurt point-of-presence. Those notifications were then delivered about twenty minutes late. We are investigating the issue.

  2. investigating Jun 07, 2024, 07:49 PM UTC

    We are still experiencing some delayed delivery of push messages in our Frankfurt point-of-presence. We continue to investigate the issue.

  3. monitoring Jun 07, 2024, 08:07 PM UTC

    We have processed all push messages in our backlog and stabilized the system in Frankfurt. We are monitoring the system to ensure no further issues.

  4. resolved Jun 07, 2024, 08:35 PM UTC

    This incident has been resolved, and mobile push notifications are being delivered normally. We will follow up with a root cause analysis.

  5. postmortem Jun 13, 2024, 12:03 AM UTC

    At 18:53 UTC on June 7, 2024, we observed excessively delayed deliveries of mobile push messages in our Frankfurt point-of-presence. We discovered that malformed messages sent to the service were triggering a previously undetected bug. We increased the resources available to that service, which allowed the system to catch up and resume normal deliveries. The issue was declared resolved at 19:59 UTC.

    ### **Mitigation Steps and Recommended Future Preventative Measures**

    To prevent a similar issue from occurring in the future, we are leaving the system running in the new configuration. We will also increase monitoring for this area and will modify the push notification service to fix the bug that originally triggered the scenario.