PubNub incident

Failures for Presence Webhooks

Minor · Resolved

PubNub experienced a minor incident on March 2, 2024, affecting North America Points of Presence, European Points of Presence, and 3 more components, lasting 1h 13m. The incident has been resolved; the full update timeline is below.

Started
Mar 02, 2024, 01:26 AM UTC
Resolved
Mar 02, 2024, 02:39 AM UTC
Duration
1h 13m
Detected by Pingoru
Mar 02, 2024, 01:26 AM UTC

Affected components

North America Points of Presence, European Points of Presence, Asia Pacific Points of Presence, Presence Service, Southern Asia Points of Presence

Update timeline

  1. investigating Mar 02, 2024, 01:26 AM UTC

    Around 14:45 UTC (06:45 PT) March 01, the Presence service began to experience missed Presence webhook calls for some users. PubNub Technical Staff is investigating and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support ([email protected]).

  2. identified Mar 02, 2024, 01:37 AM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Mar 02, 2024, 02:08 AM UTC

    A fix was implemented at 01:45 UTC. We are monitoring the results for the next 30 minutes.

  4. resolved Mar 02, 2024, 02:39 AM UTC

    This incident has been resolved with no errors observed for the last 30 minutes. We apologize for any impact this may have had on your service. Don't hesitate to contact us by reaching PubNub Support ([email protected]) if you wish to discuss the impact on your service. An RCA will be provided soon.

  5. postmortem Mar 08, 2024, 02:11 PM UTC

    ### **Problem Description, Impact, and Resolution**

    Around 14:45 UTC, March 1, 2024, we observed the Presence service started experiencing missed Presence webhook calls. The missed webhook calls were due to a migration of the Presence webhooks across multiple customers. We rolled back the migration for all customers in case the issue was broader and created a status page for transparency. Further investigation showed the missed webhook calls were isolated to a small subset of customers.

    ### **Mitigation Steps and Recommended Future Preventative Measures**

    To prevent a similar issue from occurring, we added and deployed redirect functionality in our Events & Actions service and added monitoring for future webhook migrations.