PubNub incident

Elevated History Error/Latency in Tokyo Region

Notice Resolved

PubNub experienced a notice-level incident on March 17, 2024, affecting the Storage and Playback Service and lasting 41 minutes. The incident has been resolved; the full update timeline is below.

Started
Mar 17, 2024, 12:43 AM UTC
Resolved
Mar 17, 2024, 01:24 AM UTC
Duration
41m
Detected by Pingoru
Mar 17, 2024, 12:43 AM UTC

Affected components

Storage and Playback Service

Update timeline

  1. investigating Mar 17, 2024, 12:43 AM UTC

    Around 00:06 UTC we began to notice increasing errors and latency for History in the Tokyo region. We are investigating this incident.

  2. monitoring Mar 17, 2024, 01:06 AM UTC

    The elevated state of History errors and latency has returned to normal. We will continue to monitor the incident.

  3. monitoring Mar 17, 2024, 01:24 AM UTC

    We have not seen any errors or increased latency for ~45 minutes. We will continue to monitor History to validate the resolution.

  4. resolved Mar 17, 2024, 01:24 AM UTC

    This incident has been resolved.

  5. postmortem Mar 20, 2024, 06:27 PM UTC

    ### **Problem Description, Impact, and Resolution**

    At 00:06 UTC on March 17, 2024, we observed increased error rates and latency for History calls in our Tokyo region. We identified that the latency and errors originated with our third-party storage provider. We alerted the provider, which restarted the impacted storage nodes, and the issue was resolved at 00:47 UTC on March 17, 2024.

    ### **Mitigation Steps and Recommended Future Preventative Measures**

    To prevent a similar issue from occurring in the future, we have added monitoring of swap space levels on our servers so that we will have better alerting if such issues with our third-party provider recur.
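
The postmortem does not say how the swap space monitoring is implemented. Purely as an illustration, a swap usage check on a Linux host could look roughly like the sketch below. It assumes the standard `SwapTotal` and `SwapFree` fields of `/proc/meminfo`; the function names and the 50% threshold are hypothetical and are not taken from PubNub's tooling.

```python
from pathlib import Path


def swap_used_fraction(meminfo_text: str) -> float:
    """Return the fraction of swap in use, parsed from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])  # first token is the numeric value
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured, so there is nothing to report
    return (total - free) / total


def swap_alert(meminfo_text: str, threshold: float = 0.5) -> bool:
    """True when swap usage meets or exceeds the threshold (hypothetical default)."""
    return swap_used_fraction(meminfo_text) >= threshold


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux hosts
        text = meminfo.read_text()
        print(f"swap used: {swap_used_fraction(text):.1%}, alert: {swap_alert(text)}")
```

In practice a check like this would normally run inside an existing metrics agent and feed an alerting rule rather than print to stdout. Keeping the parsing separate from the file read makes the threshold logic easy to test.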