SearchStax incident

False Heartbeat Alerts Triggered

Minor Resolved View vendor source →

SearchStax experienced a minor incident on April 25, 2023 affecting Pulse Monitoring, lasting 13d 16h. The incident has been resolved; the full update timeline is below.

Started
Apr 25, 2023, 01:42 AM UTC
Resolved
May 08, 2023, 06:16 PM UTC
Duration
13d 16h
Detected by Pingoru
Apr 25, 2023, 01:42 AM UTC

Affected components

Pulse Monitoring

Update timeline

  1. investigating Apr 25, 2023, 01:42 AM UTC

    We have discovered false Heartbeat alerts for several deployments starting April 23, 11:35 am UTC. Our team is investigating the issue. Our team has separate monitoring that is set up for all deployments for which they get alerted on if a node or deployment is unavailable. At this time, we checked all alerts triggered via Pulse and have confirmed that the deployments for which these alerts were triggered are all in a healthy state.

  2. investigating Apr 25, 2023, 12:04 PM UTC

    At this point, investigations point out to networking issues that could have caused the data to be missed causing the false alerts. We are continuing to investigate the issue.

  3. identified Apr 26, 2023, 10:09 PM UTC

    There were some false heartbeat alerts sent again today April 26, 2023 at 5:35 a.m UTC. Our team has identified one possible contention point that could be causing the issue and is working to remedy it.

  4. monitoring May 03, 2023, 03:30 PM UTC

    A fix was implemented by our team and we are not observing any false heartbeat alerts since April 28, 5:35am UTC. We are going to continue monitoring the performance.

  5. resolved May 08, 2023, 06:16 PM UTC

    Our team continued to monitor Pulse Monitoring through the weekend, and we see that the systems are now performing as expected and there have been no false alerts since the last incident on April 28, 5:35am UTC. This incident has been resolved.