SearchStax experienced a minor incident on April 25, 2023 affecting Pulse Monitoring, lasting 13d 16h. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Apr 25, 2023, 01:42 AM UTC
We have discovered false Heartbeat alerts for several deployments starting April 23, 11:35 am UTC. Our team is investigating the issue. Our team has separate monitoring that is set up for all deployments for which they get alerted on if a node or deployment is unavailable. At this time, we checked all alerts triggered via Pulse and have confirmed that the deployments for which these alerts were triggered are all in a healthy state.
- investigating Apr 25, 2023, 12:04 PM UTC
At this point, investigations point out to networking issues that could have caused the data to be missed causing the false alerts. We are continuing to investigate the issue.
- identified Apr 26, 2023, 10:09 PM UTC
There were some false heartbeat alerts sent again today April 26, 2023 at 5:35 a.m UTC. Our team has identified one possible contention point that could be causing the issue and is working to remedy it.
- monitoring May 03, 2023, 03:30 PM UTC
A fix was implemented by our team and we are not observing any false heartbeat alerts since April 28, 5:35am UTC. We are going to continue monitoring the performance.
- resolved May 08, 2023, 06:16 PM UTC
Our team continued to monitor Pulse Monitoring through the weekend, and we see that the systems are now performing as expected and there have been no false alerts since the last incident on April 28, 5:35am UTC. This incident has been resolved.