One Identity Starling experienced a major incident on April 18, 2024 affecting Safeguard On Demand and 1 more component, lasting 7h 55m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Apr 18, 2024, 12:59 PM UTC
We are currently investigating this issue.
- investigating Apr 18, 2024, 02:45 PM UTC
We are continuing to investigate this issue.
- investigating Apr 18, 2024, 05:51 PM UTC
We are continuing to investigate this issue. Next update will occur in 2 hours.
- identified Apr 18, 2024, 07:01 PM UTC
The issue has been identified and we are working on a fix.
- resolved Apr 18, 2024, 08:54 PM UTC
This incident has now been resolved. We will follow this incident with an RCA.
- postmortem Apr 22, 2024, 08:47 PM UTC
What Occurred
Some Safeguard On Demand Starling Edition (SGODSE) customers reported they were unable to access the WebUI of their SPP/SPS appliances.

What went wrong and why?
A product update resulted in the amendment of traffic flow rules which blocked traffic between the app gateway in front of each existing customer instance and the servers hosting the SPP and SPS appliances. Impact was limited to the availability of the WebUI.

How are we making incidents like this less likely or less impactful?
We are improving our QA processes to include tests for this upgrade scenario in pre-production environments.

Timeline
2024-04-18 03:59AM PDT
* Update to the internal service responsible for managing instances of SGODSE deployed.
2024-04-18 04:23AM PDT
* Automated testing and monitoring began reporting 502 Bad Gateway errors.
* Customers reported 502 Bad Gateway errors shortly thereafter.
* We began investigation.
2024-04-18 09:35AM PDT
* The issue was identified and we began fixing customer instances.
2024-04-18 12:45PM PDT
* All customers were fully operational.
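
As an illustration of the kind of pre-production test the postmortem mentions, the following is a minimal sketch of a post-upgrade smoke check that requests each instance's WebUI through its gateway and flags gateway-level failures such as 502 Bad Gateway. It is not One Identity's actual tooling: the function names, the choice of status codes treated as failures, and the URLs a caller would pass in are all assumptions made for this example.

```python
import urllib.error
import urllib.request

# Status codes that indicate the gateway could not get a usable response
# from the backend. Treating 503/504 as failures is an assumption here;
# the incident itself reported 502 Bad Gateway.
GATEWAY_FAILURES = (502, 503, 504)


def probe_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status for a GET to `url`, or None if no HTTP response arrived."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError; the status code is still available.
        return err.code
    except OSError:
        # Covers URLError, timeouts, and connection failures.
        return None


def gateway_can_reach_backend(status):
    """True if the status suggests traffic passed from the gateway to the backend."""
    # Redirects or 401s still count as reachable: the backend answered.
    return status is not None and status not in GATEWAY_FAILURES


def failing_instances(urls, probe=probe_status):
    """Return the subset of (hypothetical) WebUI URLs that fail the check."""
    return [u for u in urls if not gateway_can_reach_backend(probe(u))]
```

Run against a pre-production environment immediately after an upgrade, a non-empty result from `failing_instances` would indicate that some instances are unreachable through their gateway before the change reaches customers.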