Ambra incident

InteleShare Incident

Ambra experienced a major incident on December 20, 2024 affecting Web Services, Image Processing, and Image Viewing, lasting 2h 40m. The incident has been resolved; the full update timeline is below.

Started: Dec 20, 2024, 06:02 PM UTC
Resolved: Dec 20, 2024, 08:43 PM UTC
Duration: 2h 40m
Detected by Pingoru: Dec 20, 2024, 06:02 PM UTC

Affected components

Web Services, Image Processing, Image Viewing

Update timeline

investigating Dec 20, 2024, 06:02 PM UTC

We have received reports of issues on the InteleShare platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.
investigating Dec 20, 2024, 06:26 PM UTC

Currently we are experiencing platform issues related to viewing images. Our Engineering teams are actively investigating and working to identify the root cause. We understand the urgency and we appreciate your patience as we work to address the issue.
investigating Dec 20, 2024, 06:57 PM UTC

Our engineering teams have identified issues related to storage nodes, but are still working to determine the root cause and a fix. We are continuing to investigate and will provide further updates as soon as possible.
identified Dec 20, 2024, 07:33 PM UTC

We are currently experiencing study viewing issues due to an overload of incoming requests, which is impacting our storage nodes. Our Engineering team has identified the root cause and is actively working on a solution. We are committed to resolving the issue as quickly as possible and will provide updates as they become available. We appreciate your patience and understanding during this time.
monitoring Dec 20, 2024, 08:02 PM UTC

Remediation to mitigate the issue has been applied. The ability to view images has improved at this time; however, we will continue to monitor the situation to ensure there are no further issues and will send additional updates on any new developments.
resolved Dec 20, 2024, 08:43 PM UTC

The incident has been fully resolved and service is back to normal levels. Our team will be conducting a root cause analysis and will share it as soon as possible. We will continue to monitor the situation to ensure there are no further issues.
postmortem Jan 14, 2025, 12:50 AM UTC

An internally initiated data migration job began consuming too many resources and interfered with normal system operation. During this period, interactive performance (user interface) was slower than usual, and background activities were being queued faster than they could be processed, creating a large backlog. Once the cause of the issue had been determined, we cancelled the data migration job. Interactive performance returned to normal, though background activities such as ingestion of new studies continued to be delayed until the queue backlog had been fully processed.
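
The backlog behavior described above can be illustrated with simple queue arithmetic. The sketch below is not taken from the incident data: the function names and all rates are hypothetical, chosen only to show why background work stays delayed for some time after the cause is removed. It assumes constant arrival and processing rates, which a real system would not have.

```python
def backlog_after(arrival_rate, service_rate, minutes, initial=0.0):
    """Queue length after `minutes` at constant rates (never below zero)."""
    return max(0.0, initial + (arrival_rate - service_rate) * minutes)


def drain_minutes(backlog, arrival_rate, service_rate):
    """Minutes needed to clear `backlog` once processing outpaces arrivals."""
    surplus = service_rate - arrival_rate
    if surplus <= 0:
        # The queue never drains while arrivals match or exceed processing.
        return float("inf")
    return backlog / surplus


# Hypothetical numbers, for illustration only (jobs per minute).
# While the competing job runs, processing falls behind arrivals.
peak = backlog_after(arrival_rate=100, service_rate=60, minutes=90)

# After the competing job is cancelled, processing recovers,
# but the accumulated backlog still has to be worked through.
recovery = drain_minutes(peak, arrival_rate=100, service_rate=150)

print(f"peak backlog: {peak:.0f} jobs, time to drain: {recovery:.0f} min")
```

Under these assumed rates, the delay in background activities outlasts the cancellation of the job because recovery time depends on the spare processing capacity, not on how quickly the cause is removed.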