Ambra incident

InteleShare Incident

Severity: Major. Status: Resolved.

Ambra experienced a major incident on December 20, 2024, affecting Web Services, Image Processing, and Image Viewing, lasting 2h 40m. The incident has been resolved; the full update timeline is below.

Started: Dec 20, 2024, 06:02 PM UTC
Resolved: Dec 20, 2024, 08:43 PM UTC
Duration: 2h 40m
Detected by Pingoru: Dec 20, 2024, 06:02 PM UTC

Affected components

Web Services, Image Processing, Image Viewing

Update timeline

  1. investigating Dec 20, 2024, 06:02 PM UTC

    We have received reports of issues on the InteleShare platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.

  2. investigating Dec 20, 2024, 06:26 PM UTC

    Currently we are experiencing platform issues related to viewing images. Our Engineering teams are actively investigating and working to identify the root cause. We understand the urgency and we appreciate your patience as we work to address the issue.

  3. investigating Dec 20, 2024, 06:57 PM UTC

    Our engineering teams have identified issues related to storage nodes, but are still working to determine the root cause and a fix. We are continuing to investigate and will provide further updates as soon as possible.

  4. identified Dec 20, 2024, 07:33 PM UTC

    We are currently experiencing study viewing issues due to an overload of incoming requests which is impacting our storage nodes. Our Engineering team has identified the root cause and is actively working on a solution. We are committed to resolving the issue as quickly as possible and will provide updates as they become available. We appreciate your patience and understanding during this time.

  5. monitoring Dec 20, 2024, 08:02 PM UTC

    Remediation to mitigate the issue has been applied. The ability to view images has improved at this time; however, we will continue to monitor the situation to ensure there are no further issues and will send additional updates on any new developments.

  6. resolved Dec 20, 2024, 08:43 PM UTC

    The incident has been fully resolved and service is back to normal levels. Our team will conduct a root cause analysis and share it as soon as possible. We will continue to monitor the situation to ensure there are no further issues.

  7. postmortem Jan 14, 2025, 12:50 AM UTC

    An internally initiated data migration job began using too many resources and interfering with normal system operation. During this period, interactive performance (user interface) was slower than usual, and background activities were being queued faster than they could be processed, creating a large backlog. Once the cause of the issue had been determined, we cancelled the data migration job. Interactive performance returned to normal, though background activities such as ingestion of new studies continued to be delayed until the queue backlog had been fully processed.
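
The backlog behaviour described in the postmortem can be illustrated with a small arithmetic sketch. This is not the vendor's model: the function names and every rate below are hypothetical, chosen only to show why a queue keeps delaying work after the underlying cause is removed. It assumes constant arrival and processing rates, which a real system would not have.

```python
def backlog_over_time(arrival_rate, service_rate, minutes, start_backlog=0.0):
    """Return the queue backlog after each minute, for constant rates in items/min."""
    backlog = start_backlog
    history = []
    for _ in range(minutes):
        # The backlog grows by the gap between arrivals and processing,
        # and can never drop below an empty queue.
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history


def drain_minutes(backlog, arrival_rate, service_rate):
    """Minutes needed to clear a backlog once processing outpaces arrivals."""
    surplus = service_rate - arrival_rate
    if surplus <= 0:
        # Processing is not faster than arrivals, so the queue never empties.
        return float("inf")
    return backlog / surplus


# Hypothetical numbers: 100 items/min arrive, but the overloaded system
# only processes 60 items/min for 120 minutes.
during_incident = backlog_over_time(arrival_rate=100, service_rate=60, minutes=120)
peak_backlog = during_incident[-1]

# Hypothetical recovery: processing capacity returns to 150 items/min.
recovery = drain_minutes(peak_backlog, arrival_rate=100, service_rate=150)

print(f"Backlog when the job is cancelled: {peak_backlog:.0f} items")
print(f"Time to clear the backlog: {recovery:.0f} minutes")
```

Under these assumed rates, the queue takes a comparable length of time to empty as it took to build up. This matches the pattern in the postmortem, where interactive performance recovered immediately but study ingestion stayed delayed until the backlog was processed.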