Ambra incident

InteleShare Incident

Ambra experienced a major incident on April 26, 2025, affecting Web Services, Image Processing, and Image Viewing, lasting 1d 5h. The incident has been resolved; the full update timeline is below.

Started: Apr 26, 2025, 09:21 AM UTC
Resolved: Apr 27, 2025, 02:56 PM UTC
Duration: 1d 5h
Detected by Pingoru: Apr 26, 2025, 09:21 AM UTC

Affected components

Web Services
Image Processing
Image Viewing

Update timeline

investigating Apr 25, 2025, 07:08 PM UTC

We have received reports of issues on the InteleShare platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.
investigating Apr 25, 2025, 07:44 PM UTC

Our engineering teams have not yet identified the root cause. We are continuing to investigate and will provide further updates as soon as possible.
investigating Apr 25, 2025, 08:16 PM UTC

Our engineering teams have not yet identified the root cause. We are continuing to investigate and will provide further updates as soon as possible.
investigating Apr 25, 2025, 08:19 PM UTC

We are continuing to investigate this issue.
investigating Apr 25, 2025, 08:54 PM UTC

Our engineering teams are focused on identifying the root cause of the incident and are dedicating all available resources to the investigation. We are working around the clock to resolve the issue and will provide updates as soon as we have more information.
investigating Apr 25, 2025, 09:33 PM UTC

Our Engineering teams continue to actively investigate and work to identify the root cause of these issues. As soon as further information is available, we will advise of next steps on our status page. Thank you for your patience.
investigating Apr 25, 2025, 10:31 PM UTC

We are currently experiencing delays in our backend services due to a backlog in our queues. Our engineering team is actively investigating the root cause of these issues and working diligently to resolve them. We apologize for any inconvenience this may cause and appreciate your patience as we work to restore normal service as quickly as possible.
investigating Apr 25, 2025, 11:50 PM UTC

High load on our backend services is causing delays, login issues, and backups in our backend service queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.
investigating Apr 26, 2025, 12:58 AM UTC

Due to high load, our backend services are still experiencing delays, login issues, and backups in the queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.
investigating Apr 26, 2025, 02:02 AM UTC

We are continuing to investigate this issue. High load on our backend services is causing delays, login issues, and backups in our backend service queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.
investigating Apr 26, 2025, 03:03 AM UTC

Our Engineering teams continue to actively investigate at this time. High load on our backend services is causing delays, login issues, and backups in our backend service queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.
investigating Apr 26, 2025, 04:55 AM UTC

Our Engineering teams have made progress in addressing the backend service issues. We're beginning to see a reduction in queue backlogs, though the system is still experiencing some delays and login issues as we work through the accumulated backups. We anticipate these issues will continue until all queues are fully cleared. We apologize for the inconvenience and appreciate your patience as we work toward full resolution.
investigating Apr 26, 2025, 05:47 AM UTC

Our Engineering teams have made progress in addressing the backend service issues. We're continuing to see a reduction in queue backlogs, though the system is still experiencing some delays and login issues as we work through the accumulated backups. We anticipate these issues will continue until all queues are fully cleared. We apologize for the inconvenience and appreciate your patience as we work toward full resolution.
monitoring Apr 26, 2025, 07:06 AM UTC

The issue has been identified. Our engineering team has successfully mitigated some runaway activity and is currently working through the backlog to restore optimal performance. Delays will persist until the queues are fully cleared. We appreciate your continued patience.
monitoring Apr 26, 2025, 09:21 AM UTC

The issue has been identified. Our engineering team has successfully mitigated some runaway activity and is currently working through the backlog to restore optimal performance. Delays will persist until the queues are fully cleared. We appreciate your continued patience.
monitoring Apr 26, 2025, 12:25 PM UTC

The issue has been identified. Our engineering team has successfully mitigated some runaway activity and is currently working through the backlog to restore optimal performance. Delays will persist until the queues are fully cleared. We appreciate your continued patience.
monitoring Apr 26, 2025, 03:34 PM UTC

Our engineering team has identified a performance bottleneck and is actively making adjustments to mitigate it. We will continue to provide periodic updates and apologize for any inconvenience; your continued patience is appreciated as we work toward full resolution.
identified Apr 26, 2025, 07:46 PM UTC

At this time, our engineering team continues to work toward resolving the system issues, and we expect full restoration of the system in approximately 1.5 hours. Thank you for your continued patience; we will post an update as further information becomes available.
monitoring Apr 26, 2025, 09:43 PM UTC

The root cause has been identified, and a remediation to mitigate the issue has been applied. Our team is working toward full recovery of any affected services. We will provide an estimate for full recovery as soon as possible. We will continue to monitor the situation to ensure there are no further issues and send additional updates on any new developments.
monitoring Apr 26, 2025, 10:20 PM UTC

At this time, we have processed approximately 75% of the queue backlog and continue to trend in the right direction. Most UI functions, including logins and viewing studies already in InteleShare, are functioning normally. Processes that involve ingesting or moving data (study uploads, shares, etc.) may still be delayed while we continue to process the queue.
monitoring Apr 26, 2025, 11:35 PM UTC

Our team continues working towards full recovery of any affected services. We will continue to monitor the situation to ensure there are no further issues and send additional updates on any new developments.
monitoring Apr 27, 2025, 12:50 AM UTC

Our engineering team is currently working on ways to expedite the remaining backlog. While there is no ETA on processing the current backlog, we will advise on next steps and progress towards full system recovery as soon as possible.
monitoring Apr 27, 2025, 02:42 AM UTC

At this time, our backlog continues to reduce. It may take some time to work through the remaining data. This process will continue to be monitored throughout the evening until we are able to confirm full system recovery.
monitoring Apr 27, 2025, 05:39 AM UTC

Please be aware that the InteleShare platform will be unavailable for approximately 1 hour, starting at 1:45 AM ET (5:45 AM UTC), while our Engineering teams implement a fix for the current platform issues. Once this work is complete, we will provide a new update.
monitoring Apr 27, 2025, 06:46 AM UTC

Emergency system maintenance has been completed, and services have been restored. We continue to monitor the processing of studies in our backlog at this time.
monitoring Apr 27, 2025, 08:08 AM UTC

At this time we are seeing our queue backlog reduce at a faster rate, and we continue to monitor progress.
monitoring Apr 27, 2025, 10:30 AM UTC

Our queue levels have returned to normal. We will continue to monitor system performance to ensure no further issues arise.
monitoring Apr 27, 2025, 01:56 PM UTC

Our queue levels have returned to normal. We will continue to monitor system performance to ensure no further issues arise.
resolved Apr 27, 2025, 02:56 PM UTC

The incident has been fully resolved and service is back to normal levels. Our team will conduct a root cause analysis and share the findings as soon as possible. We will continue to monitor the situation to ensure there are no further issues.
postmortem Apr 28, 2025, 08:44 PM UTC

Between Friday, April 25 and Sunday, April 27, the InteleShare platform experienced degraded performance that affected both user interface responsiveness and background processing operations. The issue has been fully resolved, and we have implemented immediate fixes and planned long-term improvements to prevent similar incidents in the future.

The primary cause of this incident was a network bandwidth limitation on cloud-hosted infrastructure related to queued job processing. This bandwidth became saturated while processing an unusually high volume of queued operations, which caused latency to increase substantially, leading to:

1. Degraded user interface responsiveness
2. Significant delays in background processing operations

Once we upgraded the infrastructure, processing capacity increased, allowing us to clear the backlog and restore normal operations.