Ambra incident

InteleShare Incident

Major Resolved

Ambra experienced a major incident starting on April 9, 2025 that affected Web Services, Image Processing, and Image Viewing and lasted 11h 20m. The incident has been resolved; the full update timeline is below.

Started
Apr 09, 2025, 03:53 PM UTC
Resolved
Apr 10, 2025, 03:13 AM UTC
Duration
11h 20m
Detected by Pingoru
Apr 09, 2025, 03:53 PM UTC

Affected components

Web Services, Image Processing, Image Viewing

Update timeline

  1. investigating Apr 09, 2025, 03:53 PM UTC

    We have received reports of issues on the InteleShare platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.

  2. investigating Apr 09, 2025, 04:18 PM UTC

    Our Engineering teams are actively investigating and working to identify the root cause. We understand the urgency and we appreciate your patience as we work to address the issue.

  3. monitoring Apr 09, 2025, 04:45 PM UTC

    The engineering team has identified the area of concern related to platform latency and login errors. A remediation to mitigate the issue has been applied. We are working towards full recovery of any affected services. We will continue to monitor the situation to ensure there are no further issues and send additional updates on any new developments.

  4. monitoring Apr 09, 2025, 05:24 PM UTC

    Our teams continue to work towards full system recovery and to monitor overall platform performance. Updates will continue to be posted here as they become available.

  5. monitoring Apr 09, 2025, 06:45 PM UTC

    At this time, our queues are still processing a backlog, resulting in some residual delays. Our teams continue to work towards full system recovery and are still monitoring for any further issues.

  6. identified Apr 09, 2025, 07:44 PM UTC

    We have identified and are investigating new reports of users being unable to log into the InteleShare UI, along with reports of queue processing issues/errors. The engineering team is working on these issues further and we will continue to provide updates on performance issues.

  7. investigating Apr 09, 2025, 08:10 PM UTC

    Our Engineering teams continue to actively investigate and work to identify the root cause of these issues. As soon as further information is made available we will advise of next steps on our status page. Thank you for your patience.

  8. investigating Apr 09, 2025, 08:47 PM UTC

    Our engineering team is actively investigating the cause of these issues. A high system load is resulting in delays, causing backups in our queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.

  9. investigating Apr 09, 2025, 09:31 PM UTC

    Our engineering team is actively investigating the cause of these issues. A high system load is resulting in delays, causing backups in our queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.

  10. investigating Apr 09, 2025, 10:26 PM UTC

    High system load is resulting in delays, login issues, and causing backups in our queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.

  11. investigating Apr 09, 2025, 11:06 PM UTC

    Our engineering team is actively investigating the cause of these issues. A high system load is resulting in delays, causing backups in our queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.

  12. investigating Apr 10, 2025, 12:24 AM UTC

    High system load is resulting in delays, login issues, and causing backups in our queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.

  13. investigating Apr 10, 2025, 01:03 AM UTC

    Our Engineering teams continue to actively investigate. High system load is resulting in delays, login issues, and causing backups in our queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.

  14. investigating Apr 10, 2025, 02:02 AM UTC

    High system load is still resulting in delays and causing backups in our queues. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.

  15. resolved Apr 10, 2025, 03:13 AM UTC

    We have made significant progress on draining the queues in the last hour and have fully recovered. We are back to normal processing levels.

  16. postmortem Apr 14, 2025, 08:04 PM UTC

    Our investigation has revealed that a scheduled task, which was designed to incrementally process items since its last run, was erroneously attempting to process the entire dataset each time it was executed. This misconfiguration led to a cumulative impact on system performance, resulting in errors and timeouts during both interactive sessions and API calls. Upon discovery, we immediately disabled the task to prevent further performance degradation. We are committed to ensuring the reliability of our platform and are exploring the need for this functionality. We will reimplement the task using current APIs that adhere to a more standardized workflow.

    In addition to addressing the immediate issue, our operations and research and development (R&D) teams have identified opportunities for enhancing the overall performance of the InteleShare platform. These improvements are scheduled for implementation in upcoming releases and are part of our ongoing commitment to providing you with a robust and efficient platform. We apologize for any inconvenience this may have caused and appreciate your understanding as we continuously work to improve our services.
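
The postmortem describes a task meant to process only items changed since its last run. The vendor has not published its implementation, so the following is only a minimal sketch of that general pattern (a stored "last run" watermark), and every name in it (`Item`, `select_incremental`, `run_task`, `utc`) is hypothetical. It also illustrates one way such a task could end up scanning the whole dataset: if the watermark is missing or never carried forward, each run behaves like a first run. The postmortem does not say whether this was the actual cause.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Item:
    # Hypothetical record shape; not taken from the vendor's system.
    item_id: int
    updated_at: datetime


def select_incremental(items, last_run, now):
    """Return items changed after the previous run and up to this run's start."""
    if last_run is None:
        # No watermark: every item up to `now` counts as unprocessed.
        return [item for item in items if item.updated_at <= now]
    return [item for item in items if last_run < item.updated_at <= now]


def run_task(items, last_run, now):
    """Process one batch and return (processed_ids, new_watermark)."""
    batch = select_incremental(items, last_run, now)
    processed_ids = [item.item_id for item in batch]
    # The caller must persist the returned watermark for the next run.
    return processed_ids, now


def utc(hour):
    """Helper for the demo: a UTC timestamp on the incident date."""
    return datetime(2025, 4, 9, hour, tzinfo=timezone.utc)


items = [Item(1, utc(1)), Item(2, utc(2)), Item(3, utc(3))]

# First run has no watermark, so it processes everything seen so far.
first_ids, watermark = run_task(items, None, utc(4))

# A new item arrives; the second run only picks up that item.
items.append(Item(4, utc(5)))
second_ids, watermark = run_task(items, watermark, utc(6))

# If the watermark were lost, the same run would rescan the full dataset.
rescan_ids, _ = run_task(items, None, utc(6))

print(first_ids)   # [1, 2, 3]
print(second_ids)  # [4]
print(rescan_ids)  # [1, 2, 3, 4]
```

In this sketch the cost of a correct run scales with the number of new items, while a run without a watermark scales with the size of the whole dataset, which is consistent with the cumulative load described in the postmortem.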