Ambra incident

Minor | Resolved

Ambra experienced a minor incident on November 29, 2023, affecting Web Services, Image Processing, and Image Viewing and lasting 2h 47m. The incident has been resolved; the full update timeline is below.

Started
Nov 29, 2023, 06:56 PM UTC
Resolved
Nov 29, 2023, 09:44 PM UTC
Duration
2h 47m
Detected by Pingoru
Nov 29, 2023, 06:56 PM UTC

Affected components

Web Services, Image Processing, Image Viewing

Update timeline

  1. investigating Nov 29, 2023, 06:56 PM UTC

    We have received reports of issues on the Ambra platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.

  2. investigating Nov 29, 2023, 07:21 PM UTC

    Our engineering teams have not yet identified the root cause. We are continuing to investigate and will provide further updates as soon as possible.

  3. monitoring Nov 29, 2023, 07:55 PM UTC

    Our engineering teams have taken steps to address ongoing issues by restarting storage nodes and implementing some configuration changes. At this time we are seeing improvements in system performance, and will continue to monitor for further issues.

  4. resolved Nov 29, 2023, 09:44 PM UTC

    The incident has been resolved. Our team will conduct a root cause analysis and share the findings as soon as possible. We will continue to monitor the situation to ensure there are no further issues.

  5. postmortem Dec 01, 2023, 10:02 PM UTC

    Ambra storage began experiencing a higher-than-usual number of errors and timeouts, causing slow image viewing and/or gateway backlogs. We observed high network activity on our caching cluster, which caused network buffers to expand and memory usage to grow rapidly. To mitigate the issue, we modified cache configuration settings to limit buffer sizes and slow memory growth. We also deployed additional storage nodes and implemented logic on our load balancers to reduce traffic to unhealthy nodes more quickly. To address the underlying problem of high network activity, an upcoming Ambra release will contain several optimizations that significantly reduce the network traffic required for cache lookups.
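The postmortem does not say how Ambra's load-balancer logic works. One common way to shift traffic away from failing nodes quickly is passive health checking: count consecutive errors or timeouts per node and stop routing to a node for a cooldown period once it crosses a threshold. The sketch below illustrates that generic technique only. The class name `PassiveHealthBalancer`, the parameters `max_failures` and `cooldown_s`, and their default values are all hypothetical and are not taken from Ambra's system.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    consecutive_failures: int = 0
    ejected_until: float = 0.0  # clock value until which the node gets no traffic


class PassiveHealthBalancer:
    """Hypothetical sketch of passive health checking (not Ambra's implementation).

    A node is ejected for `cooldown_s` seconds once it reaches
    `max_failures` consecutive errors or timeouts.
    """

    def __init__(self, names, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.nodes = {n: Node(n) for n in names}
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the behavior can be tested deterministically

    def healthy(self):
        # Nodes whose ejection window has expired (or that were never ejected).
        now = self.clock()
        return [n for n in self.nodes.values() if n.ejected_until <= now]

    def pick(self):
        candidates = self.healthy()
        if not candidates:
            # Fail open: if every node is ejected, spread load across all of them
            # rather than rejecting every request.
            candidates = list(self.nodes.values())
        return random.choice(candidates).name

    def report(self, name, ok):
        # Record the outcome of a request sent to `name`.
        node = self.nodes[name]
        if ok:
            node.consecutive_failures = 0
            return
        node.consecutive_failures += 1
        if node.consecutive_failures >= self.max_failures:
            node.ejected_until = self.clock() + self.cooldown_s
            node.consecutive_failures = 0
```

Lowering `max_failures` makes the balancer react faster to an unhealthy node at the cost of more false ejections from transient errors. The fail-open fallback in `pick` is a design choice that keeps serving traffic when every node looks unhealthy. The idea is similar in spirit to the `max_fails`/`fail_timeout` parameters of NGINX's upstream `server` directive, although NGINX counts failures within a time window rather than consecutively.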