Ambra incident

Severity: Minor. Status: Resolved.

Ambra experienced a minor incident on May 16, 2023, affecting three components: Web Services, Image Processing, and Image Viewing. The incident lasted 56m and has been resolved; the full update timeline is below.

Started: May 16, 2023, 04:22 PM UTC
Resolved: May 16, 2023, 05:19 PM UTC
Duration: 56m
Detected by Pingoru: May 16, 2023, 04:22 PM UTC

Affected components

Web Services, Image Processing, Image Viewing

Update timeline

  1. investigating May 16, 2023, 04:20 PM UTC

    We have received reports of issues on the Ambra platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.

  2. investigating May 16, 2023, 04:22 PM UTC

    We are continuing to investigate this issue.

  3. investigating May 16, 2023, 04:27 PM UTC

    The issue appears to be centered on certain storage nodes; remediation is underway.

  4. identified May 16, 2023, 04:37 PM UTC

    We're starting to see improvements and are continuing to monitor.

  5. identified May 16, 2023, 05:04 PM UTC

    We're still seeing some issues around storage; further remediation steps were taken and the situation is continuing to improve.

  6. resolved May 16, 2023, 05:19 PM UTC

    We believe the issue is resolved and will continue to provide updates as engineering makes progress on the root cause investigation.

  7. postmortem May 25, 2023, 10:08 PM UTC

    A shared caching component in our storage subsystem experienced brief timeouts. The frontend storage nodes did not handle these timeouts well, which led to further errors even after the caching layer had recovered. As a workaround, we manually disabled the shared caching layer and used a local cache on each individual storage node instead. Engineering later implemented improvements to error handling.
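
The failure mode described in the postmortem can be illustrated with a minimal sketch. This is not Ambra's implementation; the vendor has not published details of it. Every name here (`StorageNodeCache`, `SharedCacheTimeout`, `shared_get`, `use_shared`, the threshold and cooldown values) is hypothetical. The sketch assumes a storage node that reads through a shared cache, treats a timeout as a recoverable event rather than an error, falls back to its own local cache, and exposes a manual switch comparable to the workaround of disabling the shared layer.

```python
import time
from typing import Callable, Dict, Optional


class SharedCacheTimeout(Exception):
    """Raised by the (hypothetical) shared cache client when a call times out."""


class StorageNodeCache:
    """Reads through a shared cache, falling back to a per-node local cache."""

    def __init__(
        self,
        shared_get: Callable[[str], Optional[bytes]],
        failure_threshold: int = 3,       # assumed value, for illustration only
        cooldown_seconds: float = 30.0,   # assumed value, for illustration only
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._shared_get = shared_get
        self._local: Dict[str, bytes] = {}
        self._failure_threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._skip_shared_until = 0.0
        # Manual switch: set to False to bypass the shared cache entirely.
        self.use_shared = True

    def put_local(self, key: str, value: bytes) -> None:
        """Store a value in this node's local cache."""
        self._local[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the cached value, or None if neither cache has it."""
        if self.use_shared and self._clock() >= self._skip_shared_until:
            try:
                value = self._shared_get(key)
                self._failures = 0  # a successful call clears the failure count
                if value is not None:
                    self._local[key] = value  # keep a local copy for fallback
                    return value
            except SharedCacheTimeout:
                # Absorb the timeout instead of propagating an error.
                self._failures += 1
                if self._failures >= self._failure_threshold:
                    # Stop calling the shared cache for a cooldown period,
                    # then try it again so the node recovers on its own.
                    self._skip_shared_until = self._clock() + self._cooldown
                    self._failures = 0
        return self._local.get(key)
```

In this sketch, setting `use_shared = False` on a node mirrors the manual workaround, while the cooldown lets a node resume using the shared cache once it is healthy again. Whether the actual error-handling improvements resemble this is not stated in the postmortem.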