Ambra experienced a minor incident on May 16, 2023, affecting Web Services, Image Processing, and 1 more component, lasting 56m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating May 16, 2023, 04:20 PM UTC
We have received reports of issues on the Ambra platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.
- investigating May 16, 2023, 04:22 PM UTC
We are continuing to investigate this issue.
- investigating May 16, 2023, 04:27 PM UTC
The issue appears to be centered on certain storage nodes; remediation is underway.
- identified May 16, 2023, 04:37 PM UTC
We're starting to see improvements and are continuing to monitor.
- identified May 16, 2023, 05:04 PM UTC
We're still seeing some issues around storage; further remediation steps were taken and the situation is continuing to improve.
- resolved May 16, 2023, 05:19 PM UTC
We believe the issue is resolved and will continue to provide updates as engineering makes progress on the root cause investigation.
- postmortem May 25, 2023, 10:08 PM UTC
A shared caching component in our storage subsystem experienced brief timeouts. The frontend storage nodes did not handle these timeouts well, which led to further errors even after the caching layer had recovered. As a workaround, we manually disabled the shared caching layer and used a local cache on each individual storage node instead. Engineering later implemented improvements to error handling.
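
For illustration only, the failure mode described above (shared cache timeouts that keep causing errors after the cache recovers) is often handled with a fallback plus a short bypass window. The sketch below is not Ambra's implementation; the class name `FallbackCache`, the `shared_get` callable, and the `max_failures` and `cooldown_seconds` parameters are all hypothetical. It assumes the shared cache client raises `TimeoutError` on a timeout.

```python
import time
from typing import Callable, Dict, Optional


class FallbackCache:
    """Hypothetical sketch: prefer a shared cache, degrade to a per-node
    local cache on timeouts, and retry the shared cache after a cooldown."""

    def __init__(
        self,
        shared_get: Callable[[str], Optional[bytes]],  # assumed to raise TimeoutError
        max_failures: int = 3,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._shared_get = shared_get
        self._local: Dict[str, bytes] = {}
        self._max_failures = max_failures
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0        # consecutive shared-cache timeouts
        self._bypass_until = 0.0  # shared cache is skipped until this time

    def put_local(self, key: str, value: bytes) -> None:
        """Store a value in this node's local cache."""
        self._local[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return a cached value, or None if neither cache has it."""
        if self._clock() >= self._bypass_until:
            try:
                value = self._shared_get(key)
            except TimeoutError:
                self._failures += 1
                if self._failures >= self._max_failures:
                    # Stop calling the shared cache for a while so that
                    # timeouts do not cascade into every request.
                    self._bypass_until = self._clock() + self._cooldown
                    self._failures = 0
            else:
                # A successful call clears the failure count, so the node
                # resumes normal operation once the shared cache recovers.
                self._failures = 0
                if value is not None:
                    self._local[key] = value
                    return value
        # Shared cache bypassed, timed out, or missed: use the local cache.
        return self._local.get(key)
```

The cooldown is the design choice that matters here: a timeout never surfaces as an error to the caller, and the shared cache is retried automatically once the bypass window expires, so no manual step is needed to turn it back on.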