Ambra experienced a minor incident on June 1, 2023, affecting Web Services, Image Processing, and 1 more component, lasting 1h 21m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Jun 01, 2023, 02:04 PM UTC
We have received reports of issues on the Ambra platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.
- identified Jun 01, 2023, 02:20 PM UTC
Our engineering team has identified an excess of purging activity that is congesting storage, which in turn is causing issues with image processing and viewing. We will post further details as they become available.
- monitoring Jun 01, 2023, 02:29 PM UTC
The engineering team has implemented a configuration change and restarted storage nodes. The team will monitor system performance for any persistent issues. We will continue to post further updates as available.
- monitoring Jun 01, 2023, 03:01 PM UTC
At this time, the backlog is gradually reducing. The engineering team continues to monitor the situation and we will provide additional information as we receive it.
- resolved Jun 01, 2023, 03:25 PM UTC
The incident has been fully resolved and service is back to normal levels. Our team will be conducting a root cause analysis and sharing it as soon as possible. We will continue to monitor the situation to ensure there are no further issues.
- postmortem Jun 29, 2023, 10:11 PM UTC
Ambra automatically runs certain background tasks, such as purge rules, during off-peak hours to minimize the impact on the rest of the system. On the morning of June 1, one of these background tasks processed 15-20 times as much data as usual and continued running into peak hours. There were also some inefficiencies in the way sub-tasks created by that job were processed; these were not noticeable at normal processing levels but caused high error rates under the increased workload. During the incident, Ambra's engineering teams modified some configuration settings on our storage servers to mitigate the issue. Afterwards, they identified a defect to be fixed in future releases and are investigating further improvements to how these jobs are processed.
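For readers who want a concrete picture of how a background job can be kept from running into peak hours, the following is a minimal illustrative sketch only. It is not Ambra's implementation: the window times, the function names (`in_off_peak`, `run_purge`), and the batch model are all assumptions made for the example. The idea is to re-check the off-peak window before each batch and hand back unfinished work instead of continuing.

```python
from datetime import datetime, time, timezone

# Hypothetical off-peak window (UTC); the real schedule is not stated in this report.
OFF_PEAK_START = time(2, 0)
OFF_PEAK_END = time(10, 0)


def in_off_peak(now: datetime) -> bool:
    """Return True if `now` falls inside the assumed off-peak window."""
    return OFF_PEAK_START <= now.time() < OFF_PEAK_END


def run_purge(batches, purge_batch, clock=lambda: datetime.now(timezone.utc)):
    """Purge batches one at a time, stopping as soon as the window closes.

    The window is re-checked before every batch, so an unusually large
    workload is cut off at the window boundary rather than running on.
    Returns the batches that were not processed so a later run can resume.
    """
    pending = list(batches)
    while pending and in_off_peak(clock()):
        purge_batch(pending.pop(0))
    return pending
```

With this shape, a run that receives many times its usual volume finishes whatever fits in the window and defers the rest to the next off-peak period, at the cost of the purge taking more than one night to complete.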