Ambra experienced a minor incident on May 23, 2023 affecting Web Services, Image Processing, and 1 more component, lasting 1h 51m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating May 23, 2023, 02:49 PM UTC
We have received reports of issues on the Ambra platform. Engineering teams are currently investigating. Additional information will be provided as soon as it is available.
- identified May 23, 2023, 03:16 PM UTC
Our engineering team has identified a backlog in storage that is delaying the processing of study notifications. The team is working towards a resolution, and we will provide further updates as they become available.
- identified May 23, 2023, 03:47 PM UTC
The storage backlog has decreased. The engineering team is working on deploying a storage change that will further improve performance. This change will be rolled out in stages. Further updates will be posted as they become available.
- monitoring May 23, 2023, 04:14 PM UTC
The engineering team has begun rolling out the storage change in batches and will be monitoring post-deployment behavior as the change is implemented.
- resolved May 23, 2023, 04:41 PM UTC
The storage changes have been implemented across all storage nodes, and study notifications are now being delivered in real time.
- postmortem May 25, 2023, 10:11 PM UTC
A defect in our job scheduling library caused heavy database load and delayed the propagation of status updates between our backend storage and other Ambra components. The engineering team updated the storage code to improve caching performance and upgraded to a newer version of the third-party job scheduling library, which resolved a related defect. Once the issue was resolved, it took additional time for the backlog of notifications to be processed.