Ometria experienced a major incident on March 3, 2023 affecting APIs, lasting 13h 7m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Mar 03, 2023, 09:10 PM UTC
We are investigating a partial API outage affecting some services.
- investigating Mar 03, 2023, 09:11 PM UTC
We are continuing to investigate this issue.
- identified Mar 03, 2023, 09:52 PM UTC
The issue has been identified and currently investigating next steps to resolve.
- monitoring Mar 03, 2023, 10:13 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Mar 04, 2023, 10:17 AM UTC
This incident has been resolved.
- postmortem Mar 07, 2023, 05:26 PM UTC
On 3rd March 2023 at 17:15 UTC our monitoring systems brought to our attention an issue with our API endpoints \(excluding transactional\) resulting in data loss for requests which were not automatically retried by client side retry mechanisms between 17:15 - 18:12 and 19:14 to 22:02 UTC. We identified a recent deployment that introduced an unexpectedly high load on the API key validation mechanism. A hotfix was deployed to mitigate the issue, with immediate positive results seen by 22:02 UTC on 3rd March 2023. We have implemented process changes to prevent such occurrences in the future. As best practice, we highly recommend that all clients implement an automatic retry mechanism for API requests as it ensures logs are reviewed over time. Should errors exist on Friday 3rd March between 17:15 - 18:12 and 19:14 to 22:02 UTC which were not retried, these will need to be retried manually.