Ometria incident

API service degradation

Major Resolved View vendor source →

Ometria experienced a major incident on March 3, 2023 affecting APIs, lasting 13h 7m. The incident has been resolved; the full update timeline is below.

Started
Mar 03, 2023, 09:10 PM UTC
Resolved
Mar 04, 2023, 10:17 AM UTC
Duration
13h 7m
Detected by Pingoru
Mar 03, 2023, 09:10 PM UTC

Affected components

APIs

Update timeline

  1. investigating Mar 03, 2023, 09:10 PM UTC

    We are investigating a partial API outage affecting some services.

  2. investigating Mar 03, 2023, 09:11 PM UTC

    We are continuing to investigate this issue.

  3. identified Mar 03, 2023, 09:52 PM UTC

    The issue has been identified and currently investigating next steps to resolve.

  4. monitoring Mar 03, 2023, 10:13 PM UTC

    A fix has been implemented and we are monitoring the results.

  5. resolved Mar 04, 2023, 10:17 AM UTC

    This incident has been resolved.

  6. postmortem Mar 07, 2023, 05:26 PM UTC

    On 3rd March 2023 at 17:15 UTC our monitoring systems brought to our attention an issue with our API endpoints \(excluding transactional\) resulting in data loss for requests which were not automatically retried by client side retry mechanisms between 17:15 - 18:12 and 19:14 to 22:02 UTC. We identified a recent deployment that introduced an unexpectedly high load on the API key validation mechanism. A hotfix was deployed to mitigate the issue, with immediate positive results seen by 22:02 UTC on 3rd March 2023. We have implemented process changes to prevent such occurrences in the future. As best practice, we highly recommend that all clients implement an automatic retry mechanism for API requests as it ensures logs are reviewed over time. Should errors exist on Friday 3rd March between 17:15 - 18:12 and 19:14 to 22:02 UTC which were not retried, these will need to be retried manually.