Onfido incident

Increase in webhooks latency

Minor · Resolved

On October 9, 2025, Onfido experienced a minor incident affecting Webhooks. The status page tracked it for 38 minutes, from 13:17 to 13:55 UTC. The incident has been resolved; the full update timeline is below.

Started: Oct 09, 2025, 01:17 PM UTC
Resolved: Oct 09, 2025, 01:55 PM UTC
Duration: 38m
Detected by Pingoru: Oct 09, 2025, 01:17 PM UTC

Affected components

Webhooks

Update timeline

  1. investigating Oct 09, 2025, 01:17 PM UTC

    We've identified an increase in webhook latency affecting the EU region.

  2. monitoring Oct 09, 2025, 01:50 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Oct 09, 2025, 01:55 PM UTC

    This incident has been resolved.

  4. postmortem Oct 13, 2025, 02:04 PM UTC

    ### Summary

    Webhook latency increased to up to 49 minutes in the EU region between 12:39 and 13:39 UTC on October 9, 2025. Although no webhooks were lost during the incident, some clients encountered rate limit errors when calling the API while processing a surge of delayed webhooks. These webhooks were subsequently retried according to the logic outlined in our public documentation.

    ### Root Causes

    An infrastructure dependency that helps reduce duplicate delivery of webhooks experienced a hardware failure. As a result, the throughput of the webhook delivery service decreased significantly during the incident.

    ### Timeline

    - 12:39 UTC: Throughput of the service responsible for delivering webhooks decreased.
    - 12:45 UTC: Our on-call team was notified and started investigating.
    - 13:16 UTC: The on-call team acknowledged the widespread impact of the incident and updated the status page.
    - 13:42 UTC: The on-call team identified the faulty infrastructure dependency.
    - 13:49 UTC: The service recovered and all pending webhooks were delivered.

    ### Remedies

    We’ll improve the resilience of webhook delivery when this piece of infrastructure fails. We will also update our runbooks with specific instructions for diagnosing this type of failure, to reduce recovery time in similar incidents.
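The postmortem describes two client-facing effects. Some clients hit API rate limits while working through the backlog of delayed webhooks. The component that failed is also the one that normally reduces duplicate deliveries. The following is a minimal client-side sketch of the two usual defenses: retrying rate-limited API calls with capped exponential backoff and jitter, and processing each webhook idempotently. All names here are hypothetical and are not part of Onfido's SDK. This applies to `RateLimitError`, `call_with_backoff`, `WebhookConsumer`, and the injected `fetch` callable. The event shape (`payload.action`, `payload.object.id`) and the dedup key built from it are assumptions. Check Onfido's webhook documentation for the actual payload fields and for its own server-side retry behavior, which this sketch does not model.

```python
import random
import time
from typing import Callable, Dict, List, Set, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error your API wrapper raises on an HTTP 429 response."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError using capped exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # give up and surface the last rate-limit error
            # Upper bound doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # sleeping a random fraction of it spreads out retries from many workers.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


class WebhookConsumer:
    """Processes each webhook at most once, even if it is delivered more than once."""

    def __init__(self, fetch: Callable[[str], dict],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._fetch = fetch          # hypothetical: fetches the resource named in the webhook
        self._sleep = sleep
        self._seen: Set[str] = set() # in production, use a durable store shared by workers
        self.processed: List[dict] = []

    def handle(self, event: Dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        payload = event["payload"]
        resource_id = payload["object"]["id"]
        # Assumed dedup key: action plus resource id. Adjust to your payload's real fields.
        key = f'{payload["action"]}:{resource_id}'
        if key in self._seen:
            return False
        resource = call_with_backoff(lambda: self._fetch(resource_id), sleep=self._sleep)
        self.processed.append(resource)
        # Mark as seen only after success, so a failed attempt can be retried later.
        self._seen.add(key)
        return True
```

The dedup key is recorded only after the API call succeeds. If the backoff gives up, a later redelivery of the same webhook is therefore processed again rather than silently dropped. Full jitter, a random delay between zero and the current cap, is one common choice. Honoring a `Retry-After` header, when the API sends one, is a reasonable alternative.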