Onfido experienced a minor incident on March 31, 2025, affecting Document Verification and lasting 21 minutes. The incident has been resolved; the full update timeline is below.
Affected components
- Document Verification
Update timeline
- investigating Mar 31, 2025, 09:41 PM UTC
We are currently experiencing an issue that is increasing latency on check completion.
- monitoring Mar 31, 2025, 09:43 PM UTC
The issue has been resolved and we are monitoring the results.
- resolved Mar 31, 2025, 10:03 PM UTC
This incident has been resolved. A small backlog of manual tasks will be cleared within the next 1-2 hours.
- postmortem Apr 09, 2025, 06:49 AM UTC
### Summary

One of the components contributing to automatic processing of Document Reports had a spike of timeout errors from 9:05pm until 9:20pm in the EU cluster. All Document Reports created between 9:20pm and 9:40pm UTC were processed by manual analysts, with a higher turnaround time (TaT).

### Root Causes

Two faulty nodes in our production cluster temporarily slowed down the execution of a CPU-intensive component.

### Timeline

- _9:21pm UTC: Elevated error rates for the relevant component triggered an on-call alert._
- _9:28pm UTC: We identified two nodes of our cluster as the culprits for slow CPU-intensive executions._
- _9:33pm UTC: We restarted the two nodes._
- _9:40pm UTC: The affected component recovered successfully._
- _9:41pm UTC: A backlog of reports was observed. A public incident was raised to inform customers of the expected time to clear._