Nanonets incident

High Response Times for Instant Learning Models in the US Region

Major · Resolved

Nanonets experienced a major incident on December 3, 2025 affecting its API, lasting 6h 1m. The incident has been resolved; the full update timeline is below.

Started: Dec 03, 2025, 03:08 PM UTC
Resolved: Dec 03, 2025, 09:10 PM UTC
Duration: 6h 1m
Detected by Pingoru: Dec 03, 2025, 03:08 PM UTC

Affected components

API

Update timeline

  1. Investigating · Dec 03, 2025, 03:08 PM UTC

    We are currently investigating this issue.

  2. Identified · Dec 03, 2025, 04:35 PM UTC

    The issue has been identified and a fix is being implemented.

  3. Identified · Dec 03, 2025, 05:08 PM UTC

    We have identified the issue and increased our throughput to process requests faster. However, clearing the backlog may take some additional time. Our team is continuously monitoring the situation. We apologize for the inconvenience and appreciate your patience.

  4. Monitoring · Dec 03, 2025, 08:50 PM UTC

    Our sync prediction API for Instant Learning models is now operating normally. Async results for older files are available for most users; for a few users, a remaining backlog is still being cleared.

  5. Resolved · Dec 03, 2025, 09:10 PM UTC

    This incident has been resolved.

  6. Postmortem · Dec 04, 2025, 05:57 AM UTC

    **Overview**

    One of our secondary databases experienced connection issues following an unexpected spike in load. This sudden surge placed significant pressure on the database engine, causing request latency to increase substantially.

    **Root Cause**

    A rapid increase in incoming traffic caused a large number of concurrent connections to accumulate on a secondary database. This overwhelmed the database's connection-handling capacity, resulting in slow responses and delayed processing for dependent services.

    **Resolution**

    Our engineering team quickly identified the issue, took corrective actions to stabilize the database, and restored normal operation. Once the database recovered, the system began processing the accumulated backlog. This recovery phase took additional time due to the volume of pending requests.

    **Impact**

    * **Affected:** Only Instant Learning models in the **US region**
    * **Unaffected:** All other regions and services continued operating normally throughout the incident
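
The root cause above, unbounded concurrent connections accumulating on a secondary database during a traffic spike, is a failure mode commonly mitigated by capping connections at the application tier so that excess load queues or fails fast instead of reaching the database. Nanonets has not published how its services manage connections, so the Python sketch below is purely illustrative: `connect()`, `BoundedPool`, and the pool sizes are hypothetical, not Nanonets code.

```python
import queue

# Hypothetical stand-in for a real driver's connect(); Nanonets has not
# disclosed its database stack, so this is illustrative only.
def connect():
    return object()

class BoundedPool:
    """Caps concurrent database connections so a traffic spike queues
    (or fails fast) at the application tier instead of piling
    connections onto the database."""

    def __init__(self, max_conns: int, acquire_timeout: float):
        self._conns = queue.Queue(maxsize=max_conns)
        for _ in range(max_conns):
            self._conns.put(connect())
        self._timeout = acquire_timeout

    def acquire(self):
        # Block for at most acquire_timeout; failing fast here sheds
        # load upstream rather than letting connections accumulate.
        try:
            return self._conns.get(timeout=self._timeout)
        except queue.Empty:
            raise RuntimeError("pool exhausted; shed load upstream")

    def release(self, conn):
        self._conns.put(conn)


# Example: every request path goes through the same capped pool.
pool = BoundedPool(max_conns=50, acquire_timeout=2.0)  # illustrative numbers
conn = pool.acquire()
try:
    pass  # run the query here
finally:
    pool.release(conn)
```

A cap like this trades a little request latency under normal load for a hard ceiling on database pressure during spikes; production systems typically get the same effect from a driver-level pool or an external pooler such as PgBouncer.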