Shelf.io incident
US region: Ongoing upstream provider degradation causing intermittent errors across multiple services
Shelf.io experienced a major incident on October 20, 2025 affecting Integrations and Authentication and 1 more component, lasting 15h 20m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Oct 20, 2025, 02:16 PM UTC
We are currently investigating this issue.
- investigating Oct 20, 2025, 02:21 PM UTC
We are continuing to investigate this issue.
- monitoring Oct 20, 2025, 02:26 PM UTC
A fix has been implemented and we are monitoring the results.
- identified Oct 20, 2025, 03:01 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Oct 20, 2025, 03:17 PM UTC
A fix has been implemented and we are monitoring the results.
- monitoring Oct 20, 2025, 03:35 PM UTC
We're still observing occasional timeouts in Agent Assist searches for certain real-time suggestions.
- identified Oct 20, 2025, 04:13 PM UTC
Reopened. Our earlier resolution was premature. Our cloud infrastructure provider continues to report widespread degradation in their US‑East region (about 80 services affected, several in a brownout state). This is causing intermittent errors, elevated latency, and timeouts across multiple Shelf services in the US region, Answer Assist and Search, and integrations. EU and CA regions remain healthy. We are working with the provider and have applied mitigations to reduce impact. Next update within 30 minutes.
- identified Oct 20, 2025, 05:46 PM UTC
Update - This remains a brownout, not a full outage. Most requests in the US region are succeeding, but ongoing upstream degradation is causing intermittent errors and latency. Authentication, Viewing Content & Search, Self‑Service Portals, Emails & Notifications, and Making Changes to Content are largely available; Integrations and Insights & Analytics are the most affected with higher delays/timeouts. We continue to mitigate and monitor.
- monitoring Oct 20, 2025, 08:08 PM UTC
Update - Error rates are trending down. Over the last few minutes the US region is averaging about 0.4% 5xx responses. Core paths (Authentication, Viewing Content & Search, Self‑Service, and content edits) are each ≤0.5%, while Integrations and Insights & Analytics fluctuate slightly higher but remain under 1%. We'll keep mitigations in place and continue to monitor.
- monitoring Oct 20, 2025, 10:01 PM UTC
In the last three hours we observed a single brief spike in 5xx responses around 20:45 UTC in the US region; since then, 5xx has held at 0% across monitored endpoints. For transparency, we'll keep the incident open in Monitoring with no components marked as affected through the end of the day while we continue to observe.
- resolved Oct 21, 2025, 06:18 AM UTC
This incident has been resolved.