Shelf.io incident
Search, Announcements, and Feedback API errors (US)
Shelf.io experienced a minor incident on October 17, 2025 affecting Viewing Content & Search, lasting 41 minutes. The incident has been resolved; the full update timeline is below.
Affected components
- Viewing Content & Search (US)
Update timeline
- investigating Oct 17, 2025, 03:34 PM UTC
We are currently investigating increased error rates affecting search and content discovery in our US region. Some users may experience slower response times or intermittent errors when searching for content, viewing announcements, or submitting feedback. Other platform capabilities, including content creation and editing, remain fully operational.
- monitoring Oct 17, 2025, 04:03 PM UTC
We have identified and addressed the underlying issue affecting search and content discovery performance in the US region, and our engineering team is actively monitoring system behavior. During peak impact, approximately 20-40% of search and content discovery requests returned errors. We will continue to observe system performance to confirm full stability.
- resolved Oct 19, 2025, 06:29 PM UTC
This incident has been resolved. All search, announcement, and feedback services in the US region have returned to normal operation. System monitoring confirms stable performance with no residual errors. We apologize for any disruption this may have caused.
- postmortem Oct 19, 2025, 07:47 PM UTC
On October 17, 2025, between 15:22 and 16:03 UTC, users in our US region experienced intermittent errors when searching for content, viewing announcements, and submitting feedback. At the peak of the incident, approximately 40% of requests to these services returned errors, while other platform capabilities remained fully operational.

This event occurred during an ongoing maintenance process to upgrade our search relevance engine, work aimed at delivering more accurate, higher-quality results to users. We ran this upgrade in the background for several days at minimal load to avoid disruption. Near the end of the week, a configuration error caused the process to place more load on the system than intended. Coupled with a spike in search activity, this created temporary capacity pressure and intermittent errors in the US region. The outage was partial: error rates varied over the window and peaked around 40%, with the majority of requests continuing to succeed.

We have already applied lessons from a prior search incident and rolled out the upgraded search infrastructure in the EU and CA regions; the US region is scheduled next. We will continue with this upgrade, and we will communicate maintenance windows in advance, align change timing to lower-traffic periods, and add redundancy specific to search operations to further protect performance during these necessary maintenance activities.
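The root cause described above, a background job consuming more capacity than intended, is commonly guarded against with an explicit rate limit built into the job itself, so a misconfiguration cannot silently push load onto the live system. The sketch below is purely illustrative; the class name and the rate limit are hypothetical and do not describe Shelf.io's actual implementation:

```python
import time


class Throttle:
    """Cap a background job (e.g. a reindexing pass) at a fixed number
    of operations per second, independent of any external config."""

    def __init__(self, max_ops_per_sec: float):
        if max_ops_per_sec <= 0:
            raise ValueError("rate limit must be positive")
        self.min_interval = 1.0 / max_ops_per_sec  # seconds between ops
        self.last_op = 0.0

    def wait(self) -> None:
        """Block until the next operation is allowed."""
        now = time.monotonic()
        elapsed = now - self.last_op
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_op = time.monotonic()


# Example: process documents at no more than 50 ops/sec
throttle = Throttle(max_ops_per_sec=50)
for doc_id in range(5):
    throttle.wait()
    # reindex_document(doc_id)  # hypothetical per-document reindex call
```

A hard cap like this acts as a backstop: even if a higher-level configuration error requests more throughput, the job cannot exceed the ceiling, which is one way to keep "minimal load" maintenance genuinely minimal.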