Box incident

[Major] Some Users Unable to View All Files and API Pages

Major · Resolved

Box experienced a major incident on March 3, 2025, affecting the Content API and Login/SSO components and lasting 35 minutes. The incident has been resolved; the full update timeline is below.

Started: Mar 03, 2025, 10:27 AM UTC
Resolved: Mar 03, 2025, 11:03 AM UTC
Duration: 35m
Detected by Pingoru: Mar 03, 2025, 10:27 AM UTC

Affected components

Content API, Login/SSO

Update timeline

  1. investigating Mar 03, 2025, 10:27 AM UTC

    We are currently investigating an issue where some users may be unable to view their All Files and API pages. We will provide additional information as it becomes available.

  2. monitoring Mar 03, 2025, 10:41 AM UTC

    We have taken action to remediate this incident and are no longer seeing the issue occurring. We are continuing to monitor to ensure there is no additional impact.

  3. resolved Mar 03, 2025, 11:03 AM UTC

    No additional impact has been observed and this issue is considered fully resolved. If you are still experiencing any issues, please contact us via https://support.box.com.

  4. postmortem Apr 24, 2025, 12:48 PM UTC

    We recently addressed issues affecting Box services. We would like to take this opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

    Between 1:48 AM and 2:25 AM PST (09:48 to 10:25 UTC) on March 3, 2025, some users may have experienced difficulties while working in Box. Starting at 8:44 PM PST the same day (04:44 UTC on March 4), some users may have encountered issues again; this second disruption ended before 9:59 PM PST (05:59 UTC). During these periods, a subset of users experienced slowness and intermittent errors with Notes, the Public API, logins, and uploads/downloads.

    The issue was caused by a fragmented system table on a database cluster, which ultimately led to the database crashing. The first instance was triggered by increased traffic; the second occurred because our manual remediation process put additional load on the database. Our database remediation service attempted to resolve the issue both times but was unsuccessful because the thread_cache_size setting was set too low. We addressed the short-term problem by manually redirecting traffic to a healthy database node. To maintain the medium-term stability of the database, the team rebuilt the cluster to eliminate the fragmented table. We will also be splitting the database cluster into smaller databases to prevent future overloads, and improving our database remediation service to better handle this type of case.

    **Analysis**

    The affected database cluster experienced gradual performance degradation before the issue became apparent. This degradation was caused by a system table that grew increasingly fragmented as database size and traffic increased. However, the degradation went unnoticed because the existing alerting system did not flag any problems. In addition, the auto-remediation system was unsuccessful because it encountered a case in which two database configuration settings were incompatible.
    Specifically, the max_connections setting had been increased without a corresponding adjustment to thread_cache_size. This resulted in frequent thread cache misses and left the failover procedure without the resources it needed to succeed.

    **Corrective Actions**

    Box has initiated the following corrective actions:

    * Rebuilding the database cluster to eliminate the table fragmentation and prevent medium-term performance degradation
    * Adding metrics and alerting for table fragmentation to proactively monitor for issues
    * Adjusting database configurations so that thread_cache_size adjusts dynamically with the max_connections setting
    * Improving the database remediation process by adjusting timeouts to accommodate large database clusters
    * Accelerating the database split process to quickly divide large clusters, reducing traffic overload and improving the success rate of routine maintenance

    We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here, and we would be happy to answer any questions you may still have regarding this matter.

    Sincerely,
    The Box Team
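    The Analysis and Corrective Actions sections hinge on one configuration coupling: max_connections was raised while thread_cache_size was not, so connections kept missing the thread cache. The postmortem does not name the database engine. The sketch below assumes a MySQL-compatible one, because max_connections and thread_cache_size are MySQL system variables and Threads_created and Connections are MySQL status counters. The 10% sizing ratio, the function names, and the example values are hypothetical illustrations, not Box's actual tooling or incident data.

```python
# Minimal sketch, ASSUMING a MySQL-compatible database. The postmortem names
# max_connections and thread_cache_size but not the engine. The function
# names and the 10% ratio are hypothetical, not Box's tooling.

MYSQL_THREAD_CACHE_MAX = 16384  # documented upper bound of thread_cache_size


def mysql_default_thread_cache_size(max_connections: int) -> int:
    """MySQL's autosizing default: 8 + max_connections / 100, capped at 100."""
    return min(100, 8 + max_connections // 100)


def proportional_thread_cache_size(max_connections: int, ratio: float = 0.1) -> int:
    """Hypothetical rule: size the cache as a fixed fraction of max_connections,
    so raising one setting automatically raises the other."""
    target = int(max_connections * ratio)
    return max(8, min(MYSQL_THREAD_CACHE_MAX, target))


def thread_cache_miss_rate(threads_created: int, connections: int) -> float:
    """Share of connections that had to spawn a new thread, computed from
    the Threads_created and Connections status counters."""
    if connections == 0:
        return 0.0
    return threads_created / connections


def check_config(max_connections: int, thread_cache_size: int, ratio: float = 0.1) -> list[str]:
    """Return warnings when thread_cache_size lags behind max_connections."""
    warnings = []
    wanted = proportional_thread_cache_size(max_connections, ratio)
    if thread_cache_size < wanted:
        warnings.append(
            f"thread_cache_size={thread_cache_size} is below {wanted} "
            f"for max_connections={max_connections}"
        )
    return warnings


if __name__ == "__main__":
    # Illustrative values only: max_connections raised from 1000 to 10000
    # while thread_cache_size stays at the default computed for 1000.
    old_cache = mysql_default_thread_cache_size(1000)  # 18
    print(check_config(10000, old_cache))
    print(f"miss rate: {thread_cache_miss_rate(450, 500):.0%}")
```

    Deriving thread_cache_size from max_connections in one place means a change to max_connections cannot silently leave the thread cache undersized. The corrective actions describe this kind of dynamic adjustment. The 10% ratio is arbitrary and would need tuning against observed Threads_created growth.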