Box incident

[Minor] Issues with Multiple Box Services

Box experienced a minor incident on April 14, 2025 affecting Content API, Login/SSO, Web Application, and Uploads/Downloads, lasting 2h 6m. The incident has been resolved; the full update timeline is below.

Started: Apr 14, 2025, 12:24 AM UTC
Resolved: Apr 14, 2025, 02:30 AM UTC
Duration: 2h 6m
Detected by Pingoru: Apr 14, 2025, 12:24 AM UTC
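
The times above are in UTC, while the postmortem further down reports its impact window in PDT. As a quick cross-check, the following Python sketch converts both windows to UTC and computes their durations. It assumes PDT is a fixed UTC-7 offset, and all variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PDT = timezone(timedelta(hours=-7), "PDT")  # assumption: fixed UTC-7 offset

# Status page window, as listed in UTC.
status_start = datetime(2025, 4, 14, 0, 24, tzinfo=UTC)
status_end = datetime(2025, 4, 14, 2, 30, tzinfo=UTC)

# Impact window from the postmortem, as listed in PDT.
impact_start = datetime(2025, 4, 13, 16, 29, tzinfo=PDT)
impact_end = datetime(2025, 4, 13, 19, 14, tzinfo=PDT)


def fmt(delta):
    """Format a timedelta as 'Xh Ym', dropping seconds."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60}m"


status_duration = fmt(status_end - status_start)      # 2h 6m
impact_start_utc = impact_start.astimezone(UTC)       # Apr 13, 23:29 UTC
impact_end_utc = impact_end.astimezone(UTC)           # Apr 14, 02:14 UTC
impact_duration = fmt(impact_end - impact_start)      # 2h 45m
# Gap between the start of reported impact and the first status update.
gap_before_first_update = fmt(status_start - impact_start_utc)  # 0h 55m
```

Under these assumptions, the postmortem's impact window begins 55 minutes before the first status page update and ends 16 minutes before the incident was marked resolved.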

Affected components

Content API
Login/SSO
Web Application
Uploads/Downloads

Update timeline

investigating Apr 14, 2025, 12:24 AM UTC

We are investigating an ongoing issue affecting the Box API, uploads, downloads, logins, and Box Notes. We will provide more information as soon as it is available.

identified Apr 14, 2025, 12:43 AM UTC

The issue has been identified and a fix is being implemented.

identified Apr 14, 2025, 01:38 AM UTC

We are continuing to work on a fix for this issue.

monitoring Apr 14, 2025, 02:11 AM UTC

A fix has been implemented and we are monitoring the results.

monitoring Apr 14, 2025, 02:30 AM UTC

After further monitoring, this incident is now considered resolved. All services have been restored to full functionality. If you continue to experience any issues, please contact Box Support at https://support.box.com.

resolved Apr 14, 2025, 02:30 AM UTC

This incident has been resolved.

postmortem Apr 24, 2025, 12:18 AM UTC

We recently addressed issues affecting the Box Webapp and Public API. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

Between 4:29 PM PDT and 7:14 PM PDT on April 13, 2025, some users may have experienced difficulties while working in Box. During this time, users may have experienced slowness or occasional errors when interacting with some features in the Box webapp or public API, including Logins, Uploads/Downloads, and Notes.

The issue occurred as a result of CPU performance degradation in multiple instances of our relational data access service in a single availability zone. We were able to resolve the issue by performing a rolling restart of the affected instances. In addition, we are working to improve our remediation processes when a single availability zone is affected in order to prevent similar issues from occurring in the future.

**Analysis**

During the time of the incident, we detected that several instances of our relational data access service in a single availability zone were experiencing higher-than-expected latencies. Because Box webapp and public API requests depend on this relational data access service, this additional latency impacted the Box webapp and public API.

We resolved the issue by performing a rolling restart of the affected instances. However, the rolling restart took longer than desired and impacted latencies for the duration of the rolling restart. We identified corrective actions to more quickly remediate a similar issue in the future by leveraging tooling to divert traffic away from an impacted availability zone.
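
The analysis above describes two ideas: spotting instances in one availability zone with elevated latency, and diverting traffic away from that zone. The Python sketch below illustrates both in a simplified form. It is not Box's tooling: the function names, the instance and zone names, the latency figures, and the 2x threshold are all hypothetical.

```python
from statistics import median


def find_degraded_zones(latency_ms, zone_of, ratio=2.0):
    """Return zones whose median instance latency exceeds `ratio` times
    the median latency of instances in all other zones."""
    by_zone = {}
    for instance, latency in latency_ms.items():
        by_zone.setdefault(zone_of[instance], []).append(latency)

    degraded = []
    for zone, samples in by_zone.items():
        others = [v for z, s in by_zone.items() if z != zone for v in s]
        if others and median(samples) > ratio * median(others):
            degraded.append(zone)
    return sorted(degraded)


def routing_weights(zones, degraded):
    """Spread traffic evenly over healthy zones; degraded zones get 0."""
    healthy = [z for z in zones if z not in degraded]
    if not healthy:
        # No healthy zone to divert to: keep traffic evenly spread.
        healthy = list(zones)
    share = 1.0 / len(healthy)
    return {z: (share if z in healthy else 0.0) for z in zones}


# Hypothetical per-instance latencies (ms) and zone assignments.
latency_ms = {"db-a1": 12, "db-a2": 14, "db-b1": 95,
              "db-b2": 110, "db-c1": 13, "db-c2": 11}
zone_of = {"db-a1": "zone-a", "db-a2": "zone-a", "db-b1": "zone-b",
           "db-b2": "zone-b", "db-c1": "zone-c", "db-c2": "zone-c"}

degraded = find_degraded_zones(latency_ms, zone_of)
weights = routing_weights(["zone-a", "zone-b", "zone-c"], degraded)
```

With this sample data, only `zone-b` is flagged and its traffic share is split between the two remaining zones. Comparing each zone against the others, rather than against a fixed threshold, is one way to keep a fleet-wide slowdown from being misread as a single-zone problem.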

**Corrective Actions**

Box has initiated the following corrective actions:

* Improve observability into issues that affect a single availability zone
* Improve processes around usage of tooling to safely and quickly divert traffic away from an impacted availability zone
* Decrease time to perform restart of relational data access service instances

We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.

Sincerely,
The Box Team