Box incident

[Minor] Issues with Multiple Box Services

Minor · Resolved

Box experienced a minor incident on April 14, 2025, affecting Content API, Login/SSO, Web Application, and Uploads/Downloads. It lasted 2h 6m. The incident has been resolved; the full update timeline is below.

Started: Apr 14, 2025, 12:24 AM UTC
Resolved: Apr 14, 2025, 02:30 AM UTC
Duration: 2h 6m
Detected by Pingoru: Apr 14, 2025, 12:24 AM UTC

Affected components

Content API, Login/SSO, Web Application, Uploads/Downloads

Update timeline

  1. investigating Apr 14, 2025, 12:24 AM UTC

    We are investigating an ongoing issue affecting the Box API, uploads, downloads, logins, and Box Notes. We will provide more information as soon as it is available.

  2. identified Apr 14, 2025, 12:43 AM UTC

    The issue has been identified and a fix is being implemented.

  3. identified Apr 14, 2025, 01:38 AM UTC

    We are continuing to work on a fix for this issue.

  4. monitoring Apr 14, 2025, 02:11 AM UTC

    A fix has been implemented and we are monitoring the results.

  5. monitoring Apr 14, 2025, 02:30 AM UTC

    After further monitoring, this incident is now considered resolved. All services have been restored to full functionality. If you continue to experience any issues, please contact Box Support at https://support.box.com.

  6. resolved Apr 14, 2025, 02:30 AM UTC

    This incident has been resolved.

  7. postmortem Apr 24, 2025, 12:18 AM UTC

    We recently addressed issues affecting the Box Webapp and Public API. We would like to take the opportunity to further explain these issues and the steps we have taken to keep them from happening in the future.

    Between 4:29 PM PDT and 7:14 PM PDT on April 13, 2025, some users may have experienced difficulties while working in Box. During this time, users may have experienced slowness or occasional errors when interacting with some features in the Box webapp or public API, including Logins, Uploads/Downloads, and Notes.

    The issue occurred as a result of CPU performance degradation in multiple instances of our relational data access service in a single availability zone. We were able to resolve the issue by performing a rolling restart of the affected instances. In addition, we are working to improve our remediation processes when a single availability zone is affected in order to prevent similar issues from occurring in the future.

    **Analysis**

    During the time of the incident, we detected that several instances of our relational data access service in a single availability zone were experiencing higher-than-expected latencies. Because Box webapp and public API requests depend on this relational data access service, this additional latency impacted the Box webapp and public API. We resolved the issue by performing a rolling restart of the affected instances. However, the rolling restart took longer than desired and impacted latencies for the duration of the rolling restart. We identified corrective actions to more quickly remediate a similar issue in the future by leveraging tooling to divert traffic away from an impacted availability zone.

    **Corrective Actions**

    Box has initiated the following corrective actions:

    * Improve observability into issues that affect a single availability zone
    * Improve processes around usage of tooling to safely and quickly divert traffic away from an impacted availability zone
    * Decrease time to perform restart of relational data access service instances

    We are continuously working to improve Box and want to make sure we are delivering the best product and user experience we can. We hope we have provided some clarity here and we would be happy to answer any questions you may still have regarding this matter.

    Sincerely,
    The Box Team
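
The postmortem does not describe how Box detects a problem confined to one availability zone. As a rough illustration of the idea only, the sketch below flags any zone whose median latency is well above the median of the other zones. The function name `find_degraded_zones`, the zone labels, the sample values, and the `ratio` and `min_samples` thresholds are all hypothetical and are not taken from Box's tooling.

```python
from collections import defaultdict
from statistics import median


def find_degraded_zones(samples, ratio=2.0, min_samples=3):
    """Return zones whose median latency exceeds `ratio` times the
    median of the other zones' medians.

    `samples` is an iterable of (zone, latency_ms) pairs. All names and
    thresholds here are illustrative assumptions.
    """
    by_zone = defaultdict(list)
    for zone, latency_ms in samples:
        by_zone[zone].append(latency_ms)

    # Ignore zones with too few samples to give a stable median.
    zone_medians = {
        zone: median(values)
        for zone, values in by_zone.items()
        if len(values) >= min_samples
    }

    degraded = []
    for zone, value in zone_medians.items():
        # Compare each zone against its peers, not against itself.
        others = [m for z, m in zone_medians.items() if z != zone]
        if others and value > ratio * median(others):
            degraded.append(zone)
    return sorted(degraded)


# Made-up latency samples: zone-c is much slower than its peers.
samples = [
    ("zone-a", 12.0), ("zone-a", 14.0), ("zone-a", 13.0),
    ("zone-b", 11.0), ("zone-b", 15.0), ("zone-b", 12.0),
    ("zone-c", 95.0), ("zone-c", 120.0), ("zone-c", 88.0),
]
print(find_degraded_zones(samples))  # ['zone-c']
```

Comparing each zone with its peers rather than with a fixed threshold means a fleet-wide slowdown does not flag every zone at once, so the check only points to the single-zone case.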