Sirv incident

Some uploads failing

Minor · Resolved

Sirv experienced a minor incident on March 15, 2025 that affected the "Current status by service (Sirv primary datacenter)" component and lasted 21h 29m. The incident has been resolved; the full update timeline is below.

Started: Mar 15, 2025, 05:56 PM UTC
Resolved: Mar 16, 2025, 03:25 PM UTC
Duration: 21h 29m
Detected by Pingoru: Mar 15, 2025, 05:56 PM UTC
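
The reported duration can be checked from the start and resolution timestamps. The following is a minimal sketch, assuming Python 3.9+ and an English locale for month abbreviations; the helper names `parse_utc` and `format_duration` are illustrative and not part of any Sirv or Pingoru tooling.

```python
from datetime import datetime, timezone

# Timestamp layout as shown on this page, e.g. "Mar 15, 2025, 05:56 PM"
FMT = "%b %d, %Y, %I:%M %p"


def parse_utc(text: str) -> datetime:
    """Parse a status-page timestamp that ends in ' UTC'."""
    naive = datetime.strptime(text.removesuffix(" UTC"), FMT)
    return naive.replace(tzinfo=timezone.utc)


def format_duration(start: datetime, end: datetime) -> str:
    """Render the elapsed time as '<hours>h <minutes>m'."""
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


started = parse_utc("Mar 15, 2025, 05:56 PM UTC")
resolved = parse_utc("Mar 16, 2025, 03:25 PM UTC")
print(format_duration(started, resolved))  # 21h 29m
```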

Affected components

Current status by service (Sirv primary datacenter)

Update timeline

  1. investigating Mar 15, 2025, 05:56 PM UTC

    Some file uploads are not completing due to a server cluster issue. The cause is being investigated. The issue is affecting a small number of accounts. If your file upload fails, please wait until this issue has been resolved.

  2. resolved Mar 16, 2025, 03:25 PM UTC

    The issue has been resolved. It was caused by two servers in one cluster failing simultaneously, which affected the uploading of new files for about 20% of Sirv accounts. Uploads are designed to continue as normal when one server is down, but when two servers are down some uploads can fail, which is what happened.

    The servers failed because an exporter service stopped running on one server and then on the other. Errors prevented the services from restarting. A second issue then made the resolution take much longer than expected: the servers couldn't be rebooted because of an outdated BIOS. Once the BIOS had been updated, the servers were rebooted and the underlying issue was resolved.

    To prevent this from happening again, we have implemented a new BIOS management process with our datacenter. We are also shortening our hardware refresh cycle so that hardware is upgraded more frequently. To reduce the chance of the exporter service failure recurring, we have disabled a process and are monitoring the server metrics.
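
The failure tolerance described in the resolution note (uploads continue with one server down, but can fail with two down) can be written as a small rule. This is only an illustrative sketch: the tolerance of one failed server comes from the report, while the function name, status labels, and the idea of counting failed servers per cluster are assumptions, not Sirv's actual monitoring logic.

```python
# From the incident report: uploads continue normally with one server down.
TOLERATED_FAILURES = 1


def upload_status(servers_down: int) -> str:
    """Classify upload health for one cluster (hypothetical labels)."""
    if servers_down < 0:
        raise ValueError("servers_down must be non-negative")
    if servers_down == 0:
        return "healthy"
    if servers_down <= TOLERATED_FAILURES:
        # Redundancy absorbs the failure; uploads should still complete.
        return "degraded, uploads continue"
    # Two or more servers down: some uploads can fail, as in this incident.
    return "uploads at risk"


for down in range(3):
    print(down, upload_status(down))
```

Under this rule, an alert raised at the "degraded" stage would leave time to act before a second failure puts uploads at risk.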