Crossref incident

Networking Issues at our Data centre

Severity: Major · Status: Resolved

Crossref experienced a major incident on April 30, 2025, affecting the Public REST API, the Crossref website, and several other components, lasting 14h 18m. The incident has been resolved; the full update timeline is below.

Started
Apr 30, 2025, 12:35 PM UTC
Resolved
May 01, 2025, 02:54 AM UTC
Duration
14h 18m
Detected by Pingoru
Apr 30, 2025, 12:35 PM UTC

Affected components

Public REST API · Crossref website · deliberately-unreliable server · Plus REST API · Admin tool · Handle servers · Polite REST API · Crossref support · Plus OAI-PMH · Demo Auth

Update timeline

  1. investigating Apr 30, 2025, 12:35 PM UTC

    We are currently experiencing network problems at our data centre which have caused both our doi and api domains to be inaccessible. We are working on this as a matter of urgency and will post more when we know more.

  2. investigating Apr 30, 2025, 02:11 PM UTC

    We are continuing to investigate this issue.

  3. investigating Apr 30, 2025, 03:22 PM UTC

    We are continuing to investigate this issue.

  4. identified Apr 30, 2025, 04:00 PM UTC

    The issue has been identified and a fix is being implemented.

  5. identified Apr 30, 2025, 06:02 PM UTC

    We are continuing to work on a fix for this issue. Any metadata registration attempts sent to us are failing with a network timeout. Unfortunately, that means anything submitted to us during this downtime will need to be resubmitted when the system is restored. -IF

  6. identified Apr 30, 2025, 08:12 PM UTC

    We are continuing to work on a fix for this issue.

  7. identified Apr 30, 2025, 09:41 PM UTC

    Our infrastructure team continues to work on a fix for this issue. -IF

  8. monitoring Apr 30, 2025, 11:16 PM UTC

    A fix has been implemented and we are monitoring the results.

  9. resolved May 01, 2025, 02:54 AM UTC

    This incident has been resolved.

  10. postmortem May 01, 2025, 11:31 PM UTC

    ### Summary of incident and impact

    On 30th April at 11:30 UTC, we were alerted by our monitoring tools that our physical data centre was down, meaning all services that go through the data centre were out of action. Essentially all Crossref services were affected: all content registration and helper tools (web deposit form, record registration form, STQ, etc.), the REST API, OAI-PMH, reports, and our website. Members who tried to deposit metadata during this time received a network error and will now need to re-try their metadata submissions. Existing DOIs still resolved during this time.

    Because the REST API already runs in the cloud (though traffic to it is routed through the data centre first), we updated the routing for the REST API to bypass the data centre, restoring REST API service at approximately 15:00 UTC. The rest of the services were restored at approximately 23:00 UTC.

    Updating the routing of the REST API had a knock-on effect of disrupting deposits for our members using the Crossref OJS plugin, beginning when the rest of the services were restored. The issue was resolved with additional routing changes on 1 May at approximately 16:00 UTC. OJS users who use the Crossref plugin and attempted deposits during this time received a failure notification and will need to resubmit.

    ### Root cause

    Once our staff arrived at the data centre, we determined the primary firewall hardware had failed. The secondary firewall had also failed previously, but that failure had gone unacknowledged.

    ### Resolution

    We obtained and configured a new firewall and restored services.

    ### Next Steps

    We’ll obtain additional backup firewalls to have on hand in the event of another failure. We are already in the process of moving all of our services to the cloud and out of the physical data centre, so this incident is a great reminder (if we needed one!) of the importance of this project.
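
The postmortem notes that deposits attempted during the outage failed and must be resubmitted. One way for a member to check whether a particular deposit actually registered before resubmitting is to query the public REST API for the DOI. This is a minimal sketch, not part of the incident report; the `works_url` helper and example DOI are illustrative:

```python
import json
import urllib.error
import urllib.request


def works_url(doi: str) -> str:
    """Build the public REST API URL for a DOI's metadata record (illustrative helper)."""
    return f"https://api.crossref.org/works/{doi}"


def doi_registered(doi: str) -> bool:
    """Return True if the public REST API has a record for this DOI.

    A 404 means no record was found, so the deposit likely needs to be
    resubmitted once service is restored.
    """
    try:
        with urllib.request.urlopen(works_url(doi), timeout=10) as resp:
            return json.load(resp).get("status") == "ok"
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Checking each DOI from a failed batch this way before resubmitting avoids creating duplicate deposits for records that did, in fact, register.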