Jobvite experienced a critical incident on October 9, 2025, affecting ATS - Requisitions, ATS - Candidates, and one additional component, lasting 3h 49m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Oct 09, 2025, 02:02 PM UTC
Jobvite is currently experiencing an outage and users may receive 503 errors. Our engineers are actively investigating.
- identified Oct 09, 2025, 02:52 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Oct 09, 2025, 03:06 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Oct 09, 2025, 05:52 PM UTC
This incident has been resolved.
- postmortem Oct 24, 2025, 06:06 PM UTC
**Incident Date:** October 9, 2025
**Start Time:** 9:20 AM ET
**End Time:** 10:17 AM ET

---

**Impact Summary**

Between 9:20 AM and 10:20 AM ET on October 9, 2025, users experienced 503 errors within the Jobvite application, making it temporarily unavailable.

---

**Root Cause**

While the system components were healthy, they weren't receiving traffic. The root cause was identified as a **capacity limit** on our load balancer, which prevented new application instances from registering correctly.

---

**Resolution**

The team pinpointed a missing configuration in several services that caused all nodes to be added to the load balancer, exceeding its registration limit. We updated the configuration and manually synced the affected services. By 10:17 AM ET, services were restored and traffic resumed. Registration limits were also increased to ensure future scalability.

---

**Preventive Actions**

To avoid similar issues in the future, we're implementing the following:

* Automated alerts for service capacity limits.
* A runbook for faster log analysis and troubleshooting.
* Enhanced monitoring alerts to detect traffic drops in production environments.
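The "automated alerts for service capacity limits" action above could take a shape like the following minimal sketch: compare the number of targets registered with the load balancer against its registration limit and alert before the limit is reached. The function name, the 80% warning threshold, and the example limit of 100 are illustrative assumptions, not details from Jobvite's actual implementation.

```python
def check_lb_capacity(registered: int, limit: int, warn_ratio: float = 0.8) -> str:
    """Classify load-balancer registration usage.

    Returns "ok", "warn" (approaching the registration limit), or
    "critical" (at or over the limit, where new application instances
    can no longer register -- the failure mode seen in this incident).
    The warn_ratio threshold is a hypothetical choice for illustration.
    """
    if registered >= limit:
        return "critical"
    if registered >= warn_ratio * limit:
        return "warn"
    return "ok"


# Example with an assumed registration limit of 100 targets:
print(check_lb_capacity(50, 100))   # healthy headroom
print(check_lb_capacity(85, 100))   # nearing the limit, alert early
print(check_lb_capacity(100, 100))  # limit hit, registrations will fail
```

Alerting on the "warn" state gives operators time to raise the limit or trim registrations before traffic is dropped, rather than discovering the ceiling only when instances fail to register.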