Cornerstone experienced a major incident on July 17, 2025, affecting Uptime and lasting 1h 36m. The incident has been resolved; the full update timeline is below.
Update timeline
- Investigating: Jul 17, 2025, 07:58 AM UTC
We are currently observing issues affecting some portals in the US PRD SL3 environment. Our team is actively investigating the issue. During this time, customers may experience exceptions when attempting to access their portals.
- Monitoring: Jul 17, 2025, 08:22 AM UTC
A fix has been implemented and we are monitoring the results.
- Resolved: Jul 17, 2025, 09:35 AM UTC
This incident has been resolved.
- Postmortem: Aug 01, 2025, 10:15 PM UTC
Incident Summary: On July 17, 2025, Cornerstone engineers were alerted to elevated latency by internal monitoring systems.
Impact: Clients hosted in the US SL3 swimlane may have experienced intermittent performance issues in Production when accessing Learner Home or the Recruiting pages.
Root Cause Analysis (RCA): The latency was traced to network issues affecting a subset of nodes within the web application cluster. These nodes delayed request processing, degrading the overall responsiveness of key application areas.
Corrective Actions: The impacted application nodes were recycled to remove them from active traffic and stabilize performance. As a result, service availability for Learner Home and the Recruiting pages was restored.
Preventive Measures:
• Proactive health checks and automated isolation of unhealthy nodes are being implemented to prevent similar node-level issues from affecting users.
• Enhanced monitoring has been deployed to detect abnormal latency patterns across all application nodes.
• A permanent fix is being applied to address the specific network configuration issue identified on the affected nodes.
• A cross-functional review is underway to finalize and enforce updated operational runbooks for faster triage and remediation of node-level incidents.