Cornerstone Outage History

Cornerstone is up right now

There have been 19 Cornerstone outages since February 9, 2026, totaling 146h 40m of downtime. Each incident is summarized below with its details, duration, and resolution information.

Source: https://status.csod.com

Notice April 30, 2026

Resurfaced: Unable to access reports in Reporting 2.0 (US Swimlanes Production Environment)

Detected by Pingoru
Apr 30, 2026, 02:00 PM UTC
Resolved
Apr 30, 2026, 02:35 PM UTC
Duration
34m
Affected: Uptime, Response Time
Timeline · 2 updates
  1. investigating Apr 30, 2026, 02:00 PM UTC

    The Reporting 2.0 issue impacting the US swimlanes has recurred today. Clients within these swimlanes may experience functional disruptions, including difficulties accessing reports and delays while performing certain actions within the portal. As a reminder, this issue was initially reported on 13 April 2026 and is being tracked under the following incident: https://status.csod.com/incidents/m2s13ms05fdp. The Engineering Team has confirmed that the same issue has resurfaced and is actively investigating the cause. This continues to be our top priority, and all relevant teams are engaged to restore normal functionality as soon as possible. We sincerely apologize for the inconvenience caused and appreciate your patience. We will keep you informed of any significant updates as they become available.

  2. resolved Apr 30, 2026, 02:35 PM UTC

    The Engineering Team has confirmed that Reporting 2.0 is now functioning as intended. The issue occurred at approximately 6:20 AM PST and was resolved by around 6:53 AM PST. Our teams are actively working on implementing a long-term fix to prevent recurrence. This incident has now been successfully resolved. We sincerely apologize for the inconvenience caused and appreciate your continued patience.

Read the full incident report →

Notice April 27, 2026

Latency/intermittent timeouts observed on CDG SL1 Production Environment

Detected by Pingoru
Apr 27, 2026, 11:21 AM UTC
Resolved
Apr 27, 2026, 11:21 AM UTC
Duration
Timeline · 1 update
  1. resolved Apr 27, 2026, 01:11 PM UTC

    CSOD Technology Team observed a service disruption on the CDG SL1 beginning at 4:21 AM PT, with service restored at 4:31 AM PT. During this time, some clients may have experienced intermittent timeouts. The issue has been resolved and the environment is stable.

Read the full incident report →

Notice April 23, 2026

Intermittent Timeouts on US SL4

Detected by Pingoru
Apr 23, 2026, 05:41 PM UTC
Resolved
Apr 23, 2026, 09:00 AM UTC
Duration
Timeline · 1 update
  1. resolved Apr 23, 2026, 05:41 PM UTC

    CSOD Technology Team observed a service disruption on the US SL4 swimlane beginning at 4:55 AM PT, with service restored at 6:16 AM PT. During this time, some clients may have experienced intermittent timeouts. The issue has been resolved and the environment is stable.

Read the full incident report →

Minor April 13, 2026

Unable to access reports in Reporting 2.0 (US Swimlanes Production Environment)

Detected by Pingoru
Apr 13, 2026, 02:34 PM UTC
Resolved
Apr 13, 2026, 06:40 PM UTC
Duration
4h 6m
Affected: Response Time
Timeline · 4 updates
  1. investigating Apr 13, 2026, 02:34 PM UTC

    US swimlanes are currently experiencing functional issues, including difficulties accessing reports. Clients with portals in this swimlane may encounter delays or disruptions while performing certain tasks. This issue is our top priority, and our teams are actively working toward a resolution. We appreciate your patience and recommend checking back periodically for updates as they become available.

  2. monitoring Apr 13, 2026, 02:43 PM UTC

    Our Engineering teams have identified the issue and deployed a fix to address it. We will continue to closely monitor the situation to ensure stability. Further updates will be shared through the Known Issue article and Statuspage. Thank you for your understanding and patience.

  3. resolved Apr 13, 2026, 06:40 PM UTC

    After a period of monitoring and no recurrences, we are considering this resolved.

  4. postmortem Apr 28, 2026, 05:14 PM UTC

    **Incident Summary:** Starting April 13th, 2026, intermittent errors were observed in reporting functionality across US Production swimlanes (SL1, SL2, SL3, and SL5).

    **Root Cause Analysis (RCA):** The issue was caused by the auto-scaling mechanism. Newly provisioned nodes were not fully ready to handle reporting requests, leading to processing delays and request timeouts.

    **Corrective Action:** Auto-scaling (scale-out) has been temporarily paused. The system is currently operating on a fixed, increased server capacity without scale-in or scale-out to ensure stability.

    **Preventive Actions:**
    * Capacity has been proactively increased to handle expected load without relying on auto-scaling.
    * Additional nodes are being utilized with auto-scaling disabled to maintain consistent performance.
    * Engineering teams are working on enhancements to improve node readiness and ensure seamless scaling in the future.
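
The failure mode this postmortem describes, nodes entering rotation before they can serve traffic, is commonly mitigated with a readiness gate: an instance joins the serving pool only after a warm-up probe succeeds. Below is a minimal sketch of that pattern in Python; the /readyz endpoint and pool API are illustrative assumptions, not Cornerstone's actual infrastructure.

```python
import time
import urllib.request

READY_PATH = "/readyz"  # hypothetical readiness endpoint (caches warm, pools connected)

def wait_until_ready(node_url, timeout_s=300, interval_s=5):
    """Poll the node's readiness endpoint before it may receive traffic."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(node_url + READY_PATH, timeout=3) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not ready yet (or not reachable); keep polling
        time.sleep(interval_s)
    return False

def register_node(pool, node_url):
    """Add the node to the serving pool only once it reports ready."""
    if not wait_until_ready(node_url):
        raise RuntimeError(f"{node_url} never became ready; not registering")
    pool.append(node_url)  # stand-in for the load balancer's registration API
```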

Read the full incident report →

Notice March 30, 2026

Intermittent Errors on Manage Candidates Page - US Swimlanes

Detected by Pingoru
Mar 30, 2026, 07:38 PM UTC
Resolved
Mar 30, 2026, 11:18 PM UTC
Duration
3h 39m
Affected: Response Time
Timeline · 3 updates
  1. investigating Mar 30, 2026, 07:38 PM UTC

    Some customers on US swimlanes (US SL1, US SL2, US SL3, US SL5) may be experiencing intermittent errors on the Manage Candidates page. Our team is actively investigating this as a top priority and will provide updates as more information becomes available.

  2. monitoring Mar 30, 2026, 08:28 PM UTC

    Our engineering teams have implemented a fix and we have recovered. We will monitor for at least an hour before marking this incident resolved.

  3. resolved Mar 30, 2026, 11:18 PM UTC

    The CSOD Technology Team observed a performance degradation affecting this swimlane. The problem began at 11:45 AM Pacific Time and was resolved at 12:55 PM Pacific Time. During this time, clients with portals on this swimlane may have experienced slow performance or intermittent errors while accessing the Manage Candidates page. After a period of monitoring we are considering this issue resolved.

Read the full incident report →

Minor March 26, 2026

Emergency Maintenance - Edge Import & Bulk API - US Swimlanes

Detected by Pingoru
Mar 26, 2026, 06:04 PM UTC
Resolved
Mar 26, 2026, 09:29 PM UTC
Duration
3h 25m
Affected: Response Time
Timeline · 4 updates
  1. identified Mar 26, 2026, 06:04 PM UTC

    We are aware of an issue causing some Edge Import loads and feeds to fail during processing on US swimlanes. To address this, we will be initiating an emergency maintenance shortly. We are allowing all currently processing jobs to complete before maintenance begins. In the meantime, new Edge Import and Bulk API jobs are being temporarily prevented from entering the queue. Once maintenance is complete, new jobs and any impacted scheduled jobs will resume processing. We will continue to provide updates as progress is made.

  2. identified Mar 26, 2026, 07:31 PM UTC

    We are starting the emergency maintenance. During this period, you may encounter a 503 error while trying to access Edge Import. We will continue to provide updates as progress is made.

  3. resolved Mar 26, 2026, 09:29 PM UTC

    Emergency maintenance has been completed and everything is operational at this time.

  4. postmortem Apr 13, 2026, 06:45 PM UTC

    **Issue Summary:** On March 22, 2026, Edge Import–related errors were observed across the US Production swimlanes (US SL1, US SL2, US SL3, US SL5).

    **Root Cause:** The issue was caused by instability within the underlying cluster infrastructure, which disrupted queue services and prevented message processing, resulting in Edge Import failures across the affected swimlanes.

    **Corrective Action:** On March 23, 2026, emergency maintenance was performed to stabilize and reset the impacted cluster components. Once stabilized, queue processing resumed and pending Edge Imports were successfully completed.

    **Preventive Measures:** To minimize the risk of recurrence, the cluster infrastructure was thoroughly analyzed and validated as stable following remediation.
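
The timeline above follows a standard drain-and-gate pattern: let in-flight jobs finish, block new submissions, run maintenance, then reopen the queue. A simplified in-process sketch of that pattern follows; the real Edge Import / Bulk API queue is a distributed service, so this is purely illustrative.

```python
import threading

class GatedQueue:
    """Job queue that can be closed to new work while in-flight jobs drain."""

    def __init__(self):
        self._jobs = []
        self._accepting = True
        self._lock = threading.Lock()

    def submit(self, job):
        with self._lock:
            if not self._accepting:
                # Mirrors the behavior above: new jobs are rejected
                # (or deferred) while maintenance runs.
                raise RuntimeError("queue closed for maintenance")
            self._jobs.append(job)

    def close_for_maintenance(self):
        with self._lock:
            self._accepting = False

    def drain(self):
        """Let currently queued jobs run to completion before maintenance."""
        with self._lock:
            pending, self._jobs = self._jobs, []
        for job in pending:
            job()

    def reopen(self):
        with self._lock:
            self._accepting = True
```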

Read the full incident report →

Notice March 19, 2026

System latency observed in US Swimlanes Production Environment

Detected by Pingoru
Mar 19, 2026, 03:05 PM UTC
Resolved
Mar 20, 2026, 03:05 PM UTC
Duration
1d
Timeline · 2 updates
  1. resolved Mar 20, 2026, 04:18 PM UTC

    The CSOD Technology Team observed a performance degradation affecting the US Swimlanes Production Environment (start time 8:05 AM PST, end time 8:15 AM PST). During this time, clients with portals on these swimlanes may have experienced slow performance or intermittent errors while accessing the application. Our Engineering teams have implemented a fix, and after a period of monitoring we are considering this issue resolved.

  2. postmortem Apr 21, 2026, 06:17 PM UTC

    **Incident Summary:** Since March 11th, 2026, clients have been experiencing intermittent timeouts while accessing portals across US Swimlanes (US SL1/2/3/5).

    **Root Cause Analysis (RCA):** The issue was attributed to congestion in the network layer, resulting in packet drops and intermittent request failures. Increased traffic load exposed capacity and routing inefficiencies across network and application tiers.

    **Progressive Updates & Actions Taken:**

    **March 11th, 2026:**
    * Initial issue observed with intermittent timeouts across multiple swimlanes.
    * Preliminary analysis indicated packet drops due to network congestion.

    **March 19th, 2026:**
    * Tuned health checks and optimized scaling behavior to improve responsiveness during sudden traffic spikes.
    * Introduced scheduled scaling to align capacity with peak and off-hours demand.

    **March 26th, 2026:**
    * Increased the number of gateway instances to distribute traffic load more effectively.
    * Initial improvement observed in request handling capacity.

    **April 1st, 2026:**
    * Network device configurations were refreshed to stabilize traffic flow.
    * Reduced packet drops observed post configuration updates.

    **April 7th, 2026:**
    * Routing updates were implemented for specific high-volume API calls to optimize traffic paths.
    * Latency improvements observed for targeted request flows.

    **April 11th, 2026:**
    * Network layer capacity was enhanced by adding additional servers to better handle incoming traffic.
    * Improved stability and reduced timeout occurrences across the application tier.

    **Preventive Actions/Ongoing Improvements:**
    * Continue horizontal scaling of gateway and application components based on traffic patterns.
    * Periodic review and optimization of network routing and configurations.
    * Ongoing infrastructure upgrades to significantly improve overall throughput and resiliency.

    **Resolution Summary (Post April 11th, 2026):** Over the weekend, capacity across key network components was further enhanced after confirming congestion-related packet drops. Additionally, required registration updates and configuration changes were implemented to optimize traffic handling and improve request reliability across the network. Following these changes, system stability improved significantly, with no widespread timeout patterns observed.
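
Several of the dated actions above (health-check tuning, scheduled scaling, added gateway instances) target the same property: scale out quickly during spikes, scale in cautiously afterwards. One common way to express that asymmetry is to evaluate the two decisions over different windows, as in this illustrative sketch; the thresholds and window sizes are invented, not values from the incident.

```python
def scaling_decision(cpu_samples, out_threshold=0.70, in_threshold=0.30):
    """Decide scale-out on a short window and scale-in on a long one.

    `cpu_samples`: per-minute CPU utilization (0.0-1.0), most recent last.
    Scale-out reacts within two minutes; scale-in waits for a quiet
    quarter hour, so sudden spikes get capacity without later flapping.
    """
    recent, longterm = cpu_samples[-2:], cpu_samples[-15:]
    if recent and min(recent) >= out_threshold:
        return "scale_out"
    if len(longterm) == 15 and max(longterm) <= in_threshold:
        return "scale_in"
    return "hold"

print(scaling_decision([0.4, 0.5, 0.85, 0.9]))  # -> scale_out
```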

Read the full incident report →

Minor March 16, 2026

Minor Issue - Latency for all Galaxy Clients across UK and EU Region

Detected by Pingoru
Mar 16, 2026, 09:07 AM UTC
Resolved
Mar 16, 2026, 12:06 PM UTC
Duration
2h 58m
Affected: Uptime, Response Time
Timeline · 4 updates
  1. investigating Mar 16, 2026, 09:07 AM UTC

    Galaxy clients across the UK and EU region swimlanes are experiencing latency issues. Clients with portals in these swimlanes may experience delays or 504 Gateway Timeout errors while accessing certain pages in the application. This is our top priority and we are working to resolve the problem as soon as possible. Please check back periodically for additional updates, which will be posted as they become available.

  2. identified Mar 16, 2026, 09:42 AM UTC

    The issue has been identified and a fix is being implemented.

  3. resolved Mar 16, 2026, 12:06 PM UTC

    The CSOD Technology Team observed a performance degradation affecting the Galaxy clients across UK and EU region swimlane. The problem began at 1:02 AM Pacific Time and was resolved at 1:19 AM Pacific Time. During this time, clients with portals on UK and EU region swimlane may have experienced slow performance or intermittent errors while accessing the application.

  4. postmortem Mar 31, 2026, 05:45 AM UTC

    **Incident Summary:** On March 16th, 2026, clients in the Europe PRD (FRA SL1, CDG SL1, UK SL1, SL2, SL3) region experienced a service disruption impacting Cornerstone, resulting in login failures and access issues.

    **Impact:** Users attempting to access Cornerstone were unable to log in and encountered 504 Gateway Timeout errors. The issue affected authentication flows and prevented successful access to the platform during the incident period.

    **Root Cause Analysis (RCA):** The disruption was caused by latency in the authentication service responsible for handling login requests. Cornerstone relies on a centralized authentication mechanism, and during the incident, authentication requests experienced delays, leading to timeouts at the gateway layer. This resulted in failed login attempts and temporary service unavailability.

    **Resolution:** Upon identification, the engineering team took corrective actions to stabilize the authentication flow and restore normal response times. System performance was closely monitored, and once latency was reduced, login functionality was fully restored across the impacted environments.

    **Preventive Measures:** The following measures have been implemented to reduce the likelihood of recurrence:
    * Enhanced monitoring and alerting for authentication service latency
    * Reviewed and adjusted timeout thresholds to improve resilience in the request flow
    * Strengthened capacity planning for authentication-related services
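
Two of the preventive measures, latency alerting and timeout-threshold review, combine naturally: alert when observed authentication latency approaches the gateway timeout instead of waiting for 504s to appear. A sketch under assumed numbers; the 30-second gateway timeout is an illustration, not a documented Cornerstone setting.

```python
from statistics import quantiles

GATEWAY_TIMEOUT_S = 30.0  # assumed gateway limit, for illustration only
ALERT_FRACTION = 0.5      # alert well before requests start timing out

def check_auth_latency(samples_s):
    """Alert when p95 auth latency approaches the gateway timeout.

    `samples_s`: recent authentication request durations in seconds.
    """
    if len(samples_s) < 20:
        return None  # not enough data for a stable percentile
    p95 = quantiles(samples_s, n=20)[-1]  # 95th percentile cut point
    if p95 >= GATEWAY_TIMEOUT_S * ALERT_FRACTION:
        return f"auth p95 {p95:.1f}s is within {ALERT_FRACTION:.0%} of gateway timeout"
    return None
```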

Read the full incident report →

Notice March 4, 2026

Minor Issue - Latency - FRA SL1

Detected by Pingoru
Mar 04, 2026, 09:00 AM UTC
Resolved
Mar 04, 2026, 09:00 AM UTC
Duration
Timeline · 2 updates
  1. resolved Mar 04, 2026, 12:22 PM UTC

    The CSOD Technology Team observed a performance degradation affecting the FRA SL1 swimlane. The problem began at 1:30 AM Pacific Time and was resolved at 2:05 AM Pacific Time. During this time, clients with portals on FRA SL1 swimlane may have experienced slow performance or intermittent errors while accessing the application.

  2. postmortem Mar 23, 2026, 07:58 PM UTC

    **Issue Summary:** On March 4, 2026, users experienced intermittent latency while accessing the platform hosted in the FRA PRD SL1 environment. During this period, some requests took longer than expected to process, resulting in degraded performance across the application. The issue impacted multiple Production environments but did not result in complete service unavailability.

    **Root Cause:** The latency was caused by stability issues within an underlying authentication service responsible for user authentication and session management. Increased authentication activity, partly driven by the enablement of global SSO capabilities for multiple customers, placed additional load on the service. This resulted in intermittent service instability and elevated request processing times across the affected environments.

    **Corrective Action:** Cornerstone Engineering teams implemented the following actions to restore service stability:
    * Increased authentication service capacity to reduce immediate load and stabilize the authentication infrastructure.
    * Conducted a detailed review of the service implementation to identify inefficiencies.
    * Optimized the service by removing unnecessary code references and improving memory management.

    Following these actions, the authentication service stabilized and platform latency returned to normal levels.

    **Preventive Measures:** To reduce the risk of recurrence, Cornerstone has taken the following actions:
    * Increased authentication service capacity to better support growing authentication demand.
    * Implemented code and memory management optimizations to improve service stability and efficiency.
    * Introduced enhanced monitoring to provide greater visibility into authentication service performance and detect anomalies earlier.
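
The recurring corrective theme in this postmortem is capacity headroom for authentication traffic. A back-of-the-envelope sizing helper illustrates the idea; every figure here is invented and implies nothing about Cornerstone's real throughput.

```python
import math

def required_auth_instances(peak_rps, per_instance_rps, headroom=0.4):
    """Size the auth tier with explicit headroom instead of reacting under load.

    headroom=0.4 keeps ~40% spare capacity for surges such as a new
    global-SSO rollout; all figures are illustrative.
    """
    usable_per_instance = per_instance_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_per_instance)

# e.g. 9,000 peak auth requests/s at 500 req/s per instance -> 30 instances
print(required_auth_instances(9000, 500))
```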

Read the full incident report →

Notice March 4, 2026

Intermittent Latency and Errors Across Application - US Swimlanes

Detected by Pingoru
Mar 04, 2026, 01:30 AM UTC
Resolved
Mar 07, 2026, 05:01 AM UTC
Duration
3d 3h
Affected: Response Time
Timeline · 15 updates
  1. investigating Mar 03, 2026, 03:59 PM UTC

    Some customers on a US swimlane may experience latency and errors intermittently across the application. We are actively investigating the issue as our top priority. We will share regular updates as they become available.

  2. monitoring Mar 03, 2026, 04:56 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. monitoring Mar 03, 2026, 07:44 PM UTC

    We are maintaining the status in Monitoring while we continue to ensure stability is sustained. We will continue to provide updates until the issue is considered fully resolved.

  4. identified Mar 03, 2026, 08:27 PM UTC

    We are currently observing intermittent latency spikes. The root cause has been identified, and our teams are actively working on a remediation plan. We are targeting deployment of the fix during non-business hours to minimize further customer impact. We will continue to provide updates as more information becomes available.

  5. identified Mar 03, 2026, 09:51 PM UTC

    Some customers may continue to experience intermittent latency spikes. Our teams remain actively engaged and are making progress on remediation efforts. Deployment of the proposed fix remains scheduled for non-business hours to minimize any further customer impact. We will continue to share regular updates.

  6. identified Mar 03, 2026, 11:46 PM UTC

    Some customers may continue to experience intermittent latency spikes. Our teams remain actively engaged and are making progress on remediation efforts. Deployment of the proposed fix remains scheduled for non-business hours to minimize any further customer impact. We will continue to share regular updates.

  7. identified Mar 04, 2026, 01:35 AM UTC

    Some customers may continue to experience intermittent latency spikes. Our teams remain actively engaged and are making progress on remediation efforts. Deployment of the proposed fix remains scheduled for non-business hours to minimize any further customer impact. We will continue to share regular updates.

  8. identified Mar 04, 2026, 05:41 AM UTC

    Some customers may continue to experience intermittent latency spikes. Our teams remain actively engaged and are making progress on remediation efforts. Deployment of the proposed fix remains scheduled for non-business hours to minimize any further customer impact. We will continue to share regular updates.

  9. identified Mar 04, 2026, 08:16 AM UTC

    Some customers may continue to experience intermittent latency spikes. Our teams remain actively engaged and are making progress on remediation efforts. Deployment of the proposed fix remains scheduled for non-business hours to minimize any further customer impact. We will continue to share regular updates.

  10. identified Mar 04, 2026, 10:34 AM UTC

    Some customers may continue to experience intermittent latency spikes. Our teams remain actively engaged and are making progress on remediation efforts. Deployment of the proposed fix remains scheduled for non-business hours to minimize any further customer impact. We will continue to share regular updates.

  11. monitoring Mar 04, 2026, 01:01 PM UTC

    A fix for this issue was deployed successfully. We are monitoring the swimlanes before concluding this incident as resolved.

  12. identified Mar 04, 2026, 06:43 PM UTC

    We observed brief latency spikes this morning. Our teams are actively working on a permanent fix to fully address the issue. We will continue to provide regular updates until it is resolved. Thank you for your patience.

  13. monitoring Mar 04, 2026, 08:36 PM UTC

    We observed a brief spike in latency between 11:00 AM and 11:15 AM PT. Mitigation measures have been implemented to help minimize intermittent performance degradation while a permanent fix is being developed. We will continue to share regular updates as progress is made.

  14. resolved Mar 07, 2026, 05:01 AM UTC

    The CSOD Technology Team observed a performance degradation affecting this swimlane. During this time, clients with portals on this swimlane may have experienced slow performance or intermittent errors while accessing the application. Our Engineering teams have implemented a fix and after a period of monitoring we are considering this issue resolved.

  15. postmortem Mar 15, 2026, 07:09 PM UTC

    **Issue Summary:** Between February 26 and March 12, 2026, users experienced intermittent latency while accessing the platform hosted in the US SL1, SL2, SL3, and SL5 environments. During this period, some requests took longer than expected to process, resulting in degraded performance across the application. The issue impacted multiple swimlanes but did not result in complete service unavailability.

    **Root Cause:** The latency was caused by stability issues within an underlying authentication service responsible for user authentication and session management. Increased authentication activity, partly driven by the enablement of global SSO capabilities for multiple customers, placed additional load on the service. This resulted in intermittent service instability and elevated request processing times across the affected environments.

    **Corrective Action:** Cornerstone Engineering teams implemented the following actions to restore service stability:
    * Increased authentication service capacity to reduce immediate load and stabilize the authentication infrastructure.
    * Conducted a detailed review of the service implementation to identify inefficiencies.
    * Optimized the service by removing unnecessary code references and improving memory management.

    Following these actions, the authentication service stabilized and platform latency returned to normal levels.

    **Preventive Measures:** To reduce the risk of recurrence, Cornerstone has taken the following actions:
    * Increased authentication service capacity to better support growing authentication demand.
    * Implemented code and memory management optimizations to improve service stability and efficiency.
    * Introduced enhanced monitoring to provide greater visibility into authentication service performance and detect anomalies earlier.

Read the full incident report →

Notice March 3, 2026

Latency on FRA SL1 Production

Detected by Pingoru
Mar 03, 2026, 12:11 PM UTC
Resolved
Mar 03, 2026, 09:00 AM UTC
Duration
Timeline · 2 updates
  1. resolved Mar 03, 2026, 12:11 PM UTC

    On 3/3/2026 between 1:00 AM and 2:30 AM PST we experienced latency in FRA SL1. During this time portals may have experienced issues accessing or using the platform.

  2. postmortem Mar 14, 2026, 08:31 PM UTC

    **Issue Summary:** On March 3, 2026, users experienced intermittent latency while navigating the portal hosted in the FRA SL1 environment. During the incident window, some users encountered slow responses and occasional service unavailable errors while accessing the portal.

    **Root Cause:** The incident was caused by multiple long-running database sessions executing the same query concurrently. The increased database resource utilization degraded database performance, which propagated latency to the application layer and impacted portal responsiveness.

    **Corrective Action:** Cornerstone Engineering teams terminated the long-running database sessions causing resource contention and updated database statistics to improve query optimization and execution efficiency. Database replication settings were temporarily adjusted to reduce commit latency while the environment stabilized. System performance was monitored to confirm that database operations and portal access returned to normal.

    **Preventive Measures:** To reduce the risk of recurrence, Cornerstone is taking the following actions:
    * Reviewing and optimizing the identified query to prevent prolonged execution.
    * Enhancing monitoring to proactively detect long-running database sessions and enable faster intervention.
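
The preventive measures call for detecting long-running database sessions proactively. A Postgres-flavored sketch of that check follows, assuming a DB-API connection (for example, from psycopg2); the postmortem does not name the actual database engine, and other engines expose similar system views.

```python
# Postgres-flavored illustration; other engines have equivalent views
# (e.g. sys.dm_exec_requests on SQL Server).
LONG_RUNNING_SQL = """
    SELECT pid, now() - query_start AS runtime, query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND now() - query_start > interval '5 minutes'
    ORDER BY runtime DESC;
"""

def find_long_running_sessions(conn):
    """List active sessions that have been running for over five minutes."""
    with conn.cursor() as cur:
        cur.execute(LONG_RUNNING_SQL)
        return cur.fetchall()

def terminate_session(conn, pid):
    """Kill one offending session, mirroring the corrective action above."""
    with conn.cursor() as cur:
        cur.execute("SELECT pg_terminate_backend(%s);", (pid,))
```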

Read the full incident report →

Notice March 2, 2026

Minor Issue - Careers Site Down - US SL2

Detected by Pingoru
Mar 02, 2026, 06:53 PM UTC
Resolved
Mar 02, 2026, 06:42 PM UTC
Duration
Timeline · 2 updates
  1. resolved Mar 02, 2026, 06:53 PM UTC

    The US SL2 swimlane experienced issues with the Careers site. The issue was identified and has been resolved, and services have now returned to normal. We apologize for any inconvenience this may have caused. The issue began at 09:52 a.m. PT and was resolved at 10:31 a.m. PT.

  2. postmortem Mar 14, 2026, 06:58 PM UTC

    **Issue Summary:** On March 2, 2026, clients hosted in the US SL2 Production environment experienced a temporary service disruption affecting the Career Site. During the incident window, users encountered intermittent slowness and brief periods of unavailability while accessing the Career Site.

    **Root Cause:** The incident was caused by an unexpected surge in traffic to the Career Site during off-hours. The elevated request volume temporarily exceeded available application capacity, resulting in service instability.

    **Corrective Action:** Cornerstone Engineering teams increased application capacity to stabilize performance and restore normal service levels. System performance was monitored following the adjustment to confirm that service stability had been fully restored.

    **Preventive Measures:** To reduce the risk of recurrence, Cornerstone is implementing the following improvements:
    * Enhancing monitoring thresholds to detect sudden traffic spikes earlier and enable faster response.
    * Reviewing and adjusting capacity planning to better accommodate unexpected increases in traffic.
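
The first preventive measure, detecting sudden traffic spikes earlier, usually means comparing the current request rate against a rolling baseline rather than a fixed threshold. A minimal sketch with invented parameters:

```python
from collections import deque

class SpikeDetector:
    """Flag when the current request rate far exceeds a rolling baseline."""

    def __init__(self, window=60, ratio=3.0):
        self.window = deque(maxlen=window)  # e.g. last 60 one-minute buckets
        self.ratio = ratio                  # alert at 3x baseline (illustrative)

    def observe(self, requests_per_min):
        baseline = (sum(self.window) / len(self.window)) if self.window else None
        self.window.append(requests_per_min)
        if baseline and requests_per_min > baseline * self.ratio:
            return f"spike: {requests_per_min}/min vs baseline {baseline:.0f}/min"
        return None
```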

Read the full incident report →

Notice February 28, 2026

Service Disruption: US SL1 PRD - 'Service Unavailable' and SSL Errors

Detected by Pingoru
Feb 28, 2026, 06:43 PM UTC
Resolved
Mar 01, 2026, 01:59 AM UTC
Duration
7h 15m
Affected: Uptime
Timeline · 5 updates
  1. investigating Feb 28, 2026, 06:43 PM UTC

    This swimlane is experiencing a service disruption. This is our top priority and we are working to resolve the problem as soon as possible. Please check back periodically for additional updates, which will be posted as they become available. Start Time: 9:30 AM PST

  2. investigating Feb 28, 2026, 08:18 PM UTC

    We are continuing to investigate this issue.

  3. monitoring Feb 28, 2026, 08:50 PM UTC

    The issue has been identified and resolved, and services have now been restored to normal operations. We will continue to monitor this closely.

  4. resolved Mar 01, 2026, 01:59 AM UTC

    After careful monitoring, the issue has now been resolved.

  5. postmortem Mar 14, 2026, 09:00 PM UTC

    **Issue Summary:** On February 28, 2026, users experienced intermittent service availability while accessing the portal hosted in the US SL1 Production environment. During the incident window, some users encountered intermittent access issues when navigating the portal.

    **Root Cause:** The incident was caused by a configuration limit being reached within the underlying infrastructure service. Once the limit was reached, normal traffic handling was temporarily impacted, which resulted in intermittent portal accessibility issues.

    **Corrective Action:** Cornerstone Engineering teams identified the configuration limit and implemented adjustments to restore normal service operation. The environment was monitored following the change to confirm stability and ensure that portal access had returned to normal.

    **Preventive Measures:** To reduce the risk of recurrence, Cornerstone has taken the following action:
    * Increased the relevant infrastructure service limit to support higher capacity and prevent similar constraints from affecting portal accessibility in the future.
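
A configuration limit being reached is exactly the failure that headroom monitoring is meant to catch before impact. The sketch below warns when usage crosses a fraction of a configured limit; the resource names and limits are placeholders, since the RCA does not identify the constrained resource.

```python
# Placeholder limits; the actual constrained resource is not named in the RCA.
LIMITS = {
    "connections": 10_000,
    "certificates": 25,
    "listener_rules": 100,
}

def headroom_alerts(usage, warn_at=0.8):
    """Warn when any tracked resource passes 80% of its configured limit,
    so the limit can be raised before traffic handling is impacted."""
    alerts = []
    for name, used in usage.items():
        limit = LIMITS.get(name)
        if limit and used / limit >= warn_at:
            alerts.append(f"{name}: {used}/{limit} ({used / limit:.0%})")
    return alerts

print(headroom_alerts({"connections": 9_200, "certificates": 12}))
```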

Read the full incident report →

Notice February 26, 2026

Latency observed in CSX for US Swimlanes Production Environment

Detected by Pingoru
Feb 26, 2026, 03:51 PM UTC
Resolved
Feb 26, 2026, 03:00 PM UTC
Duration
Timeline · 2 updates
  1. resolved Feb 26, 2026, 03:51 PM UTC

    The CSOD Technology Team observed a performance degradation affecting all US swimlanes (Production Environment). The problem began at 6:10 AM Pacific Time and was resolved at 6:29 AM Pacific Time. During this time, clients with portals on these swimlanes may have experienced slow performance or intermittent errors while accessing the application. The issue has been identified and resolved, and services have been restored to normal operations. Thank you for your patience and understanding!

  2. postmortem Mar 15, 2026, 06:58 PM UTC

    **Issue Summary:** Between February 26 and March 12, 2026, users experienced intermittent latency while accessing the platform hosted in the US SL1, SL2, SL3, and SL5 environments. During this period, some requests took longer than expected to process, resulting in degraded performance across the application. The issue impacted multiple swimlanes but did not result in complete service unavailability.

    **Root Cause:** The latency was caused by stability issues within an underlying authentication service responsible for user authentication and session management. Increased authentication activity, partly driven by the enablement of global SSO capabilities for multiple customers, placed additional load on the service. This resulted in intermittent service instability and elevated request processing times across the affected environments.

    **Corrective Action:** Cornerstone Engineering teams implemented the following actions to restore service stability:
    * Increased authentication service capacity to reduce immediate load and stabilize the authentication infrastructure.
    * Conducted a detailed review of the service implementation to identify inefficiencies.
    * Optimized the service by removing unnecessary code references and improving memory management.

    Following these actions, the authentication service stabilized and platform latency returned to normal levels.

    **Preventive Measures:** To reduce the risk of recurrence, Cornerstone has taken the following actions:
    * Increased authentication service capacity to better support growing authentication demand.
    * Implemented code and memory management optimizations to improve service stability and efficiency.
    * Introduced enhanced monitoring to provide greater visibility into authentication service performance and detect anomalies earlier.

Read the full incident report →

Notice February 24, 2026

Latency Observed on LMS Pages Across EU Production (Galaxy Customers)

Detected by Pingoru
Feb 24, 2026, 01:28 PM UTC
Resolved
Feb 24, 2026, 07:24 PM UTC
Duration
5h 55m
Affected: Uptime, Response Time
Timeline · 5 updates
  1. investigating Feb 24, 2026, 01:28 PM UTC

    We are currently investigating reports of increased latency affecting the EU region Production Environment. This issue impacts Galaxy customers using LXP and LMS, or LXP alone, and may result in slower page load times or eventual errors. Our Engineering team is investigating the root cause and actively working to restore normal response times. We will continue to share updates as progress is made. Thank you for your patience and understanding.

  2. identified Feb 24, 2026, 01:44 PM UTC

    Our Engineering team has identified the root cause and is actively working to restore normal response times. We will continue to share updates as progress is made.

  3. monitoring Feb 24, 2026, 02:05 PM UTC

    The issue has been identified and resolved, and services have been restored to normal operations. We will continue to monitor the situation closely.

  4. resolved Feb 24, 2026, 07:24 PM UTC

    After a period of monitoring with no further recurrence observed, the issue is now considered resolved. The CSOD Technology Team identified a performance degradation impacting EU Production (Galaxy Customers). The issue began at 4:28 AM Pacific Time and was resolved at 5:19 AM Pacific Time. During this period, clients with portals in this region may have experienced slow performance or intermittent errors while accessing/working on the application.

  5. postmortem Mar 13, 2026, 07:31 PM UTC

    **Issue Summary:** On February 24, 2026, elevated latency was observed in the EU region, resulting in degraded platform performance. During the affected period, users experienced slower response times while accessing the platform, particularly for operations within the Learning Experience Platform (LXP).

    **Root Cause:** The incident was caused by performance degradation in an interdependent backend service. The slowdown in this service propagated to dependent platform components, resulting in increased latency and delayed processing of LXP operations in the EU region.

    **Corrective Action:** As an immediate mitigation, Cloud Operations collaborated with the development team to implement temporary measures to reduce the latency impact and stabilize platform performance. Engineering teams have also prepared a permanent code fix to address the underlying issue.

    **Preventive Measures:** To reduce the likelihood of a similar incident in the future, the following fix is being implemented:
    * Improved the service logic to better handle certain user validation scenarios, preventing unnecessary retries that previously contributed to increased latency.
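
The fix described, preventing unnecessary retries in certain user-validation scenarios, reflects a general rule: never retry deterministic failures, and cap retries of transient ones with backoff. A generic sketch; the exception taxonomy is illustrative rather than the service's real one.

```python
import time

class ValidationError(Exception):
    """Deterministic outcome; retrying it only adds load and latency."""

def call_with_bounded_retries(fn, max_attempts=3, base_delay_s=0.2):
    """Retry transient failures only, with a hard cap and exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ValidationError:
            raise  # do not retry deterministic failures
        except OSError:  # stand-in for the transient error class
            if attempt == max_attempts:
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))
```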

Read the full incident report →

Minor February 23, 2026

Latency Observed on LMS Pages Across All US Swimlanes (Galaxy Customers Only)

Detected by Pingoru
Feb 23, 2026, 04:24 PM UTC
Resolved
Feb 23, 2026, 10:23 PM UTC
Duration
5h 59m
Affected: Uptime, Response Time
Timeline · 4 updates
  1. identified Feb 23, 2026, 04:24 PM UTC

    We are currently investigating reports of increased latency affecting all US swimlane environments (Pilot, Sandbox, and Production). This issue impacts Galaxy customers using LXP and LMS, or LXP alone, and may result in slower page load times. Our Engineering team has identified the root cause and is actively working to restore normal response times. We will continue to share updates as progress is made. Thank you for your patience and understanding.

  2. monitoring Feb 23, 2026, 06:08 PM UTC

    The issue has been identified and resolved, and services have been restored to normal operations. We will continue to monitor the situation closely.

  3. resolved Feb 23, 2026, 10:23 PM UTC

    The CSOD Technology Team observed a performance degradation affecting this swimlane. The problem began at 7:54 AM Pacific Time and was resolved at 10:48 AM Pacific Time. During this time, clients with portals on this swimlane may have experienced slow performance or intermittent errors while accessing the application.

  4. postmortem Mar 12, 2026, 08:24 PM UTC

    **Issue Summary:** On February 23, 2026, increased latency was observed across all U.S. swimlane environments due to an issue with an internal service that depends on a third-party database service (ArangoDB). The incident affected Galaxy customers using LXP and LMS, including environments running LXP and LMS together, or LXP alone. During the affected period, users experienced slower page load times and delayed system responses.

    **Root Cause:** The incident was caused by a disruption in the third-party ArangoDB service used by an internal platform component. The dependency caused the internal service to experience latency, which propagated to all US swimlane environments and resulted in broader platform responsiveness degradation.

    **Corrective Action:** Engineering teams collaborated with the ArangoDB vendor to identify and address the service disruption. Once the vendor restored normal operation of the database service, the dependent internal services recovered and system performance returned to expected levels.

    **Preventive Measures:** To reduce the likelihood of a similar incident in the future, the following improvement is being implemented:
    * Implement Circuit Breaker: Introducing a circuit breaker within internal services to prevent dependency failures from propagating and causing broader platform latency.
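
A circuit breaker, the preventive measure named above, stops calling an unhealthy dependency and serves a fallback instead, so the dependency's latency cannot propagate into every request. A minimal single-threaded sketch; production breakers also handle concurrency and richer state.

```python
import time

class CircuitBreaker:
    """Fail fast when a dependency is unhealthy so its latency cannot
    propagate into every request."""

    def __init__(self, failure_threshold=5, reset_timeout_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.consecutive_failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout_s:
                return fallback()  # open: skip the dependency entirely
            # Half-open: allow a single probe call through.
        try:
            result = fn()
        except Exception:
            self.consecutive_failures += 1
            if (self.opened_at is not None
                    or self.consecutive_failures >= self.failure_threshold):
                self.opened_at = time.monotonic()  # (re)open the circuit
            return fallback()
        self.consecutive_failures = 0
        self.opened_at = None  # probe succeeded: close the circuit
        return result
```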

Read the full incident report →

Critical February 17, 2026

Service Disruption: CDG SL1: 'Service Unavailable' Error while accessing/working on the platform

Detected by Pingoru
Feb 17, 2026, 02:17 PM UTC
Resolved
Feb 17, 2026, 05:39 PM UTC
Duration
3h 22m
Affected: Uptime, Response Time
Timeline · 5 updates
  1. investigating Feb 17, 2026, 02:17 PM UTC

    The CDG SL1 swimlane is experiencing a service disruption. Users may see the error: 'Service Unavailable. HTTP Error 503. The service is unavailable.' This is our top priority and we are working to resolve the problem as soon as possible. Please check back periodically for additional updates, which will be posted as they become available.

  2. investigating Feb 17, 2026, 02:19 PM UTC

    We are continuing to investigate this issue.

  3. monitoring Feb 17, 2026, 02:35 PM UTC

    The issue has been identified and resolved, and services have now been restored to normal operations. We will continue to monitor this closely.

  4. resolved Feb 17, 2026, 05:39 PM UTC

    After careful monitoring, the issue has now been resolved. The CSOD Technology Team identified a service disruption impacting the CDG SL1 swimlane, which was successfully restored as of 06:19 AM Pacific Time today. A full Root Cause Analysis (RCA), including preventive measures, will be published on the Status Page within 7–10 business days. Thank you for your patience and understanding.

  5. postmortem Mar 06, 2026, 08:22 AM UTC

    **Incident Summary:** On February 17th, 2026, users were unable to log in to the portal hosted in the CDG SL1 (AWS) environment. During the impact window, users encountered Service Unavailable errors.

    **Impact:** During the incident window, portal login functionality was temporarily unavailable. Users attempting to access the portal received service unavailable errors. The issue was limited to the affected environment.

    **Root Cause Analysis (RCA):** The disruption was caused by multiple long-running database sessions executing the same query concurrently. The increased database resource utilization degraded overall performance, which in turn caused the application layer to return errors.

    **Resolution:** Upon identification of the issue, the engineering team:
    * Terminated the long-running database sessions causing resource contention and updated database statistics to improve query optimization and execution efficiency.
    * Temporarily adjusted database replication settings to reduce commit latency during stabilization.
    * Closely monitored system resource utilization to confirm normalization.

    Following these actions, database performance stabilized and portal access was fully restored.

    **Preventive Measures:** To reduce the likelihood of recurrence, the following measures are being implemented:
    * Reviewing and optimizing the identified query to prevent prolonged execution
    * Enhancing monitoring to proactively detect long-running database sessions

Read the full incident report →

Notice February 9, 2026

Career Site Returning an Error (February 9th) - US SLs

Detected by Pingoru
Feb 09, 2026, 04:03 PM UTC
Resolved
Feb 09, 2026, 05:01 PM UTC
Duration
57m
Affected: Uptime
Timeline · 3 updates
  1. monitoring Feb 09, 2026, 04:03 PM UTC

    The errors have subsided, and no further issues have been observed. We will remain vigilant and continue monitoring.

  2. resolved Feb 09, 2026, 05:01 PM UTC

    This incident has been resolved.

  3. postmortem Feb 21, 2026, 12:23 AM UTC

    **Incident Summary:** On February 9th, 2026, clients hosted in the US Production environments (SL1, SL2, SL3, and SL5) experienced a temporary service disruption affecting the Career Site.

    **Impact:** During the incident window, users experienced intermittent slowness and brief periods of unavailability while accessing the Career Site. The impact was limited to elevated traffic conditions during off-peak hours.

    **Root Cause Analysis (RCA):** The disruption was caused by an unexpected surge in traffic to the Career Site during off-hours. The elevated request volume temporarily exceeded the available application capacity, which resulted in service instability.

    **Resolution:** Upon identification of the issue, Cornerstone engineers promptly increased application capacity to stabilize performance and restore normal service levels. Platform performance was continuously monitored to ensure full recovery and sustained stability.

    **Preventive Measures:** To reduce the likelihood of recurrence, Cornerstone is implementing the following improvements:
    * Adjusting auto-scaling schedules to better accommodate traffic patterns during off-hours
    * Enhancing monitoring thresholds to proactively respond to sudden traffic spikes
    * Reviewing capacity planning assumptions to ensure sufficient headroom during peak and off-peak periods
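
The first improvement, adjusting auto-scaling schedules for off-hours traffic, typically takes the form of a time-based capacity floor so that surges never start from a minimal footprint. An illustrative sketch; the windows and instance counts are invented.

```python
from datetime import datetime, timezone

# Illustrative windows only; real peak hours would come from traffic data.
CAPACITY_FLOORS = [
    (13, 22, 12),  # (start_hour_utc, end_hour_utc, min_instances): business hours
    (0, 13, 6),    # overnight baseline
    (22, 24, 6),   # evening baseline
]

def capacity_floor(now=None):
    """Return the minimum instance count for the current UTC hour, so
    off-hours surges never start from a minimal footprint."""
    hour = (now or datetime.now(timezone.utc)).hour
    for start, end, floor in CAPACITY_FLOORS:
        if start <= hour < end:
            return floor
    return 6  # defensive default

print(capacity_floor())
```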

Read the full incident report →

Looking to track Cornerstone downtime and outages?

Pingoru polls Cornerstone's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.

  • Real-time alerts when Cornerstone reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track Cornerstone alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring Cornerstone for free

5 free monitors · No credit card required