Codefresh incident

Partial Outage: Pipeline builds are stuck in pending due to expired certificates

Major · Resolved

Codefresh experienced a major incident on August 14, 2024. The incident has been resolved; the full update timeline is below.

Started: Aug 14, 2024, 01:53 AM UTC
Resolved: Aug 01, 2024, 07:00 PM UTC
Detected by Pingoru: Aug 14, 2024, 01:53 AM UTC

Update timeline

  1. resolved Aug 14, 2024, 01:53 AM UTC

    A small number of hybrid runners (no more than 10) were unable to communicate with our API for a day and therefore could not fetch or run pipelines. We identified an issue with our certificate rotation, which failed to generate new certificates for this subset of runners. We resolved the issue by manually recreating the required certificates, which the runners then picked up on their next build, restoring service for all impacted customers.

  2. postmortem Aug 14, 2024, 01:54 AM UTC

    **Impact**: A small number of hybrid runners (no more than 10) were unable to communicate with our API for a day and therefore could not fetch or run pipelines.

    **Detection**: We were informed of this issue by customers.

    **Root Cause**: Our certificate rotation failed to generate new certificates for this subset of runners.

    **Resolution**: We manually recreated the required certificates, which the runners picked up on their next build, restoring service for all impacted customers. We also applied further mitigation to fix the underlying certificate rotation issue, and we are working on monitoring improvements in this area.
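As a rough illustration of the kind of certificate-expiry monitoring that could catch a failed rotation before runners lose API access, here is a minimal Python sketch. It is not Codefresh's implementation. The `RunnerCert` record, the runner IDs, the `certs_needing_rotation` helper, and the 7-day warning window are all hypothetical. The only library call is the standard `ssl.cert_time_to_seconds`, which parses the OpenSSL-style `notAfter` text (the same format `ssl.SSLSocket.getpeercert()` reports) into UTC epoch seconds.

```python
import ssl
import time
from dataclasses import dataclass


@dataclass
class RunnerCert:
    """Hypothetical record: one client certificate per hybrid runner."""
    runner_id: str  # hypothetical runner identifier
    not_after: str  # expiry in OpenSSL text form, e.g. "Jan 12 00:00:00 2030 GMT"


def certs_needing_rotation(certs, now=None, warn_days=7):
    """Split runners into (expired, expiring_soon) lists of runner IDs.

    A certificate counts as expired if its notAfter is at or before `now`,
    and as expiring soon if it expires within `warn_days` days after `now`.
    A rotation job that silently skipped some runners would show up here
    before those runners start failing to reach the API.
    """
    if now is None:
        now = time.time()
    window = warn_days * 24 * 60 * 60  # warning window in seconds
    expired, expiring = [], []
    for cert in certs:
        # Parse the notAfter string into seconds since the epoch (UTC).
        expires_at = ssl.cert_time_to_seconds(cert.not_after)
        if expires_at <= now:
            expired.append(cert.runner_id)
        elif expires_at - now <= window:
            expiring.append(cert.runner_id)
    return expired, expiring


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from wherever
    # runner certificates are tracked.
    now = ssl.cert_time_to_seconds("Jan 10 00:00:00 2030 GMT")
    inventory = [
        RunnerCert("runner-a", "Jan  5 00:00:00 2030 GMT"),
        RunnerCert("runner-b", "Jan 12 00:00:00 2030 GMT"),
        RunnerCert("runner-c", "Mar  1 00:00:00 2030 GMT"),
    ]
    expired, expiring = certs_needing_rotation(inventory, now=now)
    print(expired, expiring)  # prints: ['runner-a'] ['runner-b']
```

Alerting on the "expiring soon" list, not just on already-expired certificates, is the design choice that matters here: it gives operators a window to recreate certificates manually if automated rotation has failed for some runners.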