Codefresh incident
Partial Outage: Pipeline builds are stuck in pending due to expired certificates
Codefresh experienced a major incident on August 14, 2024, lasting —. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Aug 14, 2024, 01:53 AM UTC
A small number of hybrid runners (no more than 10) were unable to communicate with our API for a day, and were therefore unable to fetch and run pipelines. We identified an issue with our certificate rotation, which failed to generate new certificates as required for this subset of runners. We resolved the issue by manually recreating the required certificates, which were then delivered to the runners on their next build, restoring service for all impacted customers.
- postmortem Aug 14, 2024, 01:54 AM UTC
**Impact**: A small number of hybrid runners (no more than 10) were unable to communicate with our API for a day, and were therefore unable to fetch and run pipelines. **Detection**: We were informed of this issue by customers. **Root Cause**: We identified an issue with our certificate rotation, which failed to generate new certificates as required for this subset of runners. **Resolution**: We resolved the issue by manually recreating the required certificates, which were then delivered to the runners on their next build, restoring service for all impacted customers. We also applied further mitigation to correct the underlying certificate rotation issue. We are working on monitoring improvements in this area.
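
As an illustration of the kind of monitoring mentioned above, the following is a minimal sketch of an expiry check that flags runner certificates that have expired or are close to expiring. It is not Codefresh's actual tooling: the function name `expiring_certificates`, the runner names, the input shape (a mapping from runner name to certificate expiry time), and the 14-day warning window are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def expiring_certificates(not_after_by_runner, now, warn_within=timedelta(days=14)):
    """Return (runner, time_left) pairs for certificates that are expired
    or will expire within `warn_within`, ordered soonest first.

    `not_after_by_runner` is a hypothetical mapping of runner name to the
    certificate's expiry time as a timezone-aware datetime.
    """
    flagged = []
    for runner, not_after in not_after_by_runner.items():
        time_left = not_after - now
        # A negative time_left means the certificate has already expired.
        if time_left <= warn_within:
            flagged.append((runner, time_left))
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical runner names and expiry dates, for illustration only.
    now = datetime(2024, 8, 13, tzinfo=timezone.utc)
    certs = {
        "runner-a": now + timedelta(days=90),
        "runner-b": now + timedelta(days=3),
        "runner-c": now - timedelta(days=1),
    }
    for runner, time_left in expiring_certificates(certs, now):
        state = "EXPIRED" if time_left < timedelta(0) else "expiring soon"
        print(f"{runner}: {state} ({time_left.days} days left)")
```

A check of this shape, run on a schedule and wired to alerting, would surface a failed rotation before the affected runners lose API access, rather than relying on customer reports for detection.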