CloudBees experienced a notice incident on July 15, 2020, lasting 12h 28m. The incident has been resolved; the full update timeline is below.
Update timeline
- identified Jul 15, 2020, 11:41 AM UTC
There is an outage with a 3rd party service. We're contacting them to find out what the situation is.
- identified Jul 15, 2020, 12:38 PM UTC
We have confirmed the database issue with our service provider and they are working to restore service. Service impact is that experiments can’t be updated, but existing flags are unaffected.
- identified Jul 15, 2020, 01:11 PM UTC
Our service provider has updated us about the situation and the rough estimate for recovery is 2 to 3 hours.
- identified Jul 15, 2020, 02:08 PM UTC
Our service provider has announced that it is going to take longer to recover the system. The current rough estimate for recovery is 5 to 9 hours.
- identified Jul 15, 2020, 04:02 PM UTC
We are monitoring https://status.compose.com/ for further updates.
- identified Jul 15, 2020, 09:07 PM UTC
We've received this message from our service provider: “At this point we are cautiously optimistic. Our engineers are close to having virtual networking up across all hosts in the cluster. So far so good. Once stable we will start bringing capsules back up.” More updates to follow.
- identified Jul 15, 2020, 10:42 PM UTC
IBM Compose update: "Virtual networking is up across all hosts in the cluster and the situation appears to be stable. We are slowly starting data/member capsules. Once those are up, we will start portals which will restore customer access." In parallel, CloudBees engineering teams are now working to restore the database service to our own infrastructure, with a view to failing over if Compose is not able to restore access in a timely manner.
- identified Jul 16, 2020, 12:05 AM UTC
Our Engineering Team has successfully restored the database, but it is still not 100% operational. More updates to follow.
- monitoring Jul 16, 2020, 12:40 AM UTC
Apart from Impression Analytics, which is currently not working, the service is operational again. We're monitoring the situation.
- monitoring Jul 16, 2020, 01:18 AM UTC
The Rollout core service (API/login/web) outage has been resolved and these services are now fully operational. However, Impression Analytics is not currently available, and the engineering team is working to resolve this issue. We will continue to provide service updates on the status of Impression Analytics until the issue is resolved.
- resolved Jul 16, 2020, 01:39 AM UTC
All CloudBees Rollout (Feature Flags) services are now fully operational. We have not identified any data loss or security impact from this outage. An outage post-mortem and corrective actions will be performed in due course. Thank you for your patience.