Qubole incident

Site unavailable - gcp.qubole.com

Critical Resolved

Qubole experienced a critical incident on January 28, 2022 affecting Site Availability, lasting 1d 6h. The incident has been resolved; the full update timeline is below.

Started
Jan 28, 2022, 03:49 AM UTC
Resolved
Jan 29, 2022, 10:39 AM UTC
Duration
1d 6h
Detected by Pingoru
Jan 28, 2022, 03:49 AM UTC

Affected components

Site Availability

Update timeline

  1. investigating Jan 28, 2022, 03:49 AM UTC

    The gcp.qubole.com environment web page is currently not loading. We are actively working on this issue.

  2. identified Jan 28, 2022, 05:53 AM UTC

    DevOps has identified an issue with a certificate update and is currently working on it.

  3. identified Jan 28, 2022, 09:23 AM UTC

    DevOps has identified the issue with the certificate update and has located the old certificate. The team is actively analyzing the problem and working to revert to the old certificate.

  4. identified Jan 28, 2022, 11:47 AM UTC

    DevOps is still checking the validity of the TLS certificate. The UI is picking up the old certificate and is therefore failing to load.

  5. identified Jan 28, 2022, 03:51 PM UTC

    DevOps has restored the 2022 certificate in the config map and all other places, but the older (2021) certificate is still being picked up. They are currently reviewing all settings to see whether there are any other places where the config needs to be updated.

  6. identified Jan 28, 2022, 11:23 PM UTC

    The DevOps team is currently comparing a few YAML files to make sure the certificates are properly installed.

  7. identified Jan 29, 2022, 02:56 AM UTC

    Our DevOps team is currently working with nginx-tugboat. nginx-tugboat is trying to connect to consul-master.

  8. identified Jan 29, 2022, 08:22 AM UTC

    The Dev team is investigating this issue. Name resolution from the nginx container is not happening for the service (consul-master-cluster-ip). The team suspects that this is due to incorrect Deployment configurations, and that is what is being investigated right now.

  9. resolved Jan 29, 2022, 10:39 AM UTC

    DevOps has confirmed that the issue has been resolved. The gcp.qubole.com environment is up and running now.
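
The timeline describes two kinds of check: which TLS certificate the site is actually serving, and whether a service name resolves from inside a container. The following is a minimal sketch of both using only the Python standard library. It is not the tooling Qubole's team used. The function names (days_until_expiry, served_cert_status, resolves) are hypothetical and chosen for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and a certificate's notAfter string.

    `not_after` uses the format returned by getpeercert(),
    e.g. "Jan 29 10:39:00 2022 GMT".
    """
    expiry_seconds = ssl.cert_time_to_seconds(not_after)
    return (expiry_seconds - now.timestamp()) / 86400.0


def served_cert_status(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Describe the certificate the server presents on a live connection.

    This inspects what is served, which can differ from what is stored
    in configuration if an old certificate is still being picked up.
    """
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as exc:
        # An expired or mismatched certificate fails the handshake here.
        return f"verification failed: {exc.verify_message}"
    days = days_until_expiry(not_after, datetime.now(timezone.utc))
    return f"valid, expires {not_after} ({days:.0f} days left)"


def resolves(name: str) -> bool:
    """Return True if `name` resolves to at least one address."""
    try:
        socket.getaddrinfo(name, None)
    except socket.gaierror:
        return False
    return True
```

Under these assumptions, served_cert_status("gcp.qubole.com") would be run from any machine with outbound access, while resolves("consul-master-cluster-ip") is only meaningful when run from inside the affected container, since cluster-internal service names generally do not resolve elsewhere.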