Metabase incident

Issue with SSL certificates

Critical Resolved View vendor source →

Metabase experienced a critical incident on November 9, 2023 affecting Metabase Cloud Platform, lasting 24m. The incident has been resolved; the full update timeline is below.

Started
Nov 09, 2023, 07:37 PM UTC
Resolved
Nov 09, 2023, 08:02 PM UTC
Duration
24m
Detected by Pingoru
Nov 09, 2023, 07:37 PM UTC

Affected components

Metabase Cloud Platform

Update timeline

  1. investigating Nov 09, 2023, 07:37 PM UTC

    We're investigating an issue in our cloud with certificate management that's causing browsers to receive incorrect certificates

  2. resolved Nov 09, 2023, 08:02 PM UTC

    Ingress controller certificates are now back to normal. You should be able to get the correct certificate in the browser now. We're very sorry for the inconvenience, we'll publish a retrospective about the issue in the next few days.

  3. postmortem Nov 15, 2023, 01:15 PM UTC

    **Summary** On Thursday Nov 9th the default certificate for the external Nginx ingress was replaced with a dummy certificate. Users attempting to reach their hosted instances would get an invalid certificate error. This was due to a misconfiguration in the new CI for releasing our helm charts that used the internal Nginx ingress values files for the external Nginx ingress. **Impact** All hosted customers not using custom domains were affected. **How was the root cause diagnosed?** We identified that the values were properly set in the values files. We Investigated the CI steps to determine why the correct values were not set. It was determined that the wrong values files were being used for external Nginx ingress. **How we’ll make this not happen again?** * Update staging and production value files to match. * Add alert to pagerduty for non-custom domains \(catch alerts faster\)