Metabase experienced a critical incident on November 9, 2023 affecting Metabase Cloud Platform, lasting 24m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Nov 09, 2023, 07:37 PM UTC
We're investigating an issue with certificate management in our cloud that's causing browsers to receive incorrect certificates.
- resolved Nov 09, 2023, 08:02 PM UTC
Ingress controller certificates are now back to normal, and your browser should receive the correct certificate. We're very sorry for the inconvenience. We'll publish a retrospective about the issue in the next few days.
- postmortem Nov 15, 2023, 01:15 PM UTC
**Summary**
On Thursday, Nov 9th, the default certificate for the external Nginx ingress was replaced with a dummy certificate. Users attempting to reach their hosted instances got an invalid certificate error. The cause was a misconfiguration in the new CI for releasing our Helm charts, which applied the internal Nginx ingress values files to the external Nginx ingress.

**Impact**
All hosted customers not using custom domains were affected.

**How was the root cause diagnosed?**
We confirmed that the values were properly set in the values files. We then investigated the CI steps to determine why the correct values were not being applied, and found that the wrong values files were being used for the external Nginx ingress.

**How will we keep this from happening again?**
* Update staging and production values files to match.
* Add an alert to PagerDuty for non-custom domains (to catch issues faster).
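
As an illustration of the root cause described above, here is a minimal sketch of a guard a release pipeline could run before deploying a chart, so that a release is never paired with another release's values file. It is not Metabase's actual CI: the directory layout `values/<environment>/<release>.yaml`, the release names, and both function names are assumptions made for the example.

```python
from pathlib import PurePosixPath


def expected_values_file(release: str, environment: str) -> PurePosixPath:
    """Return the values file a release should use.

    Assumes a hypothetical layout: values/<environment>/<release>.yaml
    """
    return PurePosixPath("values") / environment / f"{release}.yaml"


def check_values_file(release: str, environment: str, values_file: str) -> None:
    """Raise ValueError if the pipeline is about to use the wrong values file."""
    expected = expected_values_file(release, environment)
    actual = PurePosixPath(values_file)
    if actual != expected:
        raise ValueError(
            f"release {release!r} in {environment!r} must use {expected}, "
            f"but the pipeline passed {actual}"
        )


if __name__ == "__main__":
    # Hypothetical release names; a matching pair passes silently.
    check_values_file(
        "nginx-external", "production", "values/production/nginx-external.yaml"
    )
    try:
        # The kind of mix-up described in the postmortem: the external
        # ingress release paired with the internal ingress values file.
        check_values_file(
            "nginx-external", "production", "values/production/nginx-internal.yaml"
        )
    except ValueError as err:
        print(f"blocked: {err}")
```

A check like this fails the pipeline before the deploy step instead of relying on an alert after the fact, so it complements the PagerDuty alert rather than replacing it.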