pool.ntp.org incident

Website and monitoring down

pool.ntp.org experienced a critical incident on August 19, 2018, affecting the Management Portal, the Public website, and DNS updates, and lasting 2h 13m. The incident has been resolved; the full update timeline is below.

Started: Aug 19, 2018, 07:45 AM UTC
Resolved: Aug 19, 2018, 09:58 AM UTC
Duration: 2h 13m
Detected by Pingoru: Aug 19, 2018, 07:45 AM UTC

Affected components

Management Portal
Public website
DNS updates

Update timeline

identified Aug 19, 2018, 07:45 AM UTC

The website and the monitoring system are down. An internal certificate in our Kubernetes cluster expired today. The Tectonic install we were using issued an internal certificate for the API server that was only valid for one year (with no automated process to renew it, ugh). We set the clock on the cluster back to regain access and are working through the manual steps listed at https://coreos.com/tectonic/docs/latest/tls/rotate-tls.html -- ugh! The NTP service and the DNS service are operating normally.
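The root cause was a certificate valid for only one year with nothing watching its expiry date. Below is a minimal sketch of an expiry check that could run from cron or an existing monitoring system. It assumes Python 3 and its standard `ssl` module. The function names, the 30-day warning window, and the `cafile` parameter for an internal cluster CA are illustrative assumptions, not part of the Tectonic tooling or the pool's actual setup.

```python
import socket
import ssl
import time
from typing import Optional

# Hypothetical warning window: choose enough lead time to finish a manual rotation.
WARN_DAYS = 30


def fetch_not_after(host: str, port: int = 443, cafile: Optional[str] = None) -> str:
    """Return the peer certificate's notAfter field, e.g. 'Jan  5 09:34:43 2030 GMT'.

    cafile is an assumption: verifying an internal API server certificate
    usually requires the cluster's own CA bundle. With verification on, an
    already-expired certificate makes the handshake itself raise an SSLError.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before expiry (negative once the certificate has expired)."""
    if now is None:
        now = time.time()
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / 86400


def needs_rotation(not_after: str, now: Optional[float] = None,
                   warn_days: int = WARN_DAYS) -> bool:
    """True when the certificate is inside the warning window or already expired."""
    return days_until_expiry(not_after, now) < warn_days
```

For example, `needs_rotation(fetch_not_after("apiserver.example.internal", 443, cafile="ca.pem"))` would flag the certificate a month ahead; the hostname and CA file path there are hypothetical placeholders.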
monitoring Aug 19, 2018, 09:58 AM UTC

The web system has been up for about an hour. The monitoring system was just turned back on.
resolved Aug 19, 2018, 09:58 AM UTC

This incident has been resolved.