pool.ntp.org incident

Website and monitoring down

Critical · Resolved

pool.ntp.org experienced a critical incident on August 19, 2018, affecting the Management Portal, Public website, and DNS updates components and lasting 2h 13m. The incident has been resolved; the full update timeline is below.

Started
Aug 19, 2018, 07:45 AM UTC
Resolved
Aug 19, 2018, 09:58 AM UTC
Duration
2h 13m
Detected by Pingoru
Aug 19, 2018, 07:45 AM UTC

Affected components

Management Portal, Public website, DNS updates

Update timeline

  1. identified Aug 19, 2018, 07:45 AM UTC

    The website and monitoring system are down. Our Kubernetes cluster had an internal certificate that expired today. The Tectonic install that we were using used an internal certificate for the API server that was valid for only a year (and without an automated process to update it, ugh). We set back the time on the cluster to get access again and are going through the manual steps listed on https://coreos.com/tectonic/docs/latest/tls/rotate-tls.html -- ugh! The NTP service and DNS service are operating normally.

  2. monitoring Aug 19, 2018, 09:58 AM UTC

    The web system has been back up for about an hour. The monitoring system was turned back on just now.

  3. resolved Aug 19, 2018, 09:58 AM UTC

    This incident has been resolved.
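
The first update attributes the outage to an API server certificate that was valid for one year and had no automated renewal. As a rough illustration of the kind of check that can flag this ahead of time, here is a minimal Python sketch that classifies a certificate by how close it is to its expiry date. It is not the pool.ntp.org or Tectonic tooling: the function names, the 30-day warning window, and the issue timestamp are all assumptions made for the example (the report gives neither the certificate's issue time nor its exact expiry time).

```python
from datetime import datetime, timedelta, timezone

# Assumed warning window; pick whatever lead time fits your rotation process.
WARN_BEFORE = timedelta(days=30)


def time_left(not_after: datetime, now: datetime) -> timedelta:
    """Return the remaining validity (negative once the certificate has expired)."""
    return not_after - now


def cert_status(not_after: datetime, now: datetime,
                warn_before: timedelta = WARN_BEFORE) -> str:
    """Classify a certificate as 'ok', 'expiring', or 'expired'."""
    remaining = time_left(not_after, now)
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= warn_before:
        return "expiring"
    return "ok"


if __name__ == "__main__":
    # Illustrative values only: a certificate issued one year before the
    # incident date. The real issue and expiry times are not in the report.
    issued = datetime(2017, 8, 19, 7, 45, tzinfo=timezone.utc)
    not_after = issued + timedelta(days=365)

    for now in (
        not_after - timedelta(days=90),
        not_after - timedelta(days=10),
        not_after + timedelta(minutes=1),
    ):
        print(now.isoformat(), cert_status(not_after, now))
```

In practice the `not_after` value would be read from the certificate itself rather than computed, and the check would run on a schedule that alerts as soon as the status leaves "ok".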