Skylight incident

SSL Certificate Issue

Minor · Resolved

Skylight experienced a minor incident on January 2, 2022 affecting Application and Hosting, lasting 16m. The incident has been resolved; the full update timeline is below.

Started
Jan 02, 2022, 05:40 PM UTC
Resolved
Jan 02, 2022, 05:57 PM UTC
Duration
16m
Detected by Pingoru
Jan 02, 2022, 05:40 PM UTC

Affected components

Application, Hosting

Update timeline

  1. identified Jan 02, 2022, 05:37 PM UTC

    The Skylight dashboard is currently inaccessible due to a configuration issue. This outage also impacts agent authentication: new authentications from agents will not succeed at the moment. Agents that are already authenticated can continue to report data until their authentication session expires. The data processing pipeline is technically unaffected by this outage, as it is hosted on a different provider. However, because agents are failing to authenticate (and therefore failing to submit traces), we expect lapses in Skylight data during the outage period.

  2. monitoring Jan 02, 2022, 05:40 PM UTC

    We have manually uploaded a new cert, completed the migration, and are monitoring the situation. The Skylight dashboard (skylight.io) should be immediately accessible. If you are still unable to access the site, please email [email protected] for assistance. Your Skylight agents should resume reporting data once they retry the previously failed authentication request. If this does not occur, you can try restarting your app, which forces the agent to restart and authenticate again. If that still doesn't work, please email [email protected] for further help. We are very sorry for the trouble.

  3. resolved Jan 02, 2022, 05:57 PM UTC

    Our metrics indicate that the agent report rate has recovered to its pre-incident level. We believe most customer agents have resumed normal reporting and the issue has been resolved. If you continue to encounter issues, please email [email protected] for assistance. Unfortunately, if your agent was "locked out" by an expired authentication session and was unable to report data during the outage, that unreported data will not be available to view on the dashboard. We are truly sorry about this.
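
The updates above describe agents recovering once they retry a previously failed authentication request. The following is a minimal sketch of that general pattern (retry with exponential backoff), not Skylight's actual agent code. The function name `authenticate_with_retry`, its parameters, and the `authenticate` callable are all hypothetical and chosen for illustration only.

```python
import time
from typing import Callable, Optional


def authenticate_with_retry(
    authenticate: Callable[[], Optional[str]],  # hypothetical: returns a session token or None
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `authenticate` until it yields a token or attempts run out."""
    for attempt in range(max_attempts):
        try:
            token = authenticate()
        except OSError:
            # ssl.SSLError and ConnectionError are both subclasses of OSError,
            # so certificate and network failures are treated as retryable.
            token = None
        if token is not None:
            return token
        if attempt < max_attempts - 1:
            # Double the wait after each failure, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    return None
```

With a scheme like this, an agent that failed to authenticate during the outage would succeed on a later attempt once the new certificate is in place. If the attempts are exhausted first, restarting the app (as suggested in the monitoring update) would start the authentication cycle again.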