Clojars incident

Clojars unavailable


Clojars experienced a notice-level incident on August 5, 2016, lasting 23 minutes. The incident has been resolved; the full update timeline is below.

Started
Aug 05, 2016, 09:42 AM UTC
Resolved
Aug 05, 2016, 10:06 AM UTC
Duration
23m
Detected by Pingoru
Aug 05, 2016, 09:42 AM UTC

Update timeline

  1. investigating Aug 05, 2016, 09:42 AM UTC

    Working on restoring access after nginx upgrade.

  2. monitoring Aug 05, 2016, 10:05 AM UTC

    Nginx problems appear to have been resolved (IPv6 issues)

  3. resolved Aug 05, 2016, 10:06 AM UTC

    This incident has been resolved.

  4. postmortem Aug 03, 2018, 05:33 PM UTC

    ## Summary

    There was approximately 23 minutes of server downtime, from 2016-08-05 09:46 UTC to 2016-08-05 10:09 UTC, while updating nginx. The root cause was a change in nginx's default behaviour for IPv6 sockets, which caused IPv4 connections to be ignored.

    ## Details

    The version of nginx installed on the server was quite old. Before upgrading, I read through the release notes looking for changes, and copied them to a text file for easy reference in case something went wrong. Nothing in the changelog looked like it would affect us (though, crucially, I didn't fully understand all of the changes it mentioned).

    I updated nginx while it was running, to keep downtime to a minimum. Once the update had finished, I restarted nginx. It reloaded and everything seemed to be OK. I then realised that the old process was still running and the new version hadn't started. After I killed the old process and started the new one, clojars.org was no longer accessible.

    I searched for possible answers, and one of the changes mentioned in the changelog came up: `ipv6only` is now on by default for IPv6 sockets. This means an nginx socket listening on IPv6 accepts only IPv6 connections, so clients connecting over IPv4 could not reach the server. If you were connecting to Clojars over IPv6, you would have had no connectivity issues.

    I updated the nginx server config to turn off `ipv6only`, and service was restored.

    ## Learnings

    * When updating software, make sure we understand every change in the changelog, especially those that change default behaviour.
    * If possible, test upgrades on a test server before applying them to the production server.
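
    For anyone hitting the same problem, the fix described in this postmortem ("turn off `ipv6only`") corresponds to the `ipv6only` parameter of nginx's `listen` directive. The sketch below is a minimal illustration, not the actual Clojars configuration: it assumes a plain HTTP server on port 80, and the port and the rest of the server block are placeholders.

    ```nginx
    server {
        # Hypothetical, minimal server block; the port is a placeholder and
        # the real Clojars server config is not shown in this postmortem.
        #
        # One IPv6 wildcard socket with ipv6only=off accepts both IPv6 clients
        # and IPv4 clients (the latter appear as IPv4-mapped addresses such as
        # ::ffff:203.0.113.7). Do not also add "listen 80;" in this setup, or
        # the separate IPv4 bind on port 80 will clash with this dual-stack socket.
        listen [::]:80 ipv6only=off;
    }
    ```

    An alternative that does not depend on the `ipv6only` default at all is to declare one listener per address family, `listen 80;` for IPv4 and `listen [::]:80;` for IPv6. Running `nginx -t` after editing checks that the configuration parses, although it would not have caught this incident: the old configuration was still valid, and only the default behaviour had changed.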