Factorial HR incident

DNS change produced downtime

Critical, Resolved

Factorial HR experienced a critical incident on May 12, 2020 that affected the API & backend and the Factorial website. The vendor's postmortem reports about 7 minutes of downtime for the public sites. The incident has been resolved; the full update timeline is below.

Started
May 12, 2020, 01:28 PM UTC
Resolved
May 12, 2020, 01:28 PM UTC
Duration
Not recorded
Detected by Pingoru
May 12, 2020, 01:28 PM UTC

Affected components

API & backend
Factorial website

Update timeline

  1. resolved May 12, 2020, 01:28 PM UTC

    This incident has been resolved.

  2. postmortem May 12, 2020, 01:28 PM UTC

    # What happened?

    Today we made a change to our DNS (Domain Name System) that caused downtime by making our main domains ([factorialhr.com](http://factorialhr.com), [factorialhr.es](http://factorialhr.es), [factorialhr.fr](http://factorialhr.fr), ...) unable to be resolved for a few seconds. Our infrastructure change first destroyed and then recreated an existing DNS record while the retry time in our SOA was too high. That resulted in about 7 minutes of downtime for our public sites.

    # How are we going to prevent similar issues in the future?

    We will rethink the way we apply DNS changes in our infrastructure. We will lower the retry time in our SOAs to more manageable values. We'll also apply some changes manually first and then pass these changes to our infrastructure code.
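
The SOA timers mentioned in the postmortem can be checked before a DNS change is applied. The following is a minimal sketch, not Factorial's tooling: it parses an SOA record in standard presentation format (MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM) and flags timers above a chosen limit. The postmortem names the retry time; resolvers also cache "no such record" answers for the lesser of the SOA record's own TTL and its MINIMUM field (RFC 2308), so the sketch reports both. The record values, the `example.com` names, the 600-second limit, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoaTimers:
    """Numeric fields of an SOA record; all timers are in seconds."""
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa_rdata(rdata: str) -> SoaTimers:
    """Parse SOA rdata: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM."""
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    # Skip MNAME and RNAME; the remaining five fields are integers.
    serial, refresh, retry, expire, minimum = (int(f) for f in fields[2:])
    return SoaTimers(serial, refresh, retry, expire, minimum)


def negative_cache_ttl(soa_record_ttl: int, timers: SoaTimers) -> int:
    """Seconds a resolver may cache a 'no such record' answer (RFC 2308)."""
    return min(soa_record_ttl, timers.minimum)


def timers_over_limit(timers: SoaTimers, limit_seconds: int) -> dict:
    """Return the retry and minimum timers that exceed limit_seconds."""
    candidates = {"retry": timers.retry, "minimum": timers.minimum}
    return {name: value for name, value in candidates.items()
            if value > limit_seconds}


if __name__ == "__main__":
    # Hypothetical record for illustration; not Factorial's real SOA.
    rdata = ("ns1.example.com. hostmaster.example.com. "
             "2020051201 7200 3600 1209600 86400")
    timers = parse_soa_rdata(rdata)
    print("timers above 600 s:", timers_over_limit(timers, 600))
    print("negative-cache TTL:", negative_cache_ttl(300, timers), "s")
```

A check like this could run as a pre-deployment step, so that a destroy-and-recreate of a record is only attempted once the timers are known to be short.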