Factorial HR incident

Major outage

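Factorial HR experienced a major incident on September 9, 2020, affecting the API & backend and the Factorial website. According to the vendor's update, all services except the blog were down from 14:39 to 15:43, about an hour. The incident has been resolved; the full update timeline is below.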

Started: Sep 09, 2020, 02:15 PM UTC
Resolved: Sep 09, 2020, 02:15 PM UTC
Duration: not listed
Detected by Pingoru: Sep 09, 2020, 02:15 PM UTC

Affected components

API & backend, Factorial website

Update timeline

  1. resolved Sep 09, 2020, 02:15 PM UTC

    Major outage of all our services, except the blog, from 14:39 to 15:43.

  2. postmortem Sep 09, 2020, 02:16 PM UTC

    # What happened?

    At 14:39, new content for our public pages was deployed, causing our cache to hit its maximum capacity limit. This triggered a fallback strategy: we started requesting the content for our public pages from a third-party service. That service was quickly overwhelmed with requests and began applying an exponential backoff strategy, forcing our backend services to wait long periods for each response and, as a result, making our API unresponsive.

    # How did we solve it?

    Increasing the maximum capacity limit of our cache fixed the issue.

    # How will we make sure it does not happen again?

    We are going to review our cache strategy so that our whole infrastructure no longer depends on the cache to function properly.
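The failure chain in the postmortem (a cache miss falling through to a third-party fallback that callers wait on without a limit) can be illustrated with a small Python sketch. Everything in it is hypothetical: the function names, the 1-second base delay, the timeout values, the "maintenance page" default, and the use of a plain dict as the cache are assumptions for illustration, not details of Factorial's stack. It shows how quickly doubling delays add up, and one common mitigation: capping how long a request waits on the fallback and serving a default instead of blocking.

```python
import concurrent.futures
import time

# Hypothetical sketch: names, delays, and timeouts are illustrative assumptions,
# not details of Factorial's actual stack.


def cumulative_backoff(base_seconds, retries):
    """Total time spent waiting when each retry doubles the previous delay
    (plain exponential backoff with no jitter and no maximum delay)."""
    return sum(base_seconds * 2 ** k for k in range(retries))


# Worker pool that runs fallback fetches so the caller can stop waiting on them.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def get_public_page(key, cache, fetch_fallback, timeout_s, default):
    """Serve `key` from `cache`; on a miss, ask the fallback service but wait
    at most `timeout_s` seconds, returning `default` if it is too slow."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    future = _executor.submit(fetch_fallback, key)
    try:
        content = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Stop waiting; the slow call still finishes in its worker thread.
        return default
    cache[key] = content  # repopulate the cache after a successful fetch
    return content


# Hypothetical stand-ins for the third-party content service.
def slow_third_party(key):
    time.sleep(0.5)  # simulates a provider that is backing off
    return f"<html>{key}</html>"


def fast_third_party(key):
    return f"<html>{key}</html>"


if __name__ == "__main__":
    print(cumulative_backoff(1, 6))  # 1+2+4+8+16+32 = 63 seconds
    cache = {}
    # Gives up after 0.05 s and returns the default instead of blocking.
    print(get_public_page("pricing", cache, slow_third_party, 0.05, "maintenance page"))
    # Succeeds within the limit and stores the result in the cache.
    print(get_public_page("pricing", cache, fast_third_party, 1.0, "maintenance page"))
```

Capping the wait trades freshness for availability: a visitor may briefly see a default page, but the API's request-handling capacity is not tied up by a slow dependency. Whether that trade-off suits Factorial's public pages is not stated in the postmortem.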