Factorial HR incident

Performance degraded on our API request time

Critical · Resolved

Factorial HR experienced a critical incident on May 18, 2020 affecting API & backend, lasting 6h 43m. The incident has been resolved; the full update timeline is below.

Started
May 18, 2020, 07:30 AM UTC
Resolved
May 18, 2020, 02:14 PM UTC
Duration
6h 43m
Detected by Pingoru
May 18, 2020, 07:30 AM UTC

Affected components

API & backend

Update timeline

  1. investigating May 18, 2020, 09:30 AM UTC

    We are currently investigating the issue

  2. investigating May 18, 2020, 12:20 PM UTC

    We found the culprit of this issue. Yesterday we deployed a change to our custom fields system that included a non-performant endpoint. This degraded our Puma workers, which were tied up trying to serve these requests for about 60 seconds. As a result, other requests were delayed and eventually timed out. We partially disabled the custom fields feature to keep the other parts of the app working. We're fixing the performance regression and will re-enable the full custom fields feature once the affected endpoint performs acceptably. We'll keep updating this incident with further steps.

  3. identified May 18, 2020, 12:33 PM UTC

    The issue has been identified and a fix is being implemented.

  4. resolved May 18, 2020, 02:14 PM UTC

    We fixed the main performance issue and the system is now stable. The custom fields feature has been enabled again. We'll continue monitoring the system for possible regressions.
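
The failure mode described in update 2 (a slow endpoint occupying server workers so that unrelated requests queue up and time out) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Factorial's setup: the pool size of 4, the 30-second client timeout, the one-request-per-second traffic, and the names `build_workload` and `simulate` are all hypothetical. Only the roughly 60-second service time comes from the incident text. The model also assumes that a request which times out while queued never reaches a worker.

```python
import heapq


def build_workload(slow_enabled, seconds=120, slow_every=5,
                   slow_service=60.0, fast_service=0.1):
    """One request arrives per second; every `slow_every`-th one hits the
    slow endpoint when `slow_enabled` is True (all numbers are assumptions)."""
    workload = []
    for i in range(seconds):
        is_slow = slow_enabled and i % slow_every == 0
        workload.append((float(i), slow_service if is_slow else fast_service))
    return workload


def simulate(requests, workers, timeout):
    """Simulate a fixed-size worker pool with a first-come, first-served queue.

    `requests` is a list of (arrival_time, service_time) sorted by arrival.
    Returns (waits, timed_out): the queue wait of every request, and how many
    waited longer than `timeout` and were dropped before being served.
    """
    # Min-heap of the times at which each worker next becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    waits = []
    timed_out = 0
    for arrival, service in requests:
        earliest = heapq.heappop(free_at)
        start = max(arrival, earliest)
        wait = start - arrival
        if wait > timeout:
            # The client gave up while queued, so the worker stays untouched.
            heapq.heappush(free_at, earliest)
            timed_out += 1
        else:
            # The worker is busy until this request finishes.
            heapq.heappush(free_at, start + service)
        waits.append(wait)
    return waits, timed_out


if __name__ == "__main__":
    for label, slow_enabled in (("slow endpoint enabled", True),
                                ("slow endpoint disabled", False)):
        waits, timed_out = simulate(build_workload(slow_enabled),
                                    workers=4, timeout=30.0)
        print(f"{label}: {timed_out} timed out, max wait {max(waits):.1f}s")
```

Running the two scenarios side by side shows why partially disabling the feature was a reasonable stopgap: once the slow requests are removed from the mix, the same pool serves the remaining traffic without queueing.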