injixo incident

Plan - Shift Center closes unexpectedly and does not (re)start

Critical Resolved

injixo experienced a critical incident on May 3, 2023, lasting 2h 36m. The incident has been resolved; the full update timeline is below.

Started
May 03, 2023, 01:48 PM UTC
Resolved
May 03, 2023, 04:25 PM UTC
Duration
2h 36m
Detected by Pingoru
May 03, 2023, 01:48 PM UTC

Update timeline

  1. investigating May 03, 2023, 01:48 PM UTC

    We are currently investigating this issue.

  2. investigating May 03, 2023, 02:41 PM UTC

    We are continuing to investigate this issue.

  3. identified May 03, 2023, 03:15 PM UTC

    The issue has been identified and a fix is being implemented.

  4. monitoring May 03, 2023, 03:30 PM UTC

    A fix has been implemented and we are monitoring the results.

  5. resolved May 03, 2023, 04:25 PM UTC

    This incident has been resolved.

  6. postmortem May 05, 2023, 02:02 PM UTC

    **Summary** On Wednesday, 03.05.2023, around 15:45 CET, we noticed that customers were unable to work with Plan/ShiftCenter. At 16:41 we managed to roll back our changes. Unexpectedly, however, this did not solve the problem, and further investigation became necessary. At 17:15 we involved our platform engineering team to downgrade the machine to a previous version. The machine rollback resolved the issue instantly.

    **Fault** Any attempt to open a ShiftCenter led to an error message saying that the application failed to load.

    **Impact** From 03.05.2023 15:45 CET until 03.05.2023 18:25 CET, all customers using Plan/ShiftCenter were unable to use it.

    **Detection** The incident was first detected by our development team.

    **Response** We informed customers at status.injixo.com on 03.05.2023 at 15:45 CET about a partial outage of Plan/ShiftCenter and raised it to a major outage on 03.05.2023 at 16:41 CET.

    **Recovery** We first rolled back all code changes made by our team. As this did not resolve the outage, we then rolled back all changes made on the machine.

    **Root cause** We accidentally deployed an insufficiently tested version containing security updates that blocked the application from starting.

    **Corrective actions** We developed a peer review approval process to ensure that accidental deployments will not happen again.

    We apologise for the service disruption.