Brillium incident

502 Bad Gateway / 504 Gateway Timeout Error

Major · Resolved

Brillium experienced a major incident on May 19, 2022, affecting the API, User Administration and Authentication, and 1 more component. The incident lasted 12h 21m and has been resolved; the full update timeline is below.

Started: May 19, 2022, 02:19 PM UTC
Resolved: May 20, 2022, 02:41 AM UTC
Duration: 12h 21m
Detected by Pingoru: May 19, 2022, 02:19 PM UTC

Affected components

API
User Administration and Authentication
Assessment Authoring
Partner Central Custom Administration

Update timeline

  1. investigating May 19, 2022, 02:19 PM UTC

    We have received reports from some customers that they are unable to access Brillium and are receiving Error 502 Bad Gateway/504 Gateway Timeout responses. We are investigating the issue and will report further progress.

  2. investigating May 19, 2022, 02:36 PM UTC

    Administration, Partner Central, and Profile applications are unaffected and remain operational. Investigation is focused on Application Builder, and we will continue to report further progress.

  3. investigating May 19, 2022, 02:56 PM UTC

    Brillium Application Builder access has been restored. Investigation into the root cause continues. We will post additional findings soon.

  4. investigating May 19, 2022, 03:58 PM UTC

    We are continuing to monitor systems while conducting the root cause analysis.

  5. identified May 19, 2022, 04:20 PM UTC

    The root cause of the issue has been identified, and the development team is applying an emergency hotfix to fully resolve the issue. There may be a very brief interruption in service for some customers while the patch is being applied. We expect this process to take no more than a few minutes. No further interruption to services is planned or expected.

  6. monitoring May 19, 2022, 04:51 PM UTC

    The application of the hotfix has been completed. The systems will be closely monitored over the next 24 hours.

  7. resolved May 20, 2022, 02:41 AM UTC

    Monitoring, testing, and analysis show that systems are operating as expected. This issue is resolved. A post-mortem report will be posted when complete.