Brillium incident

Incident Investigation

Minor Resolved

Brillium experienced a minor incident on February 15, 2023 affecting Assessment Authoring, lasting 34d 20h. The incident has been resolved; the full update timeline is below.

Started
Feb 15, 2023, 10:04 PM UTC
Resolved
Mar 22, 2023, 06:09 PM UTC
Duration
34d 20h
Detected by Pingoru
Feb 15, 2023, 10:04 PM UTC
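
As a cross-check on the Duration field, the elapsed time can be recomputed from the Started and Resolved timestamps. The following is a minimal Python sketch, assuming the status page truncates the duration to whole hours; format_duration is a hypothetical helper written for this illustration, not part of any Brillium or Pingoru tooling.

```python
from datetime import datetime, timezone

# Timestamps copied from the Started and Resolved fields (both UTC).
started = datetime(2023, 2, 15, 22, 4, tzinfo=timezone.utc)
resolved = datetime(2023, 3, 22, 18, 9, tzinfo=timezone.utc)


def format_duration(start, end):
    """Return the elapsed time as 'Nd Nh', dropping leftover minutes.

    Truncating to whole hours is an assumption about how the page
    renders durations.
    """
    delta = end - start
    hours = delta.seconds // 3600  # whole hours beyond the full days
    return f"{delta.days}d {hours}h"


print(format_duration(started, resolved))  # prints "34d 20h"
```

The exact difference is 34 days, 20 hours, and 5 minutes, which matches the reported 34d 20h once the minutes are dropped.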

Affected components

Assessment Authoring

Update timeline

  1. investigating Feb 15, 2023, 10:04 PM UTC

    Brillium is currently experiencing an occasional issue while Respondents are taking an Assessment. Respondents may receive the following message when this issue occurs: General System Error. We are aggressively investigating to understand the issue so we can deliver a solution. As we learn more, we will update the status page to keep you informed about the issue, our solution, and a proposed timeframe for resolution.

  2. investigating Feb 15, 2023, 10:07 PM UTC

    The Brillium Engineering Team has ruled out any issue with the platform itself. We are now working with AWS engineers to determine the cause. The issue has been escalated at AWS.

  3. investigating Feb 20, 2023, 12:45 PM UTC

    We have determined that the issue we are experiencing is related to the database. Specifically, the Brillium platform has encountered very infrequent, intermittent problems connecting to the database, causing the General System Error that Respondents are experiencing. We are currently working closely with AWS to identify the root cause of these problems. Please know that we are actively investigating the matter, and we will provide updates as soon as we have additional information.

  4. identified Mar 21, 2023, 08:44 PM UTC

    Following a comprehensive investigation and analysis, we have identified the root cause of the issue. Our findings indicate that the issue lies in the interaction between the application and the database, which is supported by the data collected during the investigation. We have already implemented some temporary enhancements aimed at working around the root cause. Meanwhile, our engineering team will begin introducing a series of permanent solutions over the next several weeks. The system is not expected to experience any downtime or service interruptions during this time.

  5. monitoring Mar 22, 2023, 06:08 PM UTC

    An update to address the "General Error" message has been applied. We have been monitoring the system for the last 12 hours with no further indications of this issue.

  6. resolved Mar 22, 2023, 06:09 PM UTC

    Follow-up monitoring confirms this issue has been resolved.