Brillium incident

502 Gateway Error Reports

Critical Resolved

Brillium experienced a critical incident on August 26, 2022, affecting API, User Administration and Authentication, and other components, and lasting 12h 19m. The incident has been resolved; the full update timeline is below.

Started
Aug 26, 2022, 11:35 PM UTC
Resolved
Aug 27, 2022, 11:55 AM UTC
Duration
12h 19m
Detected by Pingoru
Aug 26, 2022, 11:35 PM UTC

Affected components

* API
* User Administration and Authentication
* Assessment Authoring
* Partner Central Custom Administration
* Zapier Integration

Update timeline

  1. investigating Aug 26, 2022, 11:35 PM UTC

    Brillium is currently experiencing an issue and is investigating. We will post an update as soon as we learn more about the root cause.

  2. investigating Aug 27, 2022, 01:09 AM UTC

    Our Engineering Team is currently working with AWS Engineers to discover the root cause of the issue. We will share more information as we gain more insight. We do not yet have an estimated time for service restoration, but we will share one once we understand the issue.

  3. investigating Aug 27, 2022, 05:41 AM UTC

    We have resolved the issue and are slowly bringing the platform back online. You may experience some initial slowness, but performance will improve as we become fully operational. We will post again when the system is fully up and running optimally.

  4. monitoring Aug 27, 2022, 05:46 AM UTC

    The platform is back in service but we will continue to monitor and scale up for performance.

  5. monitoring Aug 27, 2022, 11:53 AM UTC

    All services are available, and we will continue to closely monitor systems.

  6. resolved Aug 27, 2022, 11:55 AM UTC

    We are satisfied that we have resolved the issue and that all systems are operating within normal parameters. We will provide a post mortem describing what caused the issue and the steps we took to remediate it and prevent it from occurring again.

  7. postmortem Aug 31, 2022, 09:27 PM UTC

    **Background:** Brillium customers experienced an incident from approximately Friday, August 26, 2022 @ 2145 UTC to Saturday, August 27, 2022 @ 0530 UTC. This issue affected customer access to both the Brillium Assessment Builder version 10 and Brillium version 11 applications.

    **Root Cause:** After working with several AWS engineers and conducting a thorough internal incident analysis, we discovered an incompatibility between network configuration components that presented itself as a loss of connectivity even though the systems themselves were in fact nominal.

    **Steps Taken:**

    * We communicated the actions taken during the analysis and remediation phases of the incident via the Brillium Status Page.
    * We responded to customer inquiries through our support channel.
    * We worked with Amazon support engineers to analyze and identify the root cause.
    * We applied remediation steps to fully resolve the root cause.
    * We monitored the systems to ensure their stability.

    **Mitigation:** The components identified during the investigation are necessary for compatibility in order to support customers on earlier versions of the platform (i.e. v9, v10). During remediation, we were able to remove any dependency on the components identified as the root cause. Additionally, we added validation enhancements to our QA process that include compatibility checks in order to prevent any such issue from recurring.
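
The root cause above describes hosts that reported nominal health while clients saw a loss of connectivity (the 502 gateway errors in the incident title). As an illustration only, the following minimal Python sketch shows how pairing an internal health signal with an externally observed HTTP status can separate that case from an ordinary host failure. The names `ProbeResult` and `classify`, and the four diagnosis labels, are hypothetical; nothing here reflects Brillium's or AWS's actual monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One paired observation (hypothetical structure, for illustration)."""
    host_healthy: bool     # internal check: the instance reports nominal
    external_status: int   # HTTP status seen by an outside client; 0 = no response


def classify(probe: ProbeResult) -> str:
    """Map a paired observation to a coarse diagnosis label."""
    externally_ok = 200 <= probe.external_status < 400
    if probe.host_healthy and externally_ok:
        return "ok"
    if probe.host_healthy:
        # Hosts look fine but clients cannot reach them (e.g. a 502 from a
        # gateway): suspect the network path or its configuration.
        return "connectivity-fault"
    if externally_ok:
        # Clients still get answers although the host reports trouble.
        return "degraded-host"
    return "outage"


if __name__ == "__main__":
    for probe in (ProbeResult(True, 200), ProbeResult(True, 502), ProbeResult(False, 0)):
        print(probe, "->", classify(probe))
```

The point of the sketch is that an internal health check alone would have reported every host as nominal in this scenario; only the external observation exposes the fault.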