Xink incident

Xink Admin Portal - Inaccessible

Notice · Resolved

Xink experienced a notice-level incident on September 26, 2022, affecting Admin Portal and 1 more component and lasting 9h 43m. The incident has been resolved; the full update timeline is below.

Started: Sep 26, 2022, 08:22 PM UTC
Resolved: Sep 27, 2022, 06:06 AM UTC
Duration: 9h 43m
Detected by Pingoru: Sep 26, 2022, 08:22 PM UTC

Affected components

Admin Portal

Update timeline

  1. investigating Sep 26, 2022, 08:22 PM UTC

    We are experiencing some problems accessing the Xink portal. We are currently working on resolving the issue as our TOP PRIORITY.

  2. resolved Sep 27, 2022, 07:03 AM UTC

    The Auth service failed with the error "HTTP Error 500.30 - ASP.NET Core app failed to start". Quick measures such as restarting and redeploying did not resolve the issue. Several further measures were applied, and the Auth service returned to normal. We apologize for this issue and will share additional information, including an RCA, as the investigation continues.

  3. postmortem Sep 27, 2022, 05:45 PM UTC

    # RCA (Root Cause Analysis)

    **Exception Occurred During App Startup**

    Exceptions were detected in the Event Logs, and requests failed with a 500.30 status code. The most likely cause was an ASP.NET Core Logging Integration extension that Microsoft upgraded on Mon, 26 Sep 2022 23:41:16 GMT. Microsoft now uses an RC (Release Candidate) version of this extension in production. The new version broke the Xink Auth dependency tree and brought down the production application.

    **Mitigation:** We switched the deployment mode from Portable (the app uses the libraries and framework provided by the hosting environment) to Self-Contained (we copy all the libraries with every deployment). This makes us less dependent on the hosting configuration, but it does not fully solve the problem, because we still rely on system components and extensions (including the RC version mentioned above).

    **Next Steps:** We are considering two paths to prevent this kind of issue in the future. First, resuming the Kubernetes migration and moving all parts from Azure native services to our own clusters; this is the most likely candidate. Second, moving to stand-alone containers; this is the quick solution.
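
    For readers unfamiliar with the Self-Contained mode named in the mitigation, the fragment below is a minimal sketch of how an ASP.NET Core project file can opt into it. Xink has not published its project configuration, so the target framework (`net6.0`) and runtime identifier (`win-x64`) shown here are assumptions chosen only for illustration.

    ```xml
    <!-- Illustrative .csproj fragment; NOT Xink's actual configuration. -->
    <Project Sdk="Microsoft.NET.Sdk.Web">
      <PropertyGroup>
        <!-- Assumed framework version, for illustration only. -->
        <TargetFramework>net6.0</TargetFramework>
        <!-- Self-contained publishing needs a concrete target runtime (assumed here). -->
        <RuntimeIdentifier>win-x64</RuntimeIdentifier>
        <!-- Ship the .NET runtime and framework libraries with the app
             instead of using the copies installed on the host. -->
        <SelfContained>true</SelfContained>
      </PropertyGroup>
    </Project>
    ```

    With settings like these, `dotnet publish` bundles the runtime alongside the application, so the app no longer depends on the framework version installed by the hosting platform. As the RCA notes, this does not isolate the app from host-level extensions.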