Higher Logic incident

CSS Changes and Server Errors being reported

Minor, Resolved

Higher Logic experienced a minor incident on January 23, 2024 affecting Community, lasting 32m. The incident has been resolved; the full update timeline is below.

Started
Jan 23, 2024, 06:07 PM UTC
Resolved
Jan 23, 2024, 06:40 PM UTC
Duration
32m
Detected by Pingoru
Jan 23, 2024, 06:07 PM UTC

Affected components

Community

Update timeline

  1. investigating Jan 23, 2024, 06:07 PM UTC

    We are currently investigating this issue.

  2. resolved Jan 23, 2024, 06:40 PM UTC

    This incident has been resolved.

  3. postmortem Jan 26, 2024, 07:28 PM UTC

    **Date: January 23, 2024**

    **What Happened**

    * Higher Logic Online Community (OC) site-specific theming CSS was unavailable for approximately 3 hours; this affected Online Community customers who used this feature.

    **Timeline - all times in EST on January 23**

    * 10:22 AM: Weekly OC code deployment for production tenants starts
    * 01:04 PM: Customer Experience/Support teams relay to technical staff and management reports of error responses to page requests for generated CSS
    * 01:10 PM: Decision made to roll back the code release
    * 01:12 PM: Rollback begins
    * 01:39 PM: Rollback completed
    * 01:55 PM: Validation of rollback completed; error responses have stopped.

    **Root Cause**

    * A broad code update was deployed which did not properly address the delivery of site-specific theming CSS.

    **Details**

    * Site-specific theming CSS is a feature which can be enabled on a per-customer basis; not all customers make use of this feature. This feature was deployed to a subset of customers on Monday, January 22nd, and no reports of problems were received. This was likely because none of these customers had site-specific theming CSS enabled.
    * A contributing factor was that the development environment could be adjusted to deliver site-specific theming CSS by a method inconsistent with the production environment, which allowed the defect to evade detection in QA testing.

    **Corrective Actions**

    * The code which caused the issue was rolled back, resolving the issue.
    * New code was added to the impacted service to properly handle the code change and was successfully deployed on January 24.
    * Implement application startup and health logging, and create health check endpoints for the impacted service to detect and log issues during startup.
    * Change how the service is used within the development environment to allow identification of issues prior to production deployment.
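The corrective action about startup logging and health check endpoints can be illustrated with a minimal sketch. This is not Higher Logic's implementation: the service name, the `/healthz` path, the `load_css` callable, and the site IDs are all hypothetical, and the sketch only assumes that the service can attempt to produce theming CSS for a few known sites and report a failure before customers see error responses.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("theming-css-service")  # hypothetical service name


def check_theme_css(load_css, site_ids):
    """Try to load generated CSS for each site and collect any failures.

    `load_css` is a hypothetical callable standing in for whatever the
    real service uses to produce site-specific theming CSS.
    """
    failures = {}
    for site_id in site_ids:
        try:
            if not load_css(site_id):
                failures[site_id] = "empty CSS"
        except Exception as exc:  # a health probe should never crash
            failures[site_id] = f"{type(exc).__name__}: {exc}"
    return failures


def health_report(load_css, site_ids):
    """Return (HTTP status, JSON-serializable body) and log the outcome."""
    failures = check_theme_css(load_css, site_ids)
    if failures:
        log.error("theming CSS health check failed: %s", failures)
        return 503, {"status": "unhealthy", "failures": failures}
    log.info("theming CSS health check passed for %d site(s)", len(site_ids))
    return 200, {"status": "ok", "failures": {}}


def make_handler(load_css, site_ids):
    """Build a request handler that serves the report at /healthz."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            status, body = health_report(load_css, site_ids)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler
```

Calling `health_report` once at application startup would log a failure as soon as a deployment breaks CSS delivery, and passing `make_handler(...)` to `http.server.HTTPServer` would expose the same check to a deployment pipeline or monitor. Running the same probe in the development environment only helps if that environment delivers CSS the same way production does, which is the point of the final corrective action.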