Benevity experienced a critical incident on December 19, 2024 affecting Donate and Volunteer Core Services, lasting 34m. The incident has been resolved; the full update timeline is below.
Update timeline
- identified Dec 19, 2024, 12:46 AM UTC
We are experiencing an issue with Spark landing pages not loading correctly. We have identified the cause and are actively working to remediate the issue.
- monitoring Dec 19, 2024, 01:06 AM UTC
The issue affecting Spark has been resolved, and we are monitoring the issue to ensure stability. If you continue to experience issues, please submit a support request. Thank you for your understanding.
- resolved Dec 19, 2024, 01:20 AM UTC
The issue affecting Spark has been fully resolved, and our team has taken all necessary steps to prevent this from happening again in the future. We apologize for any inconvenience this may have caused.
- postmortem Dec 23, 2024, 10:39 PM UTC
### Summary

On December 18, 2024, between 17:16 MT and 17:55 MT, Spark client sites displayed a blank web page instead of the expected Spark functionality to users on both web and mobile, preventing them from logging in to and using their client sites. Investigation determined that the issue resulted from malformed application code, exposed by a system configuration change that enabled new Spark functionality. Once the cause was identified, the configuration change was rolled back and service was restored to all Spark client sites.

### Impact

Users attempting to access their Spark client site, via either web or mobile app, were unable to do so for the duration of the incident. They were instead presented with a blank web page, which prevented the use of all Spark functionality, including login.

### Root Cause

Investigation determined that enabling new Spark functionality, combined with a latent bug in that functionality, resulted in users seeing a blank web page rather than the expected content and functionality when attempting to access their Spark client site. Although the new functionality followed Benevity’s Software Development Life Cycle (SDLC) during its creation, including review and testing, it was released to Production in a disabled state and remained disabled for several months. Once the functionality was enabled by a system configuration change, the effect of the bug was exposed. Benevity’s DevOps and Engineering Teams took steps to revert the configuration and fix the malformed code.

### Future Mitigation

This incident identified a gap in some areas of Benevity’s feature enablement process: although the new functionality followed Benevity’s SDLC, the underlying issue was not detected by automated testing. Additionally, the release and the enablement of this functionality were separated by several months, which added complexity during initial triage and delayed identification of the issue. The process for feature enablement through this type of system configuration change, although it requires review and approval, did not provide sufficient testing prior to application in Production. The incident response team has identified changes to improve these processes and is working to incorporate them into our broader change management processes.

### Timeline of Events

* Wed 18 Dec 2024 17:18 MST: Incident detected
* Wed 18 Dec 2024 17:36 MST: Key issue identified
* Wed 18 Dec 2024 17:42 MST: Rollback conducted and configuration refresh started
* Wed 18 Dec 2024 17:51 MST: Configuration refresh completed
* Wed 18 Dec 2024 17:52 MST: Spark sites return to nominal
* Wed 18 Dec 2024 17:52 MST: Incident resolved
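
The postmortem timeline is given in MST, while the status updates earlier in this report are stamped in UTC. The following is a minimal sketch, not part of Benevity's tooling, of how the timeline entries could be converted to UTC and how the detection-to-resolution duration could be checked against the 34 minutes stated in the incident header. The names `MST`, `events`, and `to_utc` are illustrative, and the sketch assumes MST is a fixed UTC-7 offset, which holds in December.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MST is treated as a fixed UTC-7 offset (no daylight saving in December).
MST = timezone(timedelta(hours=-7), "MST")

# Timestamps copied from the Timeline of Events in the postmortem.
events = [
    ("Incident detected", "2024-12-18 17:18"),
    ("Key issue identified", "2024-12-18 17:36"),
    ("Rollback conducted and configuration refresh started", "2024-12-18 17:42"),
    ("Configuration refresh completed", "2024-12-18 17:51"),
    ("Spark sites return to nominal", "2024-12-18 17:52"),
    ("Incident resolved", "2024-12-18 17:52"),
]


def to_utc(local: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' string as MST and convert it to UTC."""
    naive = datetime.strptime(local, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=MST).astimezone(timezone.utc)


timeline_utc = [(label, to_utc(stamp)) for label, stamp in events]

# Elapsed time from the first entry (detection) to the last entry (resolution).
duration = timeline_utc[-1][1] - timeline_utc[0][1]

for label, stamp in timeline_utc:
    print(f"{stamp:%Y-%m-%d %H:%M} UTC  {label}")
print(f"Detection to resolution: {duration.total_seconds() / 60:.0f} minutes")
```

Under these assumptions, detection at 17:18 MST on December 18 corresponds to 00:18 UTC on December 19, which is why the status updates carry the December 19 date.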