Benevity incident

Intermittent Availability of Spark Site Dashboards

Minor Resolved

Benevity experienced a minor incident on March 28, 2025 affecting Donate and Volunteer Core Services, lasting 17m. The incident has been resolved; the full update timeline is below.

Started
Mar 28, 2025, 10:01 PM UTC
Resolved
Mar 28, 2025, 10:18 PM UTC
Duration
17m
Detected by Pingoru
Mar 28, 2025, 10:01 PM UTC

Affected components

Donate and Volunteer Core Services

Update timeline

  1. identified Mar 28, 2025, 10:01 PM UTC

    We have identified an issue causing an error when attempting to load the Dashboard of Spark Client sites. Our teams are working on a fix for this issue. At this time, only the Dashboard is affected - Donate, Search, and Volunteering functionality are still fully operational.

  2. monitoring Mar 28, 2025, 10:14 PM UTC

    We have deployed a fix for the issue - Dashboard functionality has been restored to all Spark client sites. At this time all Spark sites and workflows are fully operational. Benevity's teams are monitoring for any further issues or recurrences.

  3. resolved Mar 28, 2025, 10:18 PM UTC

    This incident has been resolved.

  4. postmortem May 05, 2025, 01:35 AM UTC

    ### Summary

    On March 28, the Dashboard page on all Spark Sites failed to load due to a misconfigured release. While all other site functionality remained unaffected, users encountered an error when accessing the Dashboard. The issue was caused by an incorrect version of the Dashboard page being referenced during the release process. The version specified did not exist in production, preventing the page from rendering.

    The incident was identified and addressed quickly, with a corrected build deployed to production within approximately 15 minutes. The incident was fully resolved within 20 minutes of detection. To prevent recurrence, we are enhancing our build pipeline to eliminate manual versioning errors, improving the release process to support gradual rollouts, and introducing both automated and manual verification steps for future releases.

    ### Impact

    * The Dashboard on all Spark Sites failed to load, displaying an error.
    * All other site functionality remained fully operational.

    ### Root Cause

    A release of the new Dashboard page was manually configured with an incorrect version reference, pointing to a version that was unavailable in production.

    ### Future Mitigation

    * Improve build pipeline to eliminate manual versioning errors.
    * Implement gradual rollout processes for upgraded pages.
    * Add additional verification for pull requests triggering production releases.
    * Require manual verification of releases.
    * Deploy automated alerting for failed rendering of upgraded pages in production.

    ### Timeline of Events

    * 2025/03/28 - 15:48 MT: Identification of issue in production
    * 2025/03/28 - 15:50 MT: Incident triggered
    * 2025/03/28 - 15:50 MT: New build triggered
    * 2025/03/28 - 16:05 MT: New build released to production
    * 2025/03/28 - 16:10 MT: Incident resolved in production
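
The root cause above (a release referencing a page version that did not exist in production) is the kind of error a pre-release check can catch before deployment. The following is a minimal sketch of such a check, not Benevity's actual pipeline, which the postmortem does not describe. The names `ReleaseConfig`, `validate_release`, the `published` mapping, and the example page name and version strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseConfig:
    """Hypothetical release descriptor: which page, and which version to serve."""
    page: str
    version: str


def validate_release(config: ReleaseConfig, published: dict[str, set[str]]) -> None:
    """Raise ValueError if the release references a version that is not published.

    `published` is an assumed stand-in for whatever registry records the
    versions actually available in production, keyed by page name.
    """
    available = published.get(config.page, set())
    if config.version not in available:
        raise ValueError(
            f"page {config.page!r} references version {config.version!r}, "
            f"which is not published; available: {sorted(available)}"
        )


if __name__ == "__main__":
    # Illustrative data only; these versions are made up.
    published = {"dashboard": {"1.4.0", "1.4.1"}}

    # A release pointing at a published version passes silently.
    validate_release(ReleaseConfig("dashboard", "1.4.1"), published)

    # A release pointing at a missing version is rejected before deployment.
    try:
        validate_release(ReleaseConfig("dashboard", "1.5.0"), published)
    except ValueError as exc:
        print(f"release blocked: {exc}")
```

Run as a required step in the release job, a check of this shape would fail the build rather than let the missing reference reach production. It addresses only the version-reference error and does not replace the gradual rollout or rendering alerts listed under Future Mitigation.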