Sympa HR incident

Approvals Functionality Issue

Notice Resolved

Sympa HR experienced a notice-level incident on December 13, 2024, affecting System availability and lasting 9h 13m. The incident has been resolved; the full update timeline is below.

Started
Dec 13, 2024, 10:00 AM UTC
Resolved
Dec 13, 2024, 07:14 PM UTC
Duration
9h 13m
Detected by Pingoru
Dec 13, 2024, 10:00 AM UTC

Affected components

System availability

Update timeline

  1. investigating Dec 13, 2024, 10:00 AM UTC

    We are currently investigating an issue impacting the Approvals functionality. Our team is actively working to identify the cause and resolve the issue as quickly as possible. We will share updates as we have more information. Thank you for your patience and understanding.

  2. identified Dec 13, 2024, 10:56 AM UTC

    We have identified the root cause of the issue and are working on a hotfix. We will share updates as we have more information. Thank you for your patience and understanding.

  3. monitoring Dec 13, 2024, 01:42 PM UTC

    A fix has been implemented and we are monitoring the situation.

  4. resolved Dec 13, 2024, 07:14 PM UTC

    The issue impacting the Approvals functionality has been fully resolved. Our team has implemented a fix, and monitoring confirms that the functionality is working as expected. We appreciate your patience and understanding during this time. If you continue to experience any issues, please reach out to our support team.

  5. postmortem Dec 16, 2024, 07:17 PM UTC

    **Post-Mortem Report: Production Deployment Issue on 12.12**

    **Incident Overview:** On 12.12 at 21:00, a new software package was deployed to production. The deployment introduced two issues, both related to approval message functionality. These issues prevented approval-related emails from being constructed and delivered on time.

    **Details of the Issues:**

    **1. Missing Database Access Rights**

    * **Impact**: All approval message constructions failed.
    * **Timeline**: The problem persisted from 12.12 at 21:50 to 13.12 at 9:30.
    * **Mitigation**: The issue was resolved on 13.12 at 9:30 by granting the necessary database access rights.
    * **Effect**: No approval emails were sent during the affected period.

    **2. Undefined Sympa Username for Related Persons**

    * **Impact**: Approval messages failed to be sent when one or more related persons did not have a Sympa username defined. This affected all recipients involved in the corresponding approval.
    * **Timeline**: This issue occurred from 13.12 at 9:30 to 13.12 at 15:30.
    * **Mitigation**: The underlying issue was identified and corrected by 13.12 at 15:30.
    * **Effect**: Emails associated with these specific approvals were not sent.

    **Scope of Impact** The issue affected approval messages across multiple customers. In total, the deployment failure affected approximately 450 approval messages across 60 customers; for most customers, only a few messages were affected.

    **Technical Constraints** Because the issue disrupted the construction of approval messages, it is unfortunately not possible to reconstruct and resend these emails automatically. The data needed to rebuild the messages in their original context was not retained.
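The constraint above comes from message inputs not being kept when construction fails. The following is a minimal sketch of one common way to avoid that, not Sympa's implementation: record the inputs before attempting construction, so failed approvals can be listed and retried later. All names here (`Outbox`, `Recipient`, `build_message`, `process`) are hypothetical, and the in-memory list stands in for a durable store.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Recipient:
    email: str
    username: Optional[str]  # may be undefined, as in issue 2


@dataclass
class Outbox:
    """In-memory stand-in for a durable table of message inputs (hypothetical)."""
    entries: List[dict] = field(default_factory=list)

    def record(self, approval_id: str, recipients: List[Recipient]) -> dict:
        entry = {"approval_id": approval_id, "recipients": recipients,
                 "status": "pending", "error": None}
        self.entries.append(entry)
        return entry

    def failed(self) -> List[dict]:
        return [e for e in self.entries if e["status"] == "failed"]


def build_message(approval_id: str, recipient: Recipient) -> str:
    # Hypothetical construction step that cannot proceed without a username.
    if recipient.username is None:
        raise ValueError(f"no username for {recipient.email}")
    return f"Approval {approval_id} awaits {recipient.username}"


def process(outbox: Outbox, approval_id: str, recipients: List[Recipient],
            send: Callable[[str, str], None]) -> None:
    # Retain the inputs first, so a construction failure loses nothing.
    entry = outbox.record(approval_id, recipients)
    try:
        messages = [(r.email, build_message(approval_id, r)) for r in recipients]
    except ValueError as exc:
        entry["status"], entry["error"] = "failed", str(exc)
        return
    for email, body in messages:
        send(email, body)
    entry["status"] = "sent"
```

With inputs retained this way, `outbox.failed()` yields the affected approvals together with their original recipients, which is the information a later resend would need.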
    **Next Step** Sympa can provide a detailed list of the affected approvals upon request. This list can assist in manually addressing any critical messages that were not delivered.

    **Preventative Measures** To prevent similar issues in the future, the following actions will be taken:

    * Implementing more rigorous pre-deployment testing, specifically targeting approval-related functionality.
    * Establishing automated monitoring for database access configurations to detect and rectify issues prior to deployment.
    * Enhancing error logging and data retention mechanisms to allow reconstruction of critical messages if failures occur.

    We apologize for the inconvenience caused and remain committed to improving the reliability of our systems.