Falcony incident

Degraded performance

Minor Resolved

Falcony experienced a minor incident on March 17, 2025, lasting 10h 50m and affecting the Falcony component. The incident has been resolved; the full update timeline is below.

Started: Mar 17, 2025, 10:19 AM UTC
Resolved: Mar 17, 2025, 09:10 PM UTC
Duration: 10h 50m
Detected by Pingoru: Mar 17, 2025, 10:19 AM UTC

Affected components

Falcony

Update timeline

  1. investigating Mar 17, 2025, 10:19 AM UTC

    Our downstream provider, Mailgun, is experiencing issues that are affecting Falcony. As a result, Falcony is unable to send emails, and some data in the service may appear outdated. For more details, please visit the Mailgun status page: https://status.mailgun.com/

  2. identified Mar 17, 2025, 10:20 AM UTC

    Our downstream provider is implementing a fix.

  3. monitoring Mar 17, 2025, 01:14 PM UTC

    Our downstream provider has implemented a fix. We are monitoring the results.

  4. monitoring Mar 17, 2025, 03:05 PM UTC

    We are no longer experiencing issues on our systems due to the downstream provider. However, the downstream provider has not yet marked the issue as resolved. We will continue to monitor the situation.

  5. resolved Mar 17, 2025, 09:10 PM UTC

    The downstream provider has confirmed that the issue has been resolved.

  6. postmortem Mar 19, 2025, 08:53 AM UTC

    **Incident summary**

    On Monday, March 17th, 2025, our email provider Mailgun experienced [an outage](https://status.mailgun.com/incidents/b60dthm3t8gk) that prevented emails from being sent. This resulted in a backlog of emails in our task queue. The full task queue in turn impacted other services that rely on the same system, including PDF generation, integrations, malware scanning, and user sessions.

    **Impact**

    * Emails were not sent as expected. Some were significantly delayed, while others failed to send altogether.
    * The task queue filled up, affecting other system functionalities beyond email delivery.
    * Users experienced disruptions in services utilising the task queue, including document generation and integrations.

    **Resolution and mitigation**

    * To minimize further disruptions, we increased the memory allocated to the task queue to ensure continued processing.
    * Once the email provider resolved the underlying issue, we added resources to speed up the processing of the task queue.
    * On Tuesday and Wednesday, we manually sent out any remaining emails that had not been delivered due to earlier failures.

    **Next steps**

    We have identified areas in our system where we can improve resilience in case of similar provider outages in the future. Our key focus areas will be:

    * Ensuring that undelivered emails remain in the queue and can be retried once the service is restored.
    * Implementing better isolation between the email task queue and other dependent services to prevent cascading failures.
    * Enhancing monitoring and alerting mechanisms to respond more quickly to issues related to email delivery.

    We appreciate your patience and understanding during this incident. Our team is committed to making improvements to prevent similar disruptions in the future. If you have any questions or concerns, please reach out to our support team at [[email protected]](mailto:[email protected]).

    **Falcony team**
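
The first two focus areas in the postmortem (keeping undelivered emails queued for retry, and isolating the email queue from other services) can be illustrated with a small sketch. This is not Falcony's implementation, which the postmortem does not describe. The names `IsolatedQueues`, `ProviderUnavailable`, `enqueue`, and `drain_email` are hypothetical, and the in-memory queues stand in for whatever task queue system is actually used.

```python
from collections import deque
from dataclasses import dataclass, field


class ProviderUnavailable(Exception):
    """Hypothetical error raised when the email provider cannot accept mail."""


@dataclass
class IsolatedQueues:
    """One bounded queue per task type, so a backlog in one cannot fill another."""

    capacity: int
    queues: dict = field(default_factory=dict)

    def enqueue(self, kind, payload):
        """Add a task to the queue for its own type; return False if that queue is full."""
        q = self.queues.setdefault(kind, deque())
        if len(q) >= self.capacity:
            return False  # only this task type is rejected
        q.append(payload)
        return True

    def drain_email(self, send):
        """Try each queued email once; failed messages stay queued for a later retry."""
        q = self.queues.setdefault("email", deque())
        delivered = 0
        for _ in range(len(q)):
            message = q.popleft()
            try:
                send(message)
                delivered += 1
            except ProviderUnavailable:
                q.append(message)  # kept instead of dropped
        return delivered
```

With a per-type capacity, a full `"email"` queue rejects only new email tasks, while `enqueue("pdf", ...)` still succeeds. Calling `drain_email` again after the provider recovers delivers the messages that were kept.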