Populi incident

Email problems

Notice Resolved

Populi experienced a notice-level incident on April 21, 2020, lasting 3h 5m. The incident has been resolved; the full update timeline is below.

Started
Apr 21, 2020, 03:40 PM UTC
Resolved
Apr 21, 2020, 06:45 PM UTC
Duration
3h 5m
Detected by Pingoru
Apr 21, 2020, 03:40 PM UTC

Update timeline

  1. investigating Apr 21, 2020, 03:40 PM UTC

    Due to increased usage of Populi's online learning features right now, we hit a limit with our email provider that led to messages being marked as "rejected", so we retried sending them. However, when they raised our limit, the retries started flooding in and sending multiple copies of each message. Sorry for the hassle! We're working with them to halt the flow of repeat email as soon as possible.

  2. identified Apr 21, 2020, 03:41 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Apr 21, 2020, 04:26 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Apr 21, 2020, 06:45 PM UTC

    This incident has been resolved. There are a handful of backed-up emails we'll be sending throughout the day, but all new email sent during the last several hours looks good.

  5. postmortem Apr 21, 2020, 06:51 PM UTC

    Sorry about that, everyone! We’re in the middle of upgrading our email sending providers and adding DMARC support, and unfortunately the drastic usage spike we’ve been experiencing lately caused us to unexpectedly run up against a daily email sending cap with our new email provider (Amazon). That’s why emails were suddenly being delayed. When we managed to work with Amazon to get our usage limit raised, a bug in our retry sending code (which is supposed to retry sending email if a provider goes down, a sporadic network failure occurs, etc.) resulted in duplicate emails being sent to many users. This bug was a tricky one because it only manifested when we bumped up against the usage cap AND had production levels of email flowing through the system, which unfortunately means we didn’t catch it during testing. We’ve temporarily fallen back to sending through our old email provider. After we purge duplicates from the new queue and flush out any messages that should still be sent, we’ll confirm the retry logic bug is fixed and carefully begin transitioning to Amazon again, once we’ve worked with them to raise our sending limit yet again to give us lots of headroom this time. Thanks for your patience as we continue to deal with abnormally high usage and still try to push the product forward at the same time!
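
The failure mode described in the postmortem (retries piling up while a sending cap is in effect, then all going out once the cap is raised) can be illustrated with a small sketch. This is not Populi's code or Amazon's API: `RetryQueue`, `SendResult`, `Message`, and `FakeProvider` are hypothetical names invented for illustration, and the design assumes each message carries a stable ID. The idea shown is to key pending retries by message ID so that re-queuing the same message is a no-op, and to treat a quota rejection differently from a transient failure.

```python
from dataclasses import dataclass, field
from enum import Enum


class SendResult(Enum):
    SENT = "sent"
    TRANSIENT_FAILURE = "transient_failure"  # e.g. network blip, provider down
    QUOTA_EXCEEDED = "quota_exceeded"        # provider's sending cap was hit


@dataclass
class Message:
    message_id: str  # assumed to be stable across retries
    recipient: str
    body: str


@dataclass
class RetryQueue:
    """Hypothetical retry queue holding at most one pending copy per message."""
    max_attempts: int = 3
    pending: dict = field(default_factory=dict)   # message_id -> Message
    attempts: dict = field(default_factory=dict)  # message_id -> failure count
    delivered: set = field(default_factory=set)

    def enqueue(self, msg):
        # Idempotent: a delivered or already-pending message is not queued again.
        if msg.message_id in self.delivered or msg.message_id in self.pending:
            return False
        self.pending[msg.message_id] = msg
        return True

    def drain(self, send):
        """Try each pending message once; stop early on a quota rejection."""
        for message_id in list(self.pending):
            result = send(self.pending[message_id])
            if result is SendResult.SENT:
                self.delivered.add(message_id)
                del self.pending[message_id]
            elif result is SendResult.QUOTA_EXCEEDED:
                # Leave everything queued and stop; retrying now cannot succeed.
                break
            else:
                self.attempts[message_id] = self.attempts.get(message_id, 0) + 1
                if self.attempts[message_id] >= self.max_attempts:
                    del self.pending[message_id]  # give up on this message


class FakeProvider:
    """Stand-in for an email provider with a sending cap (illustration only)."""
    def __init__(self, cap):
        self.cap = cap
        self.sent = []

    def send(self, msg):
        if len(self.sent) >= self.cap:
            return SendResult.QUOTA_EXCEEDED
        self.sent.append(msg.message_id)
        return SendResult.SENT


provider = FakeProvider(cap=1)
queue = RetryQueue()
messages = [Message("m1", "a@example.com", "hi"), Message("m2", "b@example.com", "hi")]

for m in messages:
    queue.enqueue(m)
queue.drain(provider.send)   # m1 goes out, m2 hits the cap and stays queued

for m in messages:
    queue.enqueue(m)         # a naive "retry everything" pass adds no duplicates
provider.cap = 10            # the cap is raised
queue.drain(provider.send)   # only m2 is sent

print(provider.sent)
```

In this sketch, raising the cap releases exactly one copy of each held message, because the queue never holds more than one entry per message ID and never re-queues something already delivered.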