Courier Outage History

Courier is up right now

Courier had 31 outages in the last 2 years totaling 78h 32m of downtime — averaging 1.3 incidents per month.

There were 31 Courier outages since May 28, 2024 totaling 78h 32m of downtime. Each is summarised below — incident details, duration, and resolution information.

Source: https://status.courier.com

Minor July 9, 2024

Automation Logs Delayed

Detected by Pingoru
Jul 09, 2024, 07:18 PM UTC
Resolved
Jul 12, 2024, 10:25 PM UTC
Duration
3d 3h
Affected: Web ApplicationAutomations
Timeline · 2 updates
  1. investigating Jul 09, 2024, 07:18 PM UTC

    Courier Automation logs are currently experiencing a delay in showing up. The team is aware of this issue and is mitigating the root cause by relieving the bottleneck of incoming Automation requests.

  2. resolved Jul 12, 2024, 10:25 PM UTC

    This was a symptom of a previous automation issue, where a snowball effect caused a massive backlog of events. It took a long time for the Kinesis stream to catch up. We resolved the issue by temporarily increasing the stream shard count. The issue is now fully resolved.

Read the full incident report →

Major July 9, 2024

Rendering Errors for Email Templates

Detected by Pingoru
Jul 09, 2024, 03:00 AM UTC
Resolved
Jul 09, 2024, 03:00 AM UTC
Duration
Timeline · 1 update
  1. resolved Jul 09, 2024, 05:30 AM UTC

    When URLs were present with click tracking, the team identified an issue with templates failing on the render step in the message lifecycle. The team has since identified the issue and rolled out a fix that addresses this rendering error and templates should be rendering properly.

Read the full incident report →

Minor June 27, 2024

Automation Delays

Detected by Pingoru
Jun 27, 2024, 11:24 PM UTC
Resolved
Jun 28, 2024, 12:40 AM UTC
Duration
1h 15m
Affected: Automations
Timeline · 2 updates
  1. investigating Jun 27, 2024, 11:24 PM UTC

    Automation delays are occurring due to a combination of data retrieval issues and system timeouts. The team is investigating .

  2. resolved Jun 28, 2024, 12:40 AM UTC

    This incident has been resolved, automation throughput has returned to normal levels.

Read the full incident report →

Major June 26, 2024

AWS serviceUnavailable Outage

Detected by Pingoru
Jun 26, 2024, 01:52 AM UTC
Resolved
Jun 26, 2024, 03:12 AM UTC
Duration
1h 20m
Affected: API
Timeline · 5 updates
  1. monitoring Jun 26, 2024, 01:49 AM UTC

    An incident where Courier messages resulted in "Internal Courier Error" was the result of AWS returning serviceUnavailable. The team has identified the issue and messages that responded with 5xx errors will be retried by our pipeline resilience.

  2. monitoring Jun 26, 2024, 01:52 AM UTC

    An incident where Courier messages resulted in "Internal Courier Error" was the result of AWS returning serviceUnavailable. The team has identified the issue and messages that responded with 5xx errors will be retried by our pipeline resilience.

  3. investigating Jun 26, 2024, 02:02 AM UTC

    Service has re-entered a degraded state. Sends from Courier are impacted.

  4. monitoring Jun 26, 2024, 02:29 AM UTC

    Service has resumed normal operation. Courier Engineers are monitoring.

  5. resolved Jun 26, 2024, 03:12 AM UTC

    The team has resolved the issue. Send pipeline operational.

Read the full incident report →

Minor June 11, 2024

Delayed Event Status

Detected by Pingoru
Jun 11, 2024, 03:36 PM UTC
Resolved
Jun 11, 2024, 04:26 PM UTC
Duration
49m
Affected: Web Application
Timeline · 4 updates
  1. investigating Jun 11, 2024, 03:36 PM UTC

    There is an increased number of events in Courier's events log table causing delayed queued events to display. The team has upped the write capacity for these events and is waiting for the stream to stabilize.

  2. identified Jun 11, 2024, 03:44 PM UTC

    The Courier team has identified the issue with the events table and has upped the capacity until events stabilize.

  3. monitoring Jun 11, 2024, 03:45 PM UTC

    Events are slowly stabilizing and the team is monitoring.

  4. resolved Jun 11, 2024, 04:26 PM UTC

    Events log stream has stabilized.

Read the full incident report →

Notice May 28, 2024

Delay in Message Status Updates

Detected by Pingoru
May 28, 2024, 04:34 PM UTC
Resolved
May 20, 2024, 07:30 PM UTC
Duration
Timeline · 1 update
  1. resolved May 28, 2024, 04:34 PM UTC

    On 2024-05-20 12:40 GMT-7, Courier experienced a sudden spike in outbound message volume. All messages were sent normally. However, the queue used to process message update events became overwhelmed and could not accept events at the rate they were produced. This caused a delay in message status updates as the queue backed up. Although the queue would have recovered eventually on its own, the engineering team chose to increase queue capacity to resolve the issue more quickly. This increase was implemented at 14:43, with full recovery of enqueued message update events by 14:50. Messages processed between 12:40 and 14:43 experienced a delay in status updates of up to 400 seconds, with a typical delay of about 100 seconds. There was no delay in message processing or delivery; all message update events were eventually processed. Outbound webhooks, which depend on the impacted queue, were similarly delayed, as were message statuses shown in the Logs UI and reported by the API.

Read the full incident report →