Retreaver incident

Amazon S3 is down

Minor · Resolved

Retreaver experienced a minor incident on February 28, 2017, lasting 5h 3m. The incident has been resolved; the full update timeline is below.

Started
Feb 28, 2017, 06:17 PM UTC
Resolved
Feb 28, 2017, 11:20 PM UTC
Duration
5h 3m
Detected by Pingoru
Feb 28, 2017, 06:17 PM UTC

Update timeline

  1. monitoring Feb 28, 2017, 06:17 PM UTC

    We've brought additional capacity online to handle the high volume of traffic, and asynchronous webhooks/pixel fires will be caught up shortly. Synchronous fires are not affected, and all other systems continue to operate normally.

  2. monitoring Feb 28, 2017, 06:24 PM UTC

    Webhooks/pixel fires are currently delayed due to an outage at Amazon S3. Our HAR files, which log web traffic between our server and the webhook servers, are stored on S3. We're actively monitoring the situation and will update shortly.

  3. monitoring Feb 28, 2017, 06:43 PM UTC

    Prompt recordings that are not cached by our telephony provider cannot be played, which may lead to call handling failures. We suggest temporarily switching your campaigns to text-to-speech. Pixel fires and call recordings are currently delayed; once S3 comes back online, we'll catch up. You may also notice slowness in our UI, because we rely on a couple of third-party libraries, such as New Relic and HubSpot, that we now know are hosted on S3.

  4. monitoring Feb 28, 2017, 08:36 PM UTC

    S3 is back up and we're working on recovery. Webhooks will start firing momentarily.

  5. monitoring Feb 28, 2017, 09:03 PM UTC

    We're still waiting for Amazon to fully recover. Prompt recordings that are not cached by our telephony provider cannot be played, which may lead to call handling failures. We suggest temporarily switching your campaigns to text-to-speech. Pixel fires and call recordings are currently delayed; once S3 fully recovers, we'll catch up. You may also notice slowness in our UI, because we rely on a couple of third-party libraries, such as New Relic and HubSpot, that we now know are hosted on S3.

  6. monitoring Feb 28, 2017, 09:24 PM UTC

    We're still unable to write to S3, so asynchronous webhooks remain delayed. Until this is resolved, users will be unable to upload audio prompts, download reports, or listen to call recordings. All other systems are operating normally. We'll update this issue shortly.

  7. monitoring Feb 28, 2017, 10:46 PM UTC

    We have resumed firing pixels/webhooks. We'll update this issue as soon as we're caught up.

  8. resolved Feb 28, 2017, 11:20 PM UTC

    We're back and fully recovered. All asynchronous webhooks have been fired.
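The 06:24 PM UTC update explains that asynchronous webhook/pixel fires were delayed because their HAR traffic logs are stored on S3. The Python sketch below is a hypothetical illustration of how such a dependency can hold fires in a queue and then catch up once storage returns; it is not Retreaver's implementation. The names `LogStore`, `WebhookWorker`, and `StorageUnavailable` are invented, and the sketch assumes a fire stays queued until its log record is written successfully.

```python
from collections import deque
from dataclasses import dataclass, field


class StorageUnavailable(Exception):
    """Raised by the hypothetical log store when writes fail (e.g. during an S3 outage)."""


@dataclass
class LogStore:
    # Hypothetical in-memory stand-in for an object store such as S3.
    available: bool = True
    objects: dict = field(default_factory=dict)

    def put(self, key: str, body: str) -> None:
        if not self.available:
            raise StorageUnavailable(key)
        self.objects[key] = body


@dataclass
class WebhookWorker:
    # Hypothetical asynchronous worker: fires wait in a FIFO queue.
    store: LogStore
    queue: deque = field(default_factory=deque)
    sent: list = field(default_factory=list)

    def enqueue(self, fire_id: str, url: str) -> None:
        self.queue.append((fire_id, url))

    def drain(self) -> int:
        """Process queued fires in order; stop at the first storage failure."""
        processed = 0
        while self.queue:
            fire_id, url = self.queue[0]
            try:
                # Assumption: the log write gates the fire, so a storage
                # outage leaves the job at the head of the queue.
                self.store.put(f"har/{fire_id}.har", f"request to {url}")
            except StorageUnavailable:
                break
            self.queue.popleft()
            self.sent.append(fire_id)  # placeholder for the real HTTP request
            processed += 1
        return processed


store = LogStore(available=False)
worker = WebhookWorker(store)
worker.enqueue("f1", "https://example.com/hook")
worker.enqueue("f2", "https://example.com/hook")
worker.drain()          # storage down: nothing is fired, backlog stays queued
store.available = True
worker.drain()          # storage back: the backlog is fired in order
```

Keeping the job at the head of the queue, rather than dropping it, is what lets a backlog be "caught up" after recovery. A synchronous path that does not wait on the log write would be unaffected, which matches the behavior described in the first update.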