Treasure Data incident

[US Region] High Error rate at Custom Script and some DataConnector

Treasure Data experienced a major incident on July 30, 2024 affecting Data Connector Integrations, lasting 8h. The incident has been resolved; the full update timeline is below.

Started
Jul 30, 2024, 11:52 PM UTC
Resolved
Jul 31, 2024, 07:53 AM UTC
Duration
8h
Detected by Pingoru
Jul 30, 2024, 11:52 PM UTC

Affected components

Data Connector Integrations

Update timeline

  1. identified Jul 30, 2024, 11:52 PM UTC

    We are currently experiencing a high error rate in the Custom Script service on Treasure Workflow (US Region) due to an ongoing incident with our infrastructure provider (AWS). Affected tasks fail with an error message like:

    > Task failed with unexpected error: null (Service: AWSLogs; Status Code: 503; Error Code: null; Request ID: xxxxxx; Proxy: null)

    At this time, we do not have an estimated time for full resolution. We will provide further updates as soon as more information becomes available.

  2. identified Jul 31, 2024, 02:24 AM UTC

    This issue is still ongoing, and we are still seeing Custom Script tasks fail. Custom Script users may also encounter errors related to AWS CloudWatch Logs. According to our infrastructure provider (AWS), recovery work is under way and some improvement is visible internally, but they expect full recovery to take 1-2 hours. We will provide further updates as soon as more information becomes available.

  3. identified Jul 31, 2024, 03:16 AM UTC

    Due to the degradation of the Amazon Ads system (https://status.ads.amazon.com), our connectors for the Amazon Ads platform are currently not working properly. If you are using any of the connectors below, your jobs may not run correctly:

    - Amazon Marketing Cloud export
    - Amazon Marketing Cloud import
    - Amazon Ads export
    - Amazon DSP export

    We will provide further updates as soon as more information becomes available.

  4. monitoring Jul 31, 2024, 05:25 AM UTC

    According to our infrastructure provider (AWS), this issue has been resolved. We have also seen the failure rate decrease, so we are moving this incident to Monitoring status and the affected components to Operational status.

  5. resolved Jul 31, 2024, 07:53 AM UTC

    This incident has been resolved. All affected components (Custom Script and some Data Connector integrations) are now back to normal.