Treasure Data Outage History

Treasure Data had 50 outages since June 20, 2024 (the last 2 years), totaling 113h 25m of downtime and averaging 2.1 incidents per month. Each is summarised below with incident details, duration, and resolution information.
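
The summary figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes "the last 2 years" means 24 months, and the helper name `fmt` is made up for this example. It reproduces the stated monthly average and also derives the mean downtime per outage from the stated totals.

```python
from datetime import timedelta

# Figures taken from the summary above.
incidents = 50
total_downtime = timedelta(hours=113, minutes=25)

# Assumption: "the last 2 years" is treated as 24 months.
months = 24

per_month = incidents / months            # average incidents per month
mean_outage = total_downtime / incidents  # mean downtime per outage


def fmt(td: timedelta) -> str:
    """Format a timedelta as 'Xh Ym', rounded to the nearest minute."""
    minutes = round(td.total_seconds() / 60)
    return f"{minutes // 60}h {minutes % 60}m"


print(f"{per_month:.1f} incidents per month")
print(f"{fmt(mean_outage)} mean downtime per outage")
```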

Source: https://status.treasuredata.com

Minor August 18, 2025

Treasure Data Insights unavailable

Detected by Pingoru
Aug 18, 2025, 09:03 PM UTC
Resolved
Aug 18, 2025, 09:49 PM UTC
Duration
45m
Affected: Insights
Timeline · 2 updates
  1. investigating Aug 18, 2025, 09:03 PM UTC

    Treasure Data Insights is currently unavailable; we are investigating this issue.

  2. resolved Aug 18, 2025, 09:49 PM UTC

    The issue was resolved; it was linked to a vendor issue. The vendor has resolved the problem that was impacting TD Insights.

Minor July 2, 2025

[AP02 region] Treasure Insights Performance degraded

Detected by Pingoru
Jul 02, 2025, 02:53 AM UTC
Resolved
Jul 02, 2025, 08:03 AM UTC
Duration
5h 10m
Affected: Insights
Timeline · 3 updates
  1. investigating Jul 02, 2025, 02:53 AM UTC

    We are currently investigating a slowdown affecting Treasure Insights on the AP02 site. As a result, you may experience longer than usual processing times, and some requests may time out or fail.

  2. monitoring Jul 02, 2025, 07:17 AM UTC

    We identified the cause of the performance degradation and fixed it. We confirmed our metrics are back to normal. We will continue to monitor.

  3. resolved Jul 02, 2025, 08:03 AM UTC

    This incident has been resolved.

Major June 23, 2025

[All Regions] Partial Issue with CDP API

Detected by Pingoru
Jun 23, 2025, 06:45 AM UTC
Resolved
Jun 23, 2025, 07:58 AM UTC
Duration
1h 13m
Affected: CDP API
Timeline · 2 updates
  1. identified Jun 23, 2025, 06:45 AM UTC

    We are currently experiencing an issue with the CDP API concerning the retrieval of realtime attributes. When opening the settings screen or attempting to change settings, realtime attribute values cannot be retrieved, resulting in an error. This issue exclusively affects the realtime segment settings screen and does not impact the actual behavior of realtime segments. Our team is actively working to resolve this issue. We sincerely apologize for any inconvenience or disruption this may cause.

  2. resolved Jun 23, 2025, 07:58 AM UTC

    We are writing to inform you that the issue with realtime attribute retrieval in the CDP API occurred on June 23, 2025, at 04:46 UTC and was resolved on June 23, 2025, at 07:55 UTC. We sincerely apologize for any inconvenience this may have caused.

Minor June 1, 2025

[US region] Ingest API - Performance Degradation

Detected by Pingoru
Jun 01, 2025, 05:18 PM UTC
Resolved
Jun 02, 2025, 05:43 PM UTC
Duration
1d
Affected: Streaming Import REST API
Timeline · 8 updates
  1. investigating Jun 01, 2025, 04:17 PM UTC

    Our Ingest API is experiencing a performance issue. We are investigating the cause.

  2. investigating Jun 01, 2025, 04:35 PM UTC

    Current Status: A fix has been applied and is currently under observation.
    Impact: An issue was identified with data ingestion (delayed ingestion) between Sunday 8:00 AM PST - 8:30 AM PST.
    Remediation: A fix has been implemented, and new incoming data is now processing normally.
    Next Steps: We are actively working to resume sending the affected data, which will arrive out of order along with new incoming data.
    Updates to Follow: Further details will be provided as the situation progresses.

  3. investigating Jun 01, 2025, 05:18 PM UTC

    Latest Update: Our team has deployed a fix, and the system is now processing the backlog of data at a controlled rate.
    Estimated Time to Full Recovery: ~7 hours
    We’re actively monitoring the recovery and will provide further updates as progress continues. Thank you for your patience.

  4. investigating Jun 02, 2025, 01:49 AM UTC

    Latest Update: Our team is still actively monitoring and assessing the processing of the data backlog. Thank you for your continued patience.

  5. monitoring Jun 02, 2025, 04:47 AM UTC

    Latest Update: Our internal graphs show that the overwhelming majority of the data has been processed. There is a small amount of residual data that is taking longer than expected to process. We will leave this status page in a Monitoring state until we are certain everything has been processed. Thank you for your continued patience.

  6. monitoring Jun 02, 2025, 04:26 PM UTC

    Latest Update: Our team is still actively monitoring and assessing the processing of the data. We will leave this status page in a Monitoring state until we are certain everything has been processed. Thank you for your continued patience.

  7. monitoring Jun 02, 2025, 05:37 PM UTC

    We are continuing to monitor for any further issues.

  8. resolved Jun 02, 2025, 05:43 PM UTC

    This incident has been resolved.
    Duration: ~35 minutes of processing delay. Data ingestion delay between June 1, 7:35 AM and June 1, 10:35 AM PST.
    Affected Customers: All customers ingesting data to AWS US region during the incident window.
    Impact: Delayed data availability in Plazma (up to 40 mins). No data loss occurred.

Minor April 15, 2025

[Tokyo Region] Performance Issue of Trino service

Detected by Pingoru
Apr 15, 2025, 07:55 AM UTC
Resolved
Apr 15, 2025, 10:27 AM UTC
Duration
2h 32m
Affected: Presto Query Engine
Timeline · 4 updates
  1. investigating Apr 15, 2025, 07:55 AM UTC

    Our Trino service is experiencing an issue. We are investigating the cause.

  2. identified Apr 15, 2025, 08:09 AM UTC

    The root cause has been identified and we are applying the fix.

  3. monitoring Apr 15, 2025, 08:48 AM UTC

    We are observing recovery. We continue to monitor for full recovery.

  4. resolved Apr 15, 2025, 10:27 AM UTC

    This incident has been resolved.

Critical April 9, 2025

[US Region] Hive Jobs and the Result Export jobs triggered from Hive Jobs are not functioning properly

Detected by Pingoru
Apr 09, 2025, 01:24 AM UTC
Resolved
Apr 09, 2025, 03:22 AM UTC
Duration
1h 58m
Affected: Data Connector Integrations, Hadoop / Hive Query Engine
Timeline · 3 updates
  1. investigating Apr 09, 2025, 01:24 AM UTC

    We are currently investigating an issue where Hive Jobs and the Result Export jobs triggered from Hive Jobs are not functioning properly. Our team is actively looking into the cause of the issue. We will provide updates as soon as more information becomes available.

  2. monitoring Apr 09, 2025, 01:49 AM UTC

    We have identified the root cause and applied a hotfix. The issue was that Hive Jobs were unable to start properly, which in turn caused some Result Export jobs triggered from those Hive Jobs to fail. We are currently monitoring the system to ensure that the situation continues to improve.

  3. resolved Apr 09, 2025, 03:22 AM UTC

    We have confirmed that Hive jobs are now working properly, and the related Result Export jobs are functioning as expected. The impact occurred between April 8, 16:00 UTC and April 9, 01:35 UTC. If you had any jobs that failed during the affected time window, please re-run them as needed. We sincerely apologize for the inconvenience this may have caused.

Notice February 11, 2025

[All Regions] Utilization Dashboards showing outdated information

Detected by Pingoru
Feb 11, 2025, 05:48 PM UTC
Resolved
Feb 11, 2025, 09:47 PM UTC
Duration
3h 58m
Timeline · 4 updates
  1. investigating Feb 11, 2025, 05:48 PM UTC

    Our utilization dashboards are not updating with current information. Customers accessing their Treasure Data usage dashboards will see a gap in usage details from early Saturday morning UTC. There is no impact on ongoing Treasure Data usage, and all usage information is correctly stored internally. However, the dashboard where customers can view their consumption is not up-to-date. Note that this is a reporting problem only. There is no indication of any issues with regular Treasure Data usage. We are working to diagnose the issue and will provide an update in the next hour.

  2. identified Feb 11, 2025, 07:05 PM UTC

    We have identified the cause of this issue and are working to restore service. Once the service is restored, we expect it will take a few hours to catch up on usage data for the last few days.

  3. monitoring Feb 11, 2025, 08:34 PM UTC

    We have remediated the issue and are processing usage data from the last 4 days. Users should see the usage dashboards catching up. We expect this process to take about another hour to complete.

  4. resolved Feb 11, 2025, 09:47 PM UTC

    This issue is resolved and our utilization dashboards should be showing up-to-date information. If you observe anything unusual in your usage data, please contact our support team. Thank you for your patience while working through this issue.

Major January 30, 2025

[EU Region] Elevated error rate and performance degradation for personalization API

Detected by Pingoru
Jan 30, 2025, 10:54 AM UTC
Resolved
Jan 30, 2025, 03:43 PM UTC
Duration
4h 49m
Affected: CDP API, CDP Personalization - Lookup API, CDP Personalization - Ingest API
Timeline · 5 updates
  1. investigating Jan 30, 2025, 10:54 AM UTC

    We detected degraded performance of the personalization API and an increase in its error rate. We are currently investigating this issue.

  2. monitoring Jan 30, 2025, 11:38 AM UTC

    We are currently observing that the performance degradation and error rate have improved. We continue to closely monitor the metrics.

  3. monitoring Jan 30, 2025, 12:31 PM UTC

    We are continuing to monitor for any further issues.

  4. monitoring Jan 30, 2025, 02:18 PM UTC

    We are still monitoring the service. Between Thursday, 30 Jan 2025, 10:00 UTC and 11:05 UTC, customers experienced elevated error rates and longer latency for Profiles API lookup. Currently, the cluster workload has calmed down and is operating normally. Our response team is ready to provision additional processing capacity, and we are closely monitoring the service status to avoid further downtime during peak times. In addition, we are working on isolating problematic accesses from the service. We will keep the status page open and update you on the progress.

  5. resolved Jan 30, 2025, 03:43 PM UTC

    We implemented fundamental isolation of a problematic configuration at 14:42 UTC. The remediation caused the cluster workload to drop from 60% to 1%. On Friday, we implemented write access isolation for the problematic configuration, which stopped the cluster workload from growing. Today, we implemented read access isolation, which restored the cluster workload to the previous level. The system is operating normally now, and we are closing the incident. We acknowledge that further actions are needed to prevent a similar configuration from causing the same incident again. We will post a further postmortem when we are ready.

Minor January 23, 2025

[EU Region] Elevated error/ performance degradation related to personalisation API

Detected by Pingoru
Jan 23, 2025, 07:34 AM UTC
Resolved
Jan 23, 2025, 12:18 PM UTC
Duration
4h 44m
Affected: CDP Personalization - Lookup API, CDP Personalization - Ingest API
Timeline · 6 updates
  1. investigating Jan 23, 2025, 07:34 AM UTC

    We are currently observing errors or performance degradation for the personalization API. We are investigating the cause of the issue now.

  2. identified Jan 23, 2025, 08:05 AM UTC

    The response team confirmed the symptom is from the same cause as the previous incidents. We are provisioning additional concurrency capacity to the environment. We will update you when it is completed.

  3. identified Jan 23, 2025, 10:37 AM UTC

    We provisioned additional capacity at 10:00 am UTC to support the increasing workload. It improved the latency, but we still observed errors and long latency for a small number of requests. The response team has started provisioning more concurrency capacity. Unlike the previous methods, the new process should not take as long to provision. We will report the result in 30 minutes.

  4. identified Jan 23, 2025, 11:08 AM UTC

    We successfully provisioned 2x capacity in 30 minutes. New resources improved the latency, but the error rate is still high. The response team is planning to implement another remediation instead of adding resources. We will update you in 30 minutes.

  5. monitoring Jan 23, 2025, 12:16 PM UTC

    The response team found problematic real-time segment configurations in one customer's Parent Segment that possibly contributed to consuming the concurrency capacity. The team updated the real-time event routing configuration to mitigate the high latency issue. Combined with capacity addition operations, the team stabilized the Profiles API cache cluster. If you experience any delays or abnormal errors, please reach out to our support team. Thank you for your patience and understanding during this incident. We will update the postmortem with a further remediation plan as promised.

  6. resolved Jan 23, 2025, 12:18 PM UTC

    Between Thursday, 23 Jan 2025 07:20 UTC to 11:40 UTC, customers experienced elevated error rates and increased latency related to Profiles API. A fix has been implemented, and the issue has been resolved. If you experience any delays or abnormal errors, please reach out to our support team. Thank you for your patience and understanding during this incident. We will share an incident retrospective soon.

Critical January 20, 2025

[All Regions] Treasure Insights is experiencing an outage

Detected by Pingoru
Jan 20, 2025, 01:34 PM UTC
Resolved
Jan 20, 2025, 04:04 PM UTC
Duration
2h 30m
Affected: Insights
Timeline · 6 updates
  1. investigating Jan 20, 2025, 01:34 PM UTC

    We have observed that users are not able to access Treasure Insights. We are currently investigating the issue.

  2. investigating Jan 20, 2025, 02:17 PM UTC

    We are still investigating the issue.

  3. investigating Jan 20, 2025, 02:49 PM UTC

    The situation remains the same as it was in the last update.

  4. identified Jan 20, 2025, 02:58 PM UTC

    The problem has been identified, and we are currently working on a solution.

  5. monitoring Jan 20, 2025, 03:44 PM UTC

    A fix has been implemented, and we are monitoring the service to ensure everything is functioning correctly.

  6. resolved Jan 20, 2025, 04:04 PM UTC

    We would like to inform you that the issue has been fully resolved.
    Incident Impact Details:
    - Treasure Insights was returning 502 errors and was unreachable during the incident.
    Incident Impact Time:
    - Start: January 20, 09:51 UTC
    - End: January 20, 15:35 UTC
    We apologize for any inconvenience this may have caused and thank you for your patience and understanding.

Major January 20, 2025

[EU Region] Elevated error rate for CDP KVS

Detected by Pingoru
Jan 20, 2025, 10:33 AM UTC
Resolved
Jan 20, 2025, 11:41 AM UTC
Duration
1h 8m
Affected: CDP API, CDP Personalization - Lookup API, CDP Personalization - Ingest API
Timeline · 4 updates
  1. investigating Jan 20, 2025, 10:33 AM UTC

    We are currently investigating this issue.

  2. monitoring Jan 20, 2025, 10:35 AM UTC

    Through our investigation, we identified the cause of the issue and have applied some remediation. Our team is closely monitoring the system to ensure continued stability.

  3. monitoring Jan 20, 2025, 11:10 AM UTC

    We are observing fewer errors now. However, we are still monitoring and re-evaluating the remedial steps to confirm better performance.

  4. resolved Jan 20, 2025, 11:41 AM UTC

    We would like to inform you that the issue has been fully resolved.
    Incident Impact Details:
    - The Personalization API experienced an outage, leading to increased errors and timeouts.
    Incident Impact Time:
    - Start: January 20, 07:45 UTC
    - End: January 20, 11:15 UTC
    We apologize for any inconvenience this may have caused and thank you for your patience and understanding.

Minor December 8, 2024

[EU Region] Trino/Presto - Degraded performance

Detected by Pingoru
Dec 08, 2024, 07:48 PM UTC
Resolved
Dec 08, 2024, 08:27 PM UTC
Duration
39m
Affected: Presto Query Engine
Timeline · 3 updates
  1. investigating Dec 08, 2024, 07:48 PM UTC

    Some users may be experiencing degraded performance when running Presto or Trino jobs. We are investigating the incident. At present, all users in the EU central region may be affected.

  2. monitoring Dec 08, 2024, 08:05 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Dec 08, 2024, 08:27 PM UTC

    This incident has been resolved.

Major November 13, 2024

[All Regions] Elevated error rate for CDP KVS

Detected by Pingoru
Nov 13, 2024, 07:09 AM UTC
Resolved
Nov 13, 2024, 10:16 AM UTC
Duration
3h 7m
Affected: CDP Personalization - Lookup API, CDP Personalization - Ingest API
Timeline · 3 updates
  1. investigating Nov 13, 2024, 07:09 AM UTC

    Since approximately 4:00 UTC, we have been experiencing an issue with requests to CDP KVS, which may be affecting Profiles API functionality, causing delays in KVS data synchronization and updates to real-time segment information. Our team is actively investigating and working to resolve the issue as quickly as possible. Please note that Realtime 2.0 is not affected.

  2. monitoring Nov 13, 2024, 09:17 AM UTC

    Through our investigation, we identified the cause of the issue as a recent release operation. We have reverted all changes from this release, and normal functionality has been restored. Our team is closely monitoring the system to ensure continued stability.

  3. resolved Nov 13, 2024, 10:16 AM UTC

    We would like to inform you that the issue has been fully resolved.
    Incident Impact Details:
    - Profiles API experienced an increased frequency of errors and timeouts.
    - The latest logs were not reflected in real-time segments.
    Incident Impact Time by Region:
    - us: Start: November 13, 04:14 UTC, End: November 13, 08:55 UTC
    - aws-tokyo: Start: November 13, 04:14 UTC, End: November 13, 08:54 UTC
    - eu01: Start: November 13, 04:17 UTC, End: November 13, 08:51 UTC
    - ap02: Start: November 13, 04:15 UTC, End: November 13, 09:01 UTC
    - ap03: Start: November 13, 04:17 UTC, End: November 13, 08:52 UTC
    We apologize for any inconvenience this may have caused and thank you for your patience and understanding.

Minor November 5, 2024

[EU region] Presto - Partial Outage

Detected by Pingoru
Nov 05, 2024, 06:24 PM UTC
Resolved
Nov 05, 2024, 07:26 PM UTC
Duration
1h 1m
Affected: Presto Query Engine
Timeline · 4 updates
  1. investigating Nov 05, 2024, 06:24 PM UTC

    We are investigating a possible problem currently causing elevated error rates for Presto queries. We will provide an update as soon as we know more.

  2. investigating Nov 05, 2024, 06:49 PM UTC

    We are continuing to investigate this issue. We expect most queries to succeed after one or more automatic retries.

  3. monitoring Nov 05, 2024, 07:05 PM UTC

    We have applied a fix. The problem looks to be resolved, but we are continuing to monitor.

  4. resolved Nov 05, 2024, 07:26 PM UTC

    Between Nov 5, 17:15 UTC and Nov 5, 18:45 UTC, some customers experienced delays and errors related to Presto. The cause was insufficient capacity, which will be investigated further. A fix has been implemented and the issue has been resolved. We apologize for any inconvenience caused. If you have any questions about it, please contact [email protected]

Minor October 16, 2024

[US Region] Trino/Presto performance degradation

Detected by Pingoru
Oct 16, 2024, 12:12 PM UTC
Resolved
Oct 16, 2024, 01:15 PM UTC
Duration
1h 3m
Affected: Presto Query Engine
Timeline · 3 updates
  1. investigating Oct 16, 2024, 12:12 PM UTC

    We are investigating a possible problem currently affecting Trino/Presto queries for the US region. Queries might have degraded performance. We will provide an update as soon as we know more details.

  2. monitoring Oct 16, 2024, 12:41 PM UTC

    We have applied remediation to the infrastructure with degraded performance. We are currently monitoring the performance closely.

  3. resolved Oct 16, 2024, 01:15 PM UTC

    This incident has been resolved.

Minor October 2, 2024

[US Region] Query Engine - Service Degraded Performance

Detected by Pingoru
Oct 02, 2024, 12:04 AM UTC
Resolved
Oct 02, 2024, 01:30 AM UTC
Duration
1h 25m
Affected: Streaming Import REST API, Mobile/Javascript REST API, Data Connector Integrations, Hadoop / Hive Query Engine, Presto Query Engine, Presto JDBC/ODBC Gateway
Timeline · 8 updates
  1. investigating Oct 02, 2024, 12:04 AM UTC

    We're experiencing an elevated level of API errors and are currently looking into the issue.

  2. investigating Oct 02, 2024, 12:06 AM UTC

    We are continuing to investigate this issue.

  3. investigating Oct 02, 2024, 12:07 AM UTC

    We are continuing to investigate this issue.

  4. identified Oct 02, 2024, 12:41 AM UTC

    The issue has been identified and a fix is being implemented.

  5. monitoring Oct 02, 2024, 01:01 AM UTC

    A fix has been implemented and we are monitoring the results.

  6. monitoring Oct 02, 2024, 01:15 AM UTC

    We are continuing to monitor for any further issues.

  7. resolved Oct 02, 2024, 01:30 AM UTC

    This incident has been resolved.

  8. postmortem Oct 02, 2024, 04:34 AM UTC

    We experienced a temporary overload on the storage layer. It started at 16:15 PDT and was fixed at 18:15 PDT. The major impact was performance degradation for data ingestion components (Streaming Import REST API, Mobile/Javascript REST API, Data Connector) and the Hive and Presto query engines. Some queries executed on Hive and Presto failed because of the performance degradation of the storage.

Major September 20, 2024

[All Regions] Web Interface - Partial Outage to show Standard Audit Logs

Detected by Pingoru
Sep 20, 2024, 02:20 AM UTC
Resolved
Sep 20, 2024, 06:28 AM UTC
Duration
4h 8m
Affected: Web Interface
Timeline · 3 updates
  1. identified Sep 20, 2024, 02:20 AM UTC

    We observed a problem with web console access related to showing Standard Audit Logs. We have found the cause of the incident. We are working to resolve the incident.

  2. monitoring Sep 20, 2024, 05:44 AM UTC

    We confirm the issue was resolved. We will continue to monitor the results.

  3. resolved Sep 20, 2024, 06:28 AM UTC

    This incident has been resolved.

Minor September 4, 2024

[US region] Presto Query Engine - Degraded Performance

Detected by Pingoru
Sep 04, 2024, 10:22 PM UTC
Resolved
Sep 05, 2024, 01:18 PM UTC
Duration
14h 56m
Affected: Presto Query Engine
Timeline · 7 updates
  1. investigating Sep 04, 2024, 10:22 PM UTC

    We are investigating a possible problem currently affecting Presto. Queries could be delayed. We will provide an update as soon as we know more.

  2. monitoring Sep 04, 2024, 11:19 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. investigating Sep 05, 2024, 01:48 AM UTC

    This incident is still ongoing. We are investigating the root cause.

  4. investigating Sep 05, 2024, 03:53 AM UTC

    Performance has improved for some, though not all, queries. We are continuing to investigate the issue.

  5. monitoring Sep 05, 2024, 04:44 AM UTC

    We applied the fix. We will continue to monitor the results.

  6. monitoring Sep 05, 2024, 08:28 AM UTC

    Systems should be back to normal, but we will continue to monitor the situation for a while.

  7. resolved Sep 05, 2024, 01:18 PM UTC

    The incident is now resolved. All affected components are back to normal. A subset of customers in the US region might have experienced degraded performance on Presto queries between 4:50 PM EDT and 1:40 AM EDT. Presto queries might also have been queued for longer than usual during the incident. Finally, some queries might have failed due to the remediations that were put in place.

Major August 29, 2024

[Tokyo, AP03 Region] Custom Script Workflow error

Detected by Pingoru
Aug 29, 2024, 10:14 AM UTC
Resolved
Aug 29, 2024, 10:59 AM UTC
Duration
44m
Affected: Workflow
Timeline · 3 updates
  1. investigating Aug 29, 2024, 10:14 AM UTC

    Custom Script from workflow fails due to an ongoing incident with our infrastructure provider (AWS).
    Error example:
      Unable to execute HTTP request: Connect to sts.amazonaws.com:443 [sts.amazonaws.com/209.54.177.164] failed: connect timed out
      com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to sts.amazonaws.com:443 [sts.amazonaws.com/209.54.177.164] failed: connect timed out
    We are actively working on the issue on our end.

  2. monitoring Aug 29, 2024, 10:30 AM UTC

    The error rate has decreased. Please rerun the failed workflow if needed. We observed errors with the custom script between 8 am and 10 am UTC on August 29th. We will keep monitoring the issue carefully.

  3. resolved Aug 29, 2024, 10:59 AM UTC

    The issue at our infrastructure provider (AWS) is resolved, and we are not observing new errors for now. Please rerun the failed workflow if needed.

Minor August 29, 2024

[Tokyo, AP03 Region] Data Connector, Result Export, Hive job malfunction

Detected by Pingoru
Aug 29, 2024, 10:07 AM UTC
Resolved
Aug 29, 2024, 11:00 AM UTC
Duration
53m
Affected: Data Connector Integrations, Hadoop / Hive Query Engine
Timeline · 2 updates
  1. investigating Aug 29, 2024, 10:07 AM UTC

    We have found that Data Connector, Result Export, and Hive jobs were unable to start or failed due to an incident with our infrastructure provider (AWS). Some Data Connector, Result Export, and Hive jobs might encounter delays or errors. The issue was observed on Aug 29th between 8:30 UTC and 9:45 UTC. We are still investigating the issue on our end.

  2. resolved Aug 29, 2024, 11:00 AM UTC

    The issue at our infrastructure provider (AWS) is resolved, and we are not observing new errors for now. Some jobs failed due to the incident, so please rerun the failed jobs if needed.

Major July 30, 2024

[US Region] High Error rate at Custom Script and some DataConnector

Detected by Pingoru
Jul 30, 2024, 11:52 PM UTC
Resolved
Jul 31, 2024, 07:53 AM UTC
Duration
8h
Affected: Data Connector Integrations
Timeline · 5 updates
  1. identified Jul 30, 2024, 11:52 PM UTC

    We are currently experiencing a high error rate in the Custom Script service on Treasure Workflow (US Region) due to an ongoing incident with our infrastructure provider (AWS). This issue causes increased error rates with error messages like the following:
    > Task failed with unexpected error: null (Service: AWSLogs; Status Code: 503; Error Code: null; Request ID: xxxxxx; Proxy: null)
    At this time, we do not have an estimated time for full resolution. We will provide further updates as soon as more information becomes available.

  2. identified Jul 31, 2024, 02:24 AM UTC

    This issue is still ongoing, and we are still seeing custom script tasks fail. Custom script users may also encounter some errors about AWS CloudWatch logs. According to our infrastructure provider (AWS), they are working on recovery and are seeing some improvements internally, but they expect full recovery to take 1-2 hours. We will provide further updates as soon as more information becomes available.

  3. identified Jul 31, 2024, 03:16 AM UTC

    Due to the degradation of the Amazon Ads system (https://status.ads.amazon.com), our connectors for the Amazon Ads platform are currently not working properly. If you are using any of the connectors below, your jobs may not be running correctly.
    - Amazon Marketing Cloud export
    - Amazon Marketing Cloud import
    - Amazon Ads export
    - Amazon DSP export
    We will provide further updates as soon as more information becomes available.

  4. monitoring Jul 31, 2024, 05:25 AM UTC

    According to our infrastructure provider (AWS), this issue has already been resolved. We also see that the failure rate has been reduced, so we will update this incident to Monitoring status and the affected components to Operational status.

  5. resolved Jul 31, 2024, 07:53 AM UTC

    This incident has been resolved. All affected components (Custom Script and some DataConnector) are now back to normal.

Minor June 26, 2024

[US Region] Delays in Processing incoming events

Detected by Pingoru
Jun 26, 2024, 02:43 AM UTC
Resolved
Jun 26, 2024, 03:43 AM UTC
Duration
1h
Affected: Streaming Import REST API, Mobile/Javascript REST API, CDP Personalization - Ingest API, ADL
Timeline · 3 updates
  1. investigating Jun 26, 2024, 02:43 AM UTC

    We are monitoring delays in systems responsible for processing incoming events ingested through our ingestion API. There are also increased errors in the ingestion API. The delay is caused by infrastructure issues at our provider, which are currently being addressed. We are monitoring the situation. During this time, writing to storage may be delayed, but there is no evidence of data loss.

  2. monitoring Jun 26, 2024, 02:43 AM UTC

    We are in constant communication with our service provider.

  3. resolved Jun 26, 2024, 03:43 AM UTC

    The issue is resolved at the provider and all components have completed catch-up.

Notice June 25, 2024

[EU region] Profiles API - Degraded Performance

Detected by Pingoru
Jun 25, 2024, 01:28 PM UTC
Resolved
Jun 25, 2024, 01:28 PM UTC
Duration
Affected: CDP Personalization - Lookup API, CDP Personalization - Ingest API
Timeline · 1 update
  1. resolved Jun 25, 2024, 01:28 PM UTC

    Between 2:02 a.m. and 5:47 a.m. PDT, the CDP Personalization API experienced elevated API error rates. The engineering team identified the computing instance causing the issue and implemented a fix. The problem has already been resolved. Personalization API clients that implement error retries observed no issues. We apologize for any inconvenience caused. If you have any questions about it, please contact [email protected]
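
The note that clients with error retry saw no impact can be illustrated with a minimal sketch of retrying with exponential backoff and jitter. Everything here is an assumption for illustration: `TransientError` and `call_with_retry` are hypothetical names, not part of any Treasure Data SDK, and the delays and attempt count are arbitrary defaults rather than recommended values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (e.g. an HTTP 5xx or a timeout)."""


def call_with_retry(request, max_attempts=5, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Call request() and retry on TransientError with exponential backoff.

    The wait before each retry is drawn uniformly from [0, cap], where cap
    doubles on every attempt up to max_delay ("full jitter").
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

A caller would wrap its own API request in a function that raises `TransientError` on retryable failures and pass that function to `call_with_retry`. The `sleep` parameter exists only so the waiting can be replaced in tests.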

Notice June 20, 2024

[All Regions] All Hive jobs run on Hive4

Detected by Pingoru
Jun 20, 2024, 04:33 AM UTC
Resolved
Jun 20, 2024, 04:52 AM UTC
Duration
18m
Affected: Hadoop / Hive Query EngineHadoop / Hive Query EngineHadoop / Hive Query EngineHadoop / Hive Query EngineHadoop / Hive Query Engine
Timeline · 2 updates
  1. monitoring Jun 20, 2024, 04:33 AM UTC

    All Hive jobs excluding CDP Workflow ran on Hive4 (query engine 2023.1) during the following time periods:
    - [US Region] 2024-06-19 07:45 +0000 - 2024-06-20 04:06 +0000
    - [Tokyo Region] 2024-06-19 09:05 +0000 - 2024-06-20 04:08 +0000
    - [EU Region] 2024-06-19 09:14 - 2024-06-20 04:09 +0000
    - [Korea Region] 2024-06-19 09:21 +0000 - 2024-06-20 04:10 +0000
    - [AP03 Region] 2024-06-19 09:30 +0000 - 2024-06-20 04:11 +0000
    We have fixed it, and all Hive jobs are now properly executed on the query engine specified by the user. We apologize for the inconvenience.

  2. resolved Jun 20, 2024, 04:52 AM UTC

    Verified that the issue is completely resolved. We apologize for the inconvenience.
