Treasure Data incident

[EU Region] Elevated error rate and performance degradation for personalization API

Major · Resolved

Treasure Data experienced a major incident on January 29, 2025, affecting CDP API, CDP Personalization - Lookup API, and CDP Personalization - Ingest API, and lasting 1h 2m. The incident has been resolved; the full update timeline is below.

Started
Jan 29, 2025, 04:13 PM UTC
Resolved
Jan 29, 2025, 05:15 PM UTC
Duration
1h 2m
Detected by Pingoru
Jan 29, 2025, 04:13 PM UTC

Affected components

- CDP API
- CDP Personalization - Lookup API
- CDP Personalization - Ingest API

Update timeline

  1. investigating Jan 29, 2025, 04:13 PM UTC

    We detected degraded performance and an increased error rate for the personalization API. We are currently investigating this issue.

  2. monitoring Jan 29, 2025, 04:49 PM UTC

    We have started applying a remediation and are observing that the service is recovering. We are continuing to closely monitor service health.

  3. monitoring Jan 29, 2025, 04:57 PM UTC

    Service health has improved significantly. We continue to monitor it closely.

  4. monitoring Jan 29, 2025, 05:01 PM UTC

    We are continuing to monitor for any further issues.

  5. resolved Jan 29, 2025, 05:15 PM UTC

    From Wednesday, 29 Jan 2025, 15:47 UTC to 16:51 UTC, customers experienced elevated error rates and increased latency related to the Profiles API.

    The cause was a monitor that detected a slightly elevated, otherwise non-visible error rate and triggered an automated system recovery operation. That recovery operation then caused the same incident again because of a configuration problem we had on Friday: https://status.treasuredata.com/incidents/jyqjpyscvjzh The response team redeployed the safe version to recover the system.

    As a short-term mitigation, we have updated the recovery operation. This change will remain in place until we complete the root cause analysis and the permanent fix described under "Further Actions" in the previous postmortem: https://status.treasuredata.com/incidents/jyqjpyscvjzh

    If you experience any delays or abnormal errors, please reach out to our support team. Thank you for your patience and understanding during this incident.