Treasure Data incident
[EU Region] Elevated error rate and performance degradation for personalization API
Treasure Data experienced a major incident on January 29, 2025 affecting CDP API and CDP Personalization - Lookup API and 1 more component, lasting 1h 2m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Jan 29, 2025, 04:13 PM UTC
We detected degraded performance of the personalization API and an increased error rate. We are currently investigating this issue.
- monitoring Jan 29, 2025, 04:49 PM UTC
We have started applying a remediation and are observing that the service is recovering. We continue to closely monitor the service health status.
- monitoring Jan 29, 2025, 04:57 PM UTC
Our health status monitoring now shows significant improvement. We continue to carefully monitor the health status.
- monitoring Jan 29, 2025, 05:01 PM UTC
We are continuing to monitor for any further issues.
- resolved Jan 29, 2025, 05:15 PM UTC
Between 15:47 UTC and 16:51 UTC on Wednesday, 29 Jan 2025, customers experienced elevated error rates and increased latency related to the Profiles API. The cause was a slight, non-visible elevation in error rate that led a monitor to trigger a system recovery operation. The recovery operation then caused the same incident because of a configuration problem we had on Friday: https://status.treasuredata.com/incidents/jyqjpyscvjzh
The response team redeployed the safe version to recover the system. As a short-term mitigation, we also updated the recovery operation; this change stays in place until we complete the root cause analysis and the permanent fix described under "Further Actions" in the previous postmortem: https://status.treasuredata.com/incidents/jyqjpyscvjzh
If you experience any delays or abnormal errors, please reach out to our support team. Thank you for your patience and understanding during this incident.