Treasure Data incident

[US Region] Ingest API - Performance Downgrade

Minor, Resolved

Treasure Data experienced a minor incident on August 6, 2024 affecting Streaming Import REST API and CDP Personalization - Ingest API, lasting 22h 50m. The incident has been resolved; the full update timeline is below.

Started
Aug 06, 2024, 03:32 PM UTC
Resolved
Aug 07, 2024, 02:23 PM UTC
Duration
22h 50m
Detected by Pingoru
Aug 06, 2024, 03:32 PM UTC

Affected components

Streaming Import REST API
CDP Personalization - Ingest API

Update timeline

  1. investigating Aug 06, 2024, 03:32 PM UTC

    Our Ingest API is experiencing a performance issue. We are investigating the cause.

  2. investigating Aug 06, 2024, 04:04 PM UTC

    We are still investigating the cause of this issue.

  3. investigating Aug 06, 2024, 05:21 PM UTC

    We are observing slower processing times for messages sent to our Ingest API. Users may see a delay of up to two hours in message processing. We are continuing to investigate the root cause and are exploring options to catch up on our backlog of messages. We will provide an update once we know more.

  4. identified Aug 06, 2024, 05:41 PM UTC

    We have identified the source of the problem and are applying a solution now. Customers may still see processing delays as we catch up on the request backlog. We will continue to explore options to accelerate our recovery, and we will continue to monitor the situation.

  5. monitoring Aug 06, 2024, 06:18 PM UTC

    We have rolled out a fix and observed that processing delays are no longer increasing. Customers may continue to see delayed message processing over the next 3-4 hours as the backlog is processed. We continue exploring options to shorten this time and will monitor for any issues.

  6. monitoring Aug 07, 2024, 01:12 AM UTC

    We are continuing to monitor our systems' recovery as we work through the backlog of messages sent in the last 12 hours. We have added more resources to reduce the impact of this issue. At this time, we expect all messages to be processed, but customers may continue to see multi-hour delays as we continue to process messages to our Ingest API for the next few hours. We will continue to monitor this issue, and we appreciate your patience as we work through it.

  7. monitoring Aug 07, 2024, 01:09 PM UTC

    We are continuing to monitor for any further issues.

  8. monitoring Aug 07, 2024, 01:13 PM UTC

    We are continuing to monitor the recovery. As of now, 99% of events become visible within 45 minutes. We will resolve the incident when the catch-up is complete.

  9. monitoring Aug 07, 2024, 01:15 PM UTC

    We are continuing to monitor for any further issues.

  10. resolved Aug 07, 2024, 02:23 PM UTC

    We confirmed the catch-up was complete at 6:44 am PT. From 2024-08-06 03:20 am to 2024-08-07 06:44 am PT, events arriving at us01.records.in.treasuredata.com and c360-ingest-api.treasuredata.com experienced a maximum delay of 8 hours in batch data ingestion. There was no impact on the real-time system.
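
The final update gives the impact window in PT, while the timeline timestamps are in UTC. The following is a minimal sketch for putting both on the same clock. It assumes that "PT" in August means Pacific Daylight Time (UTC-7); the helper name `pt_to_utc` is hypothetical and is not part of any Treasure Data tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "PT" in early August is Pacific Daylight Time, i.e. UTC-7.
PDT = timezone(timedelta(hours=-7), "PDT")


def pt_to_utc(year, month, day, hour, minute):
    """Convert a PDT wall-clock time to a timezone-aware UTC datetime."""
    local = datetime(year, month, day, hour, minute, tzinfo=PDT)
    return local.astimezone(timezone.utc)


# Impact window quoted in the final update (PT).
window_start_utc = pt_to_utc(2024, 8, 6, 3, 20)
window_end_utc = pt_to_utc(2024, 8, 7, 6, 44)
window_length = window_end_utc - window_start_utc

print(window_start_utc.strftime("%Y-%m-%d %H:%M UTC"))
print(window_end_utc.strftime("%Y-%m-%d %H:%M UTC"))
print(window_length)
```

Under that assumption, the quoted window corresponds to 2024-08-06 10:20 UTC through 2024-08-07 13:44 UTC, so the affected period began before the first status update at 03:32 PM UTC.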