Customer.io incident

Delayed Workspace Performance

Minor · Resolved

Customer.io experienced a minor incident on October 23, 2025, affecting Data Processing and Message Sending, lasting 1h 34m. The incident has been resolved; the full update timeline is below.

Started
Oct 23, 2025, 09:26 PM UTC
Resolved
Oct 23, 2025, 11:00 PM UTC
Duration
1h 34m
Detected by Pingoru
Oct 23, 2025, 09:26 PM UTC

Affected components

Data Processing · Message Sending

Update timeline

  1. investigating Oct 23, 2025, 09:26 PM UTC

    We are currently investigating this issue. You may see delays in data processing and message sending.

  2. investigating Oct 23, 2025, 10:08 PM UTC

    We are continuing to investigate this issue.

  3. investigating Oct 23, 2025, 10:40 PM UTC

    We are continuing to investigate this issue.

  4. monitoring Oct 23, 2025, 10:51 PM UTC

    A fix has been implemented and we are monitoring the results.

  5. resolved Oct 23, 2025, 11:00 PM UTC

    This incident has been resolved.

  6. postmortem Oct 28, 2025, 08:04 PM UTC

    **Incident Summary**

    On October 23rd, 2025, between 20:36 UTC and 23:31 UTC, customers in both the US and EU regions may have experienced delays in data processing, message sending, and sending-state notifications in the user interface. The issue did not affect data ingestion, and no data was lost. During routine maintenance, the engineering team deployed database schema updates intended to support future feature, reliability, and performance improvements. These updates unintentionally increased operational load on several database servers, slowing their responses until the updates completed or were manually stopped.

    **Root Cause**

    A schema update was performed across all databases, modifying an existing table in each. Because of the number of databases and the frequency of the updates, the database engines became overwhelmed managing the internal metadata changes. The update lacked throttling or short delays between operations, which would have reduced load and prevented the performance degradation.

    **Resolution and Recovery**

    Service performance returned to normal once the schema updates finished. One database server was proactively restarted to expedite full recovery. Following stabilization, the team identified the update responsible and confirmed there was no residual impact or ongoing risk.

    **Corrective and Preventative Measures**

    To prevent recurrence, the team is enhancing the schema update process to include built-in throttling and guardrails that limit database load. Code templates will be updated accordingly, and review processes will highlight the need for these safeguards. Monitoring improvements are also being implemented to better detect early signs of database strain and alert engineers sooner. These measures are being tracked and prioritized within our internal development process.
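The missing safeguard the postmortem describes, throttling between schema operations so the engines are not all saturated at once, can be sketched roughly as below. This is a minimal illustration only: it uses throwaway SQLite files to stand in for a fleet of databases, and the function name and delay value are hypothetical, not Customer.io's actual migration tooling.

```python
import os
import sqlite3
import tempfile
import time

def apply_schema_update(db_paths, ddl_statements, delay_seconds=0.05):
    """Run each DDL statement against every database, sleeping between
    operations so metadata churn is spread out over time instead of
    hitting all engines simultaneously (the throttling the postmortem
    notes was missing)."""
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            for ddl in ddl_statements:
                conn.execute(ddl)
                conn.commit()
                time.sleep(delay_seconds)  # throttle between operations
        finally:
            conn.close()

# Demo: two throwaway SQLite files standing in for the database fleet.
tmpdir = tempfile.mkdtemp()
db_paths = [os.path.join(tmpdir, f"shard{i}.db") for i in range(2)]
for path in db_paths:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    conn.commit()
    conn.close()

# Roll out a table modification with a pause between each operation.
apply_schema_update(db_paths, ["ALTER TABLE users ADD COLUMN email TEXT"])

# Confirm the new column landed on every shard.
columns = []
for path in db_paths:
    conn = sqlite3.connect(path)
    columns.append([row[1] for row in conn.execute("PRAGMA table_info(users)")])
    conn.close()
```

In a production rollout the delay would typically be tuned per engine, and guardrails (load checks before each batch, an abort threshold) would wrap the inner loop; the sleep here is only the simplest form of the idea.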