Harness incident

Split FME outbound impression integrations are delayed.

Minor · Resolved

Harness experienced a minor incident on January 8, 2026 affecting Integrations, lasting 6m. The incident has been resolved; the full update timeline is below.

Started
Jan 08, 2026, 04:40 PM UTC
Resolved
Jan 08, 2026, 04:47 PM UTC
Duration
6m
Detected by Pingoru
Jan 08, 2026, 04:40 PM UTC

Affected components

Integrations

Update timeline

  1. investigating Jan 06, 2026, 09:18 PM UTC

    We are investigating the issue.

  2. investigating Jan 06, 2026, 09:26 PM UTC

    For customers who use S3 for impressions integrations, we are seeing a partial data delay. The system is currently in recovery mode, and data is being processed and backfilled. No data loss is expected, and full recovery is expected within 24 hours. For customers who use webhooks and other third parties (Amplitude, Segment, et al.), we are testing mitigations and will follow up with an ETA.

  3. identified Jan 06, 2026, 10:07 PM UTC

    For customers who use webhooks and other third parties (Amplitude, Segment, et al.), we are now in recovery mode as well.

  4. identified Jan 07, 2026, 07:57 PM UTC

    Issues with Amplitude, Segment, and custom webhooks were fully resolved as of 9:15 PM PT.

  5. monitoring Jan 08, 2026, 05:40 PM UTC

    A fix has been implemented and we are monitoring the results.

  6. resolved Jan 09, 2026, 03:57 AM UTC

    This incident has been resolved.

  7. postmortem Jan 12, 2026, 07:09 PM UTC

    ## Summary

    * Between **Dec 28, 2025 00:04 UTC** and **Jan 8, 2026 11:48 UTC**, impressions integration data experienced delays of varying degrees.
    * **Amazon S3 integrations** were impacted from Dec 28 through Jan 8, with delays reaching up to 36 hours at peak.
    * **Amplitude, Segment, and custom webhook integrations** were impacted from Jan 2 through Jan 7, with delays reaching up to 14 hours at peak.
    * A small number of customers experienced data loss due to rate limiting at their destination during recovery; the vast majority of customers received all their impressions data.

    ## Root Cause

    Significant increases in impressions volume caused our integration pipelines to reach their maximum throughput capacity. The S3 integration encountered volume growth that exceeded its processing capacity, while the Amplitude, Segment, and webhook integrations faced similar throughput constraints as traffic continued to increase.

    ## Impact

    * Outbound impressions data to S3, Amplitude, Segment, and custom webhook destinations was delayed.
    * Customers using these integrations would have seen data arrive later than expected.

    ### What was not impacted?

    * SDK feature flag evaluations and targeting
    * FME flag delivery network
    * Events integrations
    * Admin API and UI access
    * Customer flag configuration data

    ## Remediation

    For S3 integrations, we reordered and regrouped jobs to prioritize larger integrations, allowing them more time to complete. For Amplitude, Segment, and webhook integrations, we increased throughput through configuration changes within the data pipeline.

    ## Action Items

    * **Rebuild webhook integration architecture:** We are implementing a new architecture for Amplitude, Segment, and webhook integrations that provides better isolation from noisy neighbors and higher maximum throughput.
    * **Improve S3 batch processing:** We are separating batch workloads to prevent a single slow job from delaying others, with prioritization now in place for larger jobs.
    * **Enhanced monitoring and alerting:** New alerts have been deployed for both systems to ensure engineering teams engage with delays earlier, enabling faster recovery.
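The postmortem notes that the only data loss came from rate limiting at the destination while the backlog was being drained. One generic way a sender can avoid that failure mode is to pace backfill deliveries under the destination's budget instead of blasting the queue. Below is a minimal sketch of that idea using a sliding one-second window; the `drain_backlog` helper, its `send` callback, and the `max_per_sec` parameter are hypothetical illustrations, not part of Harness's actual pipeline:

```python
import time
from collections import deque


def drain_backlog(events, send, max_per_sec):
    """Drain a backfill queue while staying under a per-second send budget.

    Keeps timestamps of recent sends in a sliding one-second window; when
    the window is full, sleeps until the oldest send ages out instead of
    exceeding the destination's rate limit (which could cause drops).
    """
    sent_at = deque()  # monotonic timestamps of sends in the last second
    delivered = []
    for event in events:
        now = time.monotonic()
        # Evict timestamps older than one second from the window.
        while sent_at and now - sent_at[0] >= 1.0:
            sent_at.popleft()
        if len(sent_at) >= max_per_sec:
            # Budget exhausted: wait for the oldest send to age out.
            time.sleep(1.0 - (now - sent_at[0]))
            sent_at.popleft()
        send(event)
        sent_at.append(time.monotonic())
        delivered.append(event)
    return delivered


# Example: deliver five queued events with a generous budget.
received = []
drained = drain_backlog(list(range(5)), received.append, max_per_sec=100)
```

The trade-off is the one the timeline shows: pacing lengthens recovery time (delays of up to 36 hours at peak) but preserves delivery, whereas exceeding the destination's limit recovers faster on the wire but loses events.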