The Things Industries incident

Webhook failures in the TTSC eu1 cluster

Minor Resolved

The Things Industries experienced a minor incident on October 8, 2025 affecting Europe 1 (eu1), lasting 16h 9m. The incident has been resolved; the full update timeline is below.

Started
Oct 08, 2025, 10:12 AM UTC
Resolved
Oct 09, 2025, 02:21 AM UTC
Duration
16h 9m
Detected by Pingoru
Oct 08, 2025, 10:12 AM UTC

Affected components

Europe 1 (eu1)

Update timeline

  1. investigating Oct 08, 2025, 10:12 AM UTC

    We are investigating reports about webhook failures in the TTSC eu1 cluster.

  2. monitoring Oct 08, 2025, 10:57 AM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Oct 09, 2025, 02:21 AM UTC

    This incident has been resolved.

  4. postmortem Oct 13, 2025, 01:57 PM UTC

    On October 7-8, 2025, the webhook ingestion service in our eu1 cluster experienced a degradation that increased processing errors. Processing retries delayed data delivery and, in a small number of cases, data was lost after too many retries.

    The issue was caused by a lock that, under certain conditions, could be held indefinitely because it was configured without an automatic expiration. This set off a chain reaction in which subsequent processes waited for the lock while holding their Redis connections open. The accumulation of these waiting processes likely caused a rapid increase in Redis connections, which in turn degraded the performance of dependent services.

    Service was restored after deploying an update that adjusted the locking logic. To improve future reliability, our follow-up actions include reviewing similar code patterns and running heavier load tests on the webhook-related services.
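
The postmortem does not say how the locking logic was changed, so the following is only a minimal sketch of the general pattern it points at: a lock that expires automatically, plus a bounded wait so callers do not hold resources indefinitely. Every name here (`InMemoryStore`, `acquire_lock`, `release_lock`, the key name) is hypothetical and not taken from the vendor's code. The in-memory store is a stand-in that imitates the behaviour of the Redis command `SET key value NX EX seconds` so the example runs without a server.

```python
import time
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a Redis client (illustration only).

    Imitates "set if absent, with a time-to-live", which Redis provides
    through `SET key value NX EX seconds`.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def _live_value(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat the key as gone
            return None
        return value

    def set_if_absent(self, key, value, ttl_seconds):
        if self._live_value(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl_seconds)
        return True

    def delete_if_equal(self, key, value):
        current = self._live_value(key)
        if current is not None and current == value:
            del self._data[key]
            return True
        return False


def acquire_lock(store, name, ttl_seconds, wait_timeout,
                 poll_interval=0.01, clock=time.monotonic, sleep=time.sleep):
    """Return a token if the lock was taken, or None once wait_timeout passes.

    The TTL means a crashed holder cannot keep the lock forever; the wait
    timeout means a waiter gives up instead of queuing indefinitely.
    """
    token = uuid.uuid4().hex  # unique per holder, checked again on release
    deadline = clock() + wait_timeout
    while True:
        if store.set_if_absent(name, token, ttl_seconds):
            return token
        if clock() >= deadline:
            return None
        sleep(poll_interval)


def release_lock(store, name, token):
    """Release only if we still own the lock (it may have expired meanwhile)."""
    return store.delete_if_equal(name, token)
```

A caller that receives `None` from `acquire_lock` can return its connection and retry later rather than waiting while holding it open. Against a real Redis server, the compare-and-delete in `release_lock` would need to be atomic, which is commonly done with a small server-side script; that part is omitted here.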