The Things Industries incident
Webhook failures in the TTSC eu1 cluster
The Things Industries experienced a minor incident on October 8, 2025 affecting Europe 1 (eu1), lasting 16h 9m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Oct 08, 2025, 10:12 AM UTC
We are investigating reports about webhook failures in the TTSC eu1 cluster.
- monitoring Oct 08, 2025, 10:57 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Oct 09, 2025, 02:21 AM UTC
This incident has been resolved.
- postmortem Oct 13, 2025, 01:57 PM UTC
On October 7-8, 2025, the webhook ingestion service in our eu1 cluster experienced a degradation that led to an increase in processing errors. This resulted in delayed data delivery while processing was retried and, in a small number of cases, data loss after too many retries. The issue was caused by a lock that was configured without an automatic expiration and could therefore, under certain conditions, be held indefinitely. This created a chain reaction in which subsequent processes waited for the lock while holding their Redis connections open. The accumulation of these waiting processes likely caused a rapid increase in Redis connections, which in turn degraded the performance of dependent services. Service was restored after we deployed an update that adjusted the locking logic. To improve future reliability, our follow-up actions include reviewing similar code patterns and running heavier load tests on the webhook-related services.
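The postmortem does not publish the actual locking code, so the following is only a minimal sketch of the general pattern it describes: a lock that expires on its own, combined with a bounded wait so that callers do not queue indefinitely. It uses an in-memory stand-in rather than a real Redis client; `ExpiringLockStore`, `try_acquire`, the key name, and all timing values are hypothetical names chosen for illustration. In Redis itself, a comparable effect is commonly obtained with `SET key value NX PX <milliseconds>`.

```python
import time
import uuid


class ExpiringLockStore:
    """In-memory stand-in for a key-value store with per-key expiry.

    Hypothetical helper for illustration only; it imitates the semantics
    of Redis `SET key value NX PX ttl` without needing a server.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # key -> (owner token, expiry timestamp)

    def acquire(self, key, ttl_seconds):
        """Return an owner token if the lock was taken, else None."""
        now = self._clock()
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return None  # still held and not yet expired
        # Either free or expired: take it with a fresh expiry.
        token = uuid.uuid4().hex
        self._locks[key] = (token, now + ttl_seconds)
        return token

    def release(self, key, token):
        """Release only if the caller still owns the lock."""
        held = self._locks.get(key)
        if held is not None and held[0] == token:
            del self._locks[key]
            return True
        return False


def try_acquire(store, key, ttl_seconds, wait_seconds,
                poll_seconds=0.01, clock=time.monotonic, sleep=time.sleep):
    """Bounded wait: give up after wait_seconds instead of blocking forever."""
    deadline = clock() + wait_seconds
    while True:
        token = store.acquire(key, ttl_seconds)
        if token is not None:
            return token
        if clock() >= deadline:
            return None  # caller can free its connection and retry later
        sleep(poll_seconds)
```

With an expiry on the lock, a holder that crashes or stalls blocks others only until the TTL elapses. With a bounded wait, a waiting process can return its connection instead of holding it open while it queues. The owner token prevents a process whose lock has already expired from releasing a lock that another process now holds.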