Thought Industries incident

Dec 3, 2025 incident

Thought Industries experienced a minor incident on December 3, 2025 affecting US - Platform, lasting 42m. The incident has been resolved; the full update timeline is below.

Started: Dec 03, 2025, 06:15 PM UTC
Resolved: Dec 03, 2025, 06:57 PM UTC
Duration: 42m
Detected by Pingoru: Dec 03, 2025, 06:15 PM UTC

Affected components

US - Platform

Update timeline

investigating Dec 03, 2025, 06:15 PM UTC

We’re aware of an issue that’s currently affecting parts of the platform. Our Engineering team is reviewing the situation and working diligently to resolve it. Updates will be posted here as they become available.
monitoring Dec 03, 2025, 06:24 PM UTC

Between 9:45 AM PST and 10:05 AM PST, platform monitoring detected two periods of elevated 503 errors that have since self-resolved. We're monitoring the situation while we diagnose the cause.
resolved Dec 03, 2025, 06:57 PM UTC

This incident has been resolved.
postmortem Dec 12, 2025, 01:55 AM UTC

Between 9:45 AM and 10:05 AM PST on December 3, 2025, users of the US TI platform experienced elevated 503 error rates and increased load times. The issue was caused by congestion in the Rustici postback process, which degraded performance and produced 503 errors for a subset of users.

The congestion resulted from a combination of two factors: (1) instability within internal AWS infrastructure and (2) improperly tuned timeout settings for state management during Rustici postback processing. When the platform failed to persist Rustici progress to an internal database, whether because of network instability or rate limiting, the original request connection remained open longer than intended. When AWS infrastructure instability spiked, these hanging requests accumulated to a critical level and began to interfere with normal request processing, ultimately impacting non-Rustici platform functionality as well.

The infrastructure team has released fixes that limit the impact of requests that fail due to instability, and has further tuned scaling so that throttling does not cause similar issues. We apologize for the inconvenience and will continue to monitor the platform to ensure a stable user experience.
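The hanging-request failure mode described in this postmortem can be illustrated with a minimal sketch. It is not the platform's actual code. The names `persist_progress`, `handle_postback`, and `PERSIST_TIMEOUT_SECONDS`, as well as the timeout value, are hypothetical and chosen only for illustration. The sketch assumes an async request handler and shows how bounding the persistence step with a timeout releases the request instead of leaving the connection open.

```python
import asyncio

# Hypothetical timeout, kept small so the demo runs quickly.
PERSIST_TIMEOUT_SECONDS = 0.05


async def persist_progress(record: dict, delay: float) -> str:
    """Stand-in for a database write; `delay` simulates network instability."""
    await asyncio.sleep(delay)
    return "saved"


async def handle_postback(record: dict, delay: float) -> str:
    """Handle one postback without holding the request past the timeout."""
    try:
        return await asyncio.wait_for(
            persist_progress(record, delay),
            timeout=PERSIST_TIMEOUT_SECONDS,
        )
    except asyncio.TimeoutError:
        # Give up on the slow write so the request can finish promptly.
        # A real system might enqueue the record for a later retry here.
        return "deferred"


def run_demo() -> list:
    """Run one fast and one slow postback and return their outcomes."""
    async def main() -> list:
        return [
            await handle_postback({"learner": "a"}, delay=0.0),
            await handle_postback({"learner": "b"}, delay=1.0),
        ]

    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the slow write is cancelled once the timeout elapses, so a burst of slow writes cannot pile up as open requests and crowd out unrelated traffic. How the deferred work is retried is left out and would depend on the real system.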