Close experienced a minor incident on March 10, 2025 affecting Dialer and lasting 25 minutes. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Mar 10, 2025, 04:02 PM UTC
We've become aware of degraded performance of our Dialer service. We are investigating the issue. Updates will be posted as they become available.
- monitoring Mar 10, 2025, 04:21 PM UTC
Our Dialer system is now functioning normally. We are continuing to investigate the cause of the degraded performance and are monitoring the system.
- resolved Mar 10, 2025, 04:28 PM UTC
This incident has been resolved.
- postmortem Mar 12, 2025, 05:39 PM UTC
Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Dialer functionality was impaired for 58 minutes, from 15:20 UTC to 16:18 UTC on March 10th, 2025. During this time the Dialer feature could get stuck in the “connecting” state.

## Root Cause and Resolution

The issue was triggered at 15:20 UTC by a service rebalance that caused a number of client connections to close simultaneously. When these clients attempted to reconnect, the resulting spike in traffic, arriving under peak traffic conditions, exceeded system limits and led to service disruptions.

Our team quickly identified the cause and worked to stabilize the system. We restored normal operations by 16:18 UTC. To prevent similar incidents in the future, we are reviewing system thresholds and improving our ability to handle sudden increases in demand.

## Timeline

* 15:20 UTC - a service rebalance occurs, starting a wave of new connections being established
* 15:21 UTC - a portion of requests starts getting dropped due to rate limits
* 15:28 UTC - alerts trigger and our response team begins identifying the root cause
* 15:36 UTC - the rate of dropped requests subsides, but soon increases again due to a wave-like pattern of retries
* 16:18 UTC - the final wave of increased errors finishes and the situation returns to normal operational levels
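
The wave-like retry pattern in the timeline is typical of many clients reconnecting on similar schedules after being disconnected at the same moment. One widely used mitigation is exponential backoff with random jitter, which spreads reconnect attempts over time. The sketch below is illustrative only and does not describe the actual fix applied; the function names and the `base` and `cap` values are hypothetical.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Seconds to wait before reconnect attempt `attempt` (0-based).

    Uses "full jitter": the delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so clients that were
    disconnected together do not all retry at the same instant.
    `base` and `cap` are hypothetical values, not real settings.
    """
    # Clamp the exponent so very large attempt counts cannot overflow.
    ceiling = min(cap, base * (2 ** min(attempt, 30)))
    return rng.uniform(0.0, ceiling)


def simulate_reconnects(num_clients, attempt, seed=0):
    """Return the sorted delays chosen by `num_clients` simulated clients."""
    rng = random.Random(seed)  # seeded for a reproducible simulation
    return sorted(reconnect_delay(attempt, rng=rng) for _ in range(num_clients))


if __name__ == "__main__":
    delays = simulate_reconnects(num_clients=1000, attempt=3)
    print(f"earliest retry: {delays[0]:.2f}s, latest retry: {delays[-1]:.2f}s")
```

With this approach, 1000 clients on their fourth attempt would spread their retries across a window of up to 8 seconds rather than arriving together, which keeps the burst smaller relative to any rate limit.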