Sentinel incident

Post-dial delay and audio issues on softphones and handsets

Sentinel experienced a minor incident on January 14, 2026 affecting Telephone System, lasting 17h 5m. The incident has been resolved; the full update timeline is below.

Started: Jan 14, 2026, 04:08 PM UTC
Resolved: Jan 15, 2026, 09:14 AM UTC
Duration: 17h 5m
Detected by Pingoru: Jan 14, 2026, 04:08 PM UTC

Affected components

Telephone System

Update timeline

identified Jan 14, 2026, 04:08 PM UTC

We are investigating an issue with mobile and desktop applications and handsets being unresponsive.

identified Jan 14, 2026, 04:08 PM UTC

We are continuing to work on a fix for this issue.

monitoring Jan 14, 2026, 04:20 PM UTC

A fix has been implemented and we are monitoring the results.

resolved Jan 15, 2026, 09:14 AM UTC

This incident has been resolved.

postmortem Jan 22, 2026, 12:02 PM UTC

**Incident Analysis:** At around 11:15am UTC on 12th January, performance of the Memory Cache layer on one of our zones began to degrade, and eventually caused the above services to drop. Dashboard and softphone applications would also have been impacted, as parts of these also pull information such as user availability. As we began to reroute traffic for affected customers into healthy zones, we saw the same issue present itself on a second zone, leaving one healthy zone without any interruption to service.

To mitigate the impact, we implemented a forced flush of the system at around 13:00 UTC, and managed to restore service to one zone shortly afterward. The second affected zone did not recover with the same method. At around 13:30, our infrastructure team rolled out a planned upgrade to our SIP proxies in that zone, which dramatically reduced the load and allowed all services to resume on the final zone within minutes.

We took the decision to wait and implement this same upgrade across the other two healthy zones out of hours. Unfortunately, the same issue occurred later in the day, which caused us to roll the upgrade out early to a second zone at around 15:55 UTC. In this instance, the fix was implemented and service restored within 10 minutes of the first report. Later that evening, we successfully rolled out the upgrade to the remaining zone.

Unfortunately, these upgrades caused the BLF indicators on some handsets to become stuck. The following day, on the 13th January at 12:47pm, our team implemented a flush of all active BLF connections, which caused an unforeseen, temporary interruption to those accounts. Service resumed automatically for all handsets at around 12:57pm.

**Conclusion and Next Steps** After reviewing the data surrounding this incident, we are confident that the catalyst for the initial degradation on the affected zones is related to the ever-increasing load on the platform. While we have made significant progress in many areas since beginning this overhaul in July 2025, we recognise that there is still some way to go before it can be considered fully complete.

At the time of writing, all zones have the relevant upgrade applied, which gives us a permanent solution to this issue: we have now fully retired the legacy module that caused yesterday’s issues. In addition to this upgrade to our Memory Cache layer, we are in the final stages of QA for an additional major project that involves the complete re-design of our presence infrastructure, again dramatically reducing the load and enabling far greater scalability in future.