CircleLoop incident

Calls Failing

CircleLoop experienced a minor incident on June 28, 2022 affecting Inbound calls, Outbound calls, and 1 more component, lasting 6h 10m. The incident has been resolved; the full update timeline is below.

Started: Jun 28, 2022, 09:04 AM UTC
Resolved: Jun 28, 2022, 03:14 PM UTC
Duration: 6h 10m
Detected by Pingoru: Jun 28, 2022, 09:04 AM UTC

Affected components

Inbound calls, Outbound calls, API, SMS

Update timeline

identified Jun 28, 2022, 09:04 AM UTC

We have identified an issue affecting both incoming and outgoing calls. The cause has been identified and a fix is being implemented. Apologies for any inconvenience caused; we are working to resolve this as soon as possible.
identified Jun 28, 2022, 09:20 AM UTC

Thanks for your patience. A fix is still being implemented, so call performance is still degraded. Please subscribe for further updates.
identified Jun 28, 2022, 09:37 AM UTC

Our deepest apologies for this inconvenience. A fix is still being implemented and performance is still degraded. We will continue to keep you updated in the meantime.
identified Jun 28, 2022, 09:49 AM UTC

We are continuing to work on a fix for this issue.
identified Jun 28, 2022, 09:51 AM UTC

Thank you for your ongoing patience. Unfortunately, the issue remains unchanged and performance is still degraded. We will continue to update you as the incident develops.
identified Jun 28, 2022, 10:03 AM UTC

Thanks for your patience. We are moving closer to a fix and you may begin to see improved performance. We can confirm that only app-based users are currently affected; those using SIP devices such as deskphones or ATAs should be unaffected. Apologies again for the inconvenience. Another update will follow shortly.
monitoring Jun 28, 2022, 10:21 AM UTC

The fix has been implemented and we are monitoring the results. Call performance now appears to be improved, but bear in mind it may still be partially degraded.
monitoring Jun 28, 2022, 11:06 AM UTC

Thanks for your patience. Call operations appear to be mostly back to normal, and we are continuing to monitor the incident. We will update those concerned with a full post-mortem when the incident is deemed completely resolved.
monitoring Jun 28, 2022, 11:37 AM UTC

Calls appear to be fully operational but we are still monitoring the issue and identifying the overall cause. A full post-mortem will be posted once the issue is deemed completely resolved.
resolved Jun 28, 2022, 03:14 PM UTC

This incident is now resolved. A full post-mortem will follow shortly.
postmortem Jun 29, 2022, 08:42 AM UTC

**Reason for Outage**

**Incident Date: 28th Jun 2022**
**Incident Time: 09:46 AM - 10:48 AM**

**Services Affected**
Inbound & Outbound calls to the CircleLoop web applications.

**Impact**
Inbound & Outbound call failures for the majority of CircleLoop customers using the Windows/Mac & Mobile applications. SIP devices were unaffected.

**Notification**
The first sign of any issue was an alert from the CircleLoop monitoring system at 9:48 AM; at that point calls were functioning intermittently. Subsequently, the CircleLoop Operations team and customers reported issues using the applications to make or receive calls from 10:05 AM, by which time calls were consistently failing to connect.

**Diagnosis & Cause**
Upon investigation it was determined that two components of the CircleLoop platform were unhealthy, with their application processes restarting and failing on a continual basis. The root cause was the deployment of a routine change to the CircleLoop platform. This had the unforeseen consequence of causing an error in the Live Services component of the platform, which began returning 400 responses to all requests. SIP device users were unaffected as they do not use Live Services in their call flows.

**Resolution**
Several attempts were made to restore service while the incident was in progress. The first was a reboot of the Live Services component, which initially seemed successful, but the component quickly reverted to an unhealthy state. The issue was resolved by restoring the previous configuration, which brought Live Services back to a healthy state.

**Mitigation**
The logic in Live Services has now been made more robust and has been tested to confirm that it gracefully handles errors, expected or otherwise, so that the issue does not recur.
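
As an illustration only, the resolution and mitigation described above (restoring the previous configuration, and handling errors gracefully rather than failing) can be sketched roughly as follows. This is not CircleLoop's implementation: `ConfigStore`, `probe`, and the `timeout_s` setting are hypothetical names invented for this sketch, which assumes a change is only activated if a health probe accepts it and that the prior configuration is kept for rollback.

```python
from typing import Any, Callable, Dict

Config = Dict[str, Any]


class ConfigStore:
    """Hypothetical holder for the active config and the one it replaced."""

    def __init__(self, initial: Config) -> None:
        self.active: Config = dict(initial)
        self.previous: Config = dict(initial)

    def apply(self, new: Config, healthy: Callable[[Config], bool]) -> bool:
        """Activate `new` only if the health probe accepts it.

        A probe that raises is treated as a failed probe, so an unexpected
        error leaves the currently active config in place.
        """
        try:
            ok = healthy(new)
        except Exception:
            ok = False
        if not ok:
            return False
        self.previous = self.active
        self.active = dict(new)
        return True

    def rollback(self) -> None:
        """Restore the configuration that was active before the last change."""
        self.active = dict(self.previous)


def probe(config: Config) -> bool:
    # Hypothetical health rule: the config must define a positive timeout.
    return config["timeout_s"] > 0


store = ConfigStore({"timeout_s": 5})
store.apply({"timeout_s": 10}, probe)  # accepted: becomes the active config
store.apply({}, probe)                 # rejected: probe raises KeyError
store.rollback()                       # back to the original configuration
```

In this sketch a malformed change is rejected before it goes live, and `rollback` covers the case where a change passes the probe but still misbehaves in production.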