Boostlingo incident

Connectivity issues in North America

Critical · Resolved

Boostlingo experienced a critical incident on July 23, 2023, affecting Boostlingo Voice IVR, Boostlingo Group Rooms, and 3 more components, lasting 44m. The incident has been resolved; the full update timeline is below.

Started: Jul 23, 2023, 01:46 PM UTC
Resolved: Jul 23, 2023, 02:30 PM UTC
Duration: 44m
Detected by Pingoru: Jul 23, 2023, 01:46 PM UTC

Affected components

Boostlingo Voice IVR
Boostlingo Group Rooms
Boostlingo Communication REST API
Boostlingo Network Traversal Service
Boostlingo Speech Recognition

Update timeline

  1. identified Jul 23, 2023, 01:46 PM UTC

    An issue has been identified with a 3rd party provider who is working to address the issue. The most recent update from the 3rd party is below.

    23 minutes ago
    Impact Statement: Starting at 12:30 UTC on 23 Jul 2023, you've been identified as a customer using Azure SignalR Service in West US 2 who may experience connectivity issues and failures with service management operations.
    Current Status: The third party determined that a recent deployment introduced a configuration error that caused backend components of Azure SignalR to become unhealthy. To address this issue, we have rolled back to a recent deployment containing a successful configuration and we are currently monitoring to ensure this will recover the service. The next update will be provided in 60 minutes, or as events warrant.

    1 hour ago
    Impact Statement: Starting at 12:30 UTC on 23 Jul 2023, you've been identified as a customer using Azure SignalR Service in West US 2 who may experience connectivity issues and failures with service management operations.
    Current Status: We are aware of this issue and are actively investigating. The next update will be provided within 60 minutes, or as events warrant.

    1 hour ago
    We are actively investigating a service event for Azure SignalR Service in West US 2. More details will be provided shortly.

  2. identified Jul 23, 2023, 02:11 PM UTC

    We are continuing to work on a fix for this issue.

  3. monitoring Jul 23, 2023, 02:12 PM UTC

    The third party has fixed the issue and we are continuing to monitor the situation.

  4. resolved Jul 23, 2023, 02:30 PM UTC

    This incident has been resolved.

  5. postmortem Jul 27, 2023, 03:13 AM UTC

    Between 12:30 UTC and 15:00 UTC on July 23, 2023, a third-party managed websocket service used in Boostlingo went down. The third party determined that an automated OS kernel deployment introduced configuration issues. The third party mitigated the issue by rolling back to the previous OS image version containing a healthy configuration.

    Boostlingo utilizes this websocket service for one-to-many as well as one-to-one communication, namely from the Boost servers to web and mobile applications. The websocket service is an ideal fit with our on-demand calling functionality because clients don't need to "poll" for data from an API; the server is able to "push" updates down to clients. That said, when there is an issue with this service, it will impact most real-time functionality in the web/mobile apps.

    This issue affected the ability of interpreters using our apps to receive calls and of requestors using our apps to place calls, and it affected push notifications to web/mobile. The last point could impact things like the async notifications for call log file downloads and in-web toast appointment notifications.

    At this time we have outstanding support tickets open with this third party for deeper root cause analysis and information on how they intend to prevent disruptions like this in the future. We have also begun some initial investigation into alternatives to this service, in case issues like this occur in the future.

    Because most Boost functionality was still responsive and the date/time (fortunately) fell during the most off-peak hours, monitoring triggers did not fire as quickly as we would have liked. We have refined triggers based on this type of error, so we can be more proactive in reaching out with workarounds.

    It's important to note that all IVR, Direct Dial, and SIP calls were still able to be placed. If the BPIN was enabled for those accounts, they were also serviced by integrations that do not depend on the websocket service (i.e., the calls would have successfully reached an interpreter). We did not even consider DAP, since the functionality still available in the platform far exceeded the capabilities of DAP. Additionally, all onsite appointment functionality and most of the web portal functionality (other than the calling and push notifications mentioned above) was up the whole time.

    We are sorry for the inconvenience this caused, especially to interpreters who were attempting to work and service calls during these hours. We will update this post-mortem when we receive any additional information from the third-party managed service.

    Thanks,
    Boost Team