2600Hz incident

We are investigating reports of inbound and outbound calls failing on ZSwitch

Major Resolved

2600Hz experienced a major incident on October 13, 2022 affecting Telephony Services, lasting 1h 1m. The incident has been resolved; the full update timeline is below.

Started
Oct 13, 2022, 04:27 PM UTC
Resolved
Oct 13, 2022, 05:29 PM UTC
Duration
1h 1m
Detected by Pingoru
Oct 13, 2022, 04:27 PM UTC

Affected components

Telephony Services

Update timeline

  1. investigating Oct 13, 2022, 04:27 PM UTC

    We are investigating reports of inbound and outbound calls failing on ZSwitch

  2. identified Oct 13, 2022, 04:48 PM UTC

    We have restarted the FreeSWITCH servers and believe the platform is functional again. We have identified what we believe is the root cause and are working with engineering to validate it and get an emergency build out to ZSwitch now.

  3. monitoring Oct 13, 2022, 04:58 PM UTC

    We had to restart FreeSWITCH again, but we believe we have verified the root cause and taken steps to mitigate it. We are now monitoring the situation, and engineering is working on a proper fix.

  4. resolved Oct 13, 2022, 05:29 PM UTC

    A hot patch has been deployed to ZSwitch to prevent a recurrence.

  5. postmortem Oct 17, 2022, 04:59 PM UTC

    Following the ZSwitch upgrade to 5.1, you likely experienced downtime between 9:20am and 10:40am PT. This was due to a bug in the new version of FreeSWITCH: if it encountered any presence ID containing a space, FreeSWITCH would crash. At least one account had every device configured this way, and a single call placed on a FreeSWITCH server was enough to crash it, which explains the almost total FreeSWITCH downtime.

    Our monitoring picked up these crashes, and our operations team began to investigate with the help of engineering. After around 15 minutes we had identified the issue. As a short-term workaround, operations began to disable any accounts with a space in the presence ID while the engineering team worked on a hot patch to ecallmanager that strips the spaces out of presence ID strings before they are passed to FreeSWITCH. The hot patch was installed onto the FreeSWITCH servers at around 10:35am PT, at which point we confirmed the issue was resolved.

    To prevent a recurrence of this issue in future releases, we have added a step to our QA testing to confirm that special characters and spaces in presence IDs do not interact with FreeSWITCH in an unexpected way.

    If you have any further questions about the downtime, please don't hesitate to contact us. We will be more than happy to provide further information if required.
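    The workaround and the fix described in the postmortem can be illustrated with a minimal Python sketch. It is not the actual ecallmanager patch, which the report does not show. The function names `strip_presence_id` and `devices_with_spaces` and the sample device map are hypothetical, and treating every whitespace character (not only the space character) as unsafe is an assumption made for this example.

```python
import re

# Assumption: any whitespace character is treated as unsafe, not only " ".
_WHITESPACE = re.compile(r"\s")


def strip_presence_id(presence_id: str) -> str:
    """Return the presence ID with all whitespace removed.

    Hypothetical stand-in for the sanitising step applied before a
    presence ID is passed on to the switch.
    """
    return _WHITESPACE.sub("", presence_id)


def devices_with_spaces(devices: dict[str, str]) -> list[str]:
    """Return the names of devices whose presence ID contains whitespace.

    `devices` maps a device name to its presence ID. This mirrors the
    short-term workaround of locating affected configurations.
    """
    return sorted(
        name for name, presence_id in devices.items()
        if _WHITESPACE.search(presence_id)
    )


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = {"desk-phone": "1001", "lobby": "front desk", "fax": " 1003"}
    print(devices_with_spaces(sample))
    print({name: strip_presence_id(pid) for name, pid in sample.items()})
```

    A QA step like the one mentioned above could run the same check against presence IDs containing other special characters before each release.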