Perimeter 81 incident

Some users had intermittent disruption using ZT-Apps

Minor Resolved

Perimeter 81 experienced a minor incident on February 1, 2022. The incident has been resolved; the full update timeline is below.

Started
Feb 01, 2022, 03:40 PM UTC
Resolved
Jan 30, 2022, 04:30 PM UTC
Duration
Detected by Pingoru
Feb 01, 2022, 03:40 PM UTC

Update timeline

  1. resolved Feb 01, 2022, 03:40 PM UTC

    Some users reported an intermittent disruption using ZT-Apps, in which some of the ZT-Apps failed to connect to internal resources.

  2. postmortem Feb 01, 2022, 03:40 PM UTC

    **Overview** Due to an unplanned reboot of some of our servers (which didn’t itself cause any downtime), one of the services responsible for ZT-Apps session management failed to start because of an expired certificate. Because our platform load-balances connections, for a period of 20 minutes some of the ZT-Apps sessions that were routed to the failing service failed to launch some of the ZT-Apps. Some users were able to work around the issue by refreshing or re-opening the ZT-Apps they were trying to reach.

    **Detection** Several users reached out to our Support Team, which quickly identified the issue and escalated it as critical to our R&D teams.

    **Root Cause Analysis** Our R&D team quickly identified the failing service and found the root cause: an expired certificate on one of the servers. We have automated services that monitor all certificates on all of our servers and services and issue alerts before the certificates expire. However, due to an error in configuring those services, the expiration of the ZT-Apps certificates didn’t trigger an alert.

    **Resolution** Our team replaced the expired certificate on the server and restarted the ZT-Apps service.

    **Corrective Actions**

    - Immediate Term - The team manually replaced the expired certificate and restored functionality.
    - Short Term - The team fixed the misconfiguration in our monitoring and alerting services, and we’ve verified that all of our servers and services are running with an up-to-date certificate.
    - Long Term - The team is adding an additional monitoring system that will also monitor critical services that fail to start.
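The root cause above combines two failure modes: a certificate nearing expiry, and a service whose certificate the monitoring never covered. The following is a minimal illustrative sketch of checking for both. It is not Perimeter 81's tooling; the names `CertRecord`, `expiring_soon`, and `unmonitored`, the service names, and the 30-day warning window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CertRecord:
    """One entry in a hypothetical certificate-monitoring inventory."""
    service: str         # hypothetical service name
    not_after: datetime  # certificate expiry time (UTC)


def expiring_soon(certs, now, warn_within=timedelta(days=30)):
    """Return services whose certificate has expired or expires within the window."""
    return sorted(c.service for c in certs if c.not_after - now <= warn_within)


def unmonitored(deployed_services, certs):
    """Return deployed services that have no certificate record in monitoring."""
    monitored = {c.service for c in certs}
    return sorted(set(deployed_services) - monitored)


if __name__ == "__main__":
    now = datetime(2022, 2, 1, 15, 40, tzinfo=timezone.utc)
    # Hypothetical inventory: the session service is deployed but missing here.
    certs = [
        CertRecord("gateway", now + timedelta(days=200)),
        CertRecord("portal", now + timedelta(days=10)),
    ]
    deployed = ["gateway", "portal", "zt-apps-session"]
    print("expiring soon:", expiring_soon(certs, now))
    print("not monitored:", unmonitored(deployed, certs))
```

Comparing the monitored inventory against the list of deployed services is what would surface a configuration gap like the one described, since an expiry check alone cannot alert on a certificate it was never told about.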