CyberSmart incident

Outage Incident - 23/07/19 - Web App Impact


CyberSmart experienced a notice incident on August 7, 2019 affecting CyberSmart Apps and CyberSmart Dashboard. The incident has been resolved; the full update timeline is below.

Started
Aug 07, 2019, 10:01 AM UTC
Resolved
Aug 07, 2019, 10:01 AM UTC
Duration
Detected by Pingoru
Aug 07, 2019, 10:01 AM UTC

Affected components

CyberSmart Apps, CyberSmart Dashboard

Update timeline

  1. resolved Aug 07, 2019, 10:01 AM UTC

    Issue Summary
    - Total outage time: ~2.5 hours
    - All users were unable to access the CyberSmart Web platform due to a third-party component failure.
    - All customer and application HTTP requests to the platform resulted in 502 errors.
    - A third-party hosting/services company (Amazon Web Services), with which we host a number of key infrastructure components, experienced an outage.

    Timeline (GMT)
    - 16:33: Issue began
    - 16:50: Staff were notified of the issue
    - 19:00: Issue resolved (by external service provider)
    - 19:03: CyberSmart platform back online

    Root Cause
    Amazon AWS had issues with a few of their platform infrastructure services, including degraded performance for EBS volumes within the “EU-WEST-2” Region. EBS is a key part of the RDS component that CyberSmart uses for data storage.

    Resolution and recovery
    N/A

    Corrective and Preventative Measures
    We have planned a work-stream for improved failover within CyberSmart, including using PaaS services distributed over different geographical regions. This will allow automatic corrective measures to keep our services online when a given region has issues.
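
The region-level failover described in the corrective measures can be illustrated with a minimal sketch. This is not CyberSmart's implementation: the function name `pick_region`, the `REGION_PRIORITY` list, and the fallback regions are all hypothetical, and the health check is passed in as a plain callable so the selection logic can be exercised without any network access.

```python
from typing import Callable, Optional, Sequence


def pick_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region whose health check passes, or None if all fail."""
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A health check that raises is treated the same as an unhealthy region.
            continue
    return None


# Hypothetical priority order: the primary region named in the report first,
# then fallback regions chosen purely for illustration.
REGION_PRIORITY = ["eu-west-2", "eu-west-1", "eu-central-1"]

if __name__ == "__main__":
    # Simulate the incident: the primary region is unhealthy.
    chosen = pick_region(REGION_PRIORITY, lambda region: region != "eu-west-2")
    print(chosen)
```

In a real deployment the `is_healthy` callable would probe an actual endpoint or managed health-check service in each region; keeping it injectable means the selection logic stays separate from how health is measured.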