I.T Communications Limited experienced a major incident on September 6, 2017, lasting 9h 10m. The incident has been resolved; the full update timeline is below.
Update timeline
- identified Sep 06, 2017, 10:46 AM UTC
We are aware of an issue preventing PBX-02 on CL2 from working. We are currently working to resolve it.
- identified Sep 06, 2017, 10:55 AM UTC
We have discovered a large increase in disk space usage, which caused the VMware node to run out of space. We are currently migrating this server to another VMware node with more storage, which will resolve the issue.
- identified Sep 06, 2017, 11:06 AM UTC
We have made a change to the database server to ensure PBX-03 is working as it should, and most phones should be back up and running. PBX-02 is still migrating and is at 29%. We will issue a full report as soon as possible.
- identified Sep 06, 2017, 11:21 AM UTC
The issue is also affecting SIP-02. We are waiting for a large amount of data to move over to another server node; however, due to its size, this is taking some time.
- identified Sep 06, 2017, 11:28 AM UTC
Data migration is at 33%. Once this is completed, we can restore full service.
- identified Sep 06, 2017, 11:42 AM UTC
Data migration is at 36%. Once this is completed, we can restore full service.
- identified Sep 06, 2017, 11:59 AM UTC
Data migration is at 40%. Once this is completed, we can restore full service.
- identified Sep 06, 2017, 12:26 PM UTC
Data migration is at 46%. Once this is completed, we can restore full service.
- identified Sep 06, 2017, 12:56 PM UTC
We have restored service for SIP trunks and are currently waiting for the PBX-02 server to complete the migration. We are also ordering a new 25-disk bay array to give us additional disk storage, so we have plenty of space available.
- identified Sep 06, 2017, 01:28 PM UTC
Data migration is at 57%. Once this is completed, we can restore full service to PBX-02.
- identified Sep 06, 2017, 01:43 PM UTC
We are extremely sorry, but the SIP trunk service (SIP-02 / SIP-03) is down again due to disk space issues. We are trying to migrate data to free up 2TB of space, but it is taking time due to the volume of data. Basically, the space is running out quicker than we can move data off: we free up 100GB, service restores and lasts 10 minutes, and then the 100GB is gone. To ensure the service is stable and to prevent data loss, we have no option other than to leave the servers powered off until the data migration is complete. This means downtime; however, we are doing all we can to resolve this issue. We will be addressing this so that this issue never happens again.
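As a rough illustration of the figures in this update, the sketch below works out the implied rate at which space was being consumed and how long a given amount of freed space would last at that rate. It assumes the consumption rate stays constant and takes 2TB as 2000GB; the function names are hypothetical and chosen only for this example.

```python
def fill_rate_gb_per_min(consumed_gb: float, minutes: float) -> float:
    """Average rate at which disk space was consumed, in GB per minute."""
    return consumed_gb / minutes


def runway_minutes(free_gb: float, rate_gb_per_min: float) -> float:
    """Minutes until free_gb is exhausted, assuming a constant fill rate."""
    return free_gb / rate_gb_per_min


# Figures from the update: 100GB freed was gone after 10 minutes.
rate = fill_rate_gb_per_min(100, 10)

# Assumption: 2TB is treated as 2000GB, and the fill rate does not change.
runway = runway_minutes(2000, rate)

print(f"Fill rate: {rate:.1f} GB/min")
print(f"2TB of free space would last about {runway:.0f} minutes at that rate")
```

Under these assumptions, freeing space in small increments cannot keep the service up unless data is moved off faster than the computed fill rate, which is consistent with the decision to keep the servers powered off until the migration completes.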
- identified Sep 06, 2017, 01:49 PM UTC
Data migration is at 62%. Once this is completed, we can restore full service.
- identified Sep 06, 2017, 02:59 PM UTC
Data migration is at 77%. Once this is completed, we can restore full service.
- identified Sep 06, 2017, 05:22 PM UTC
Migration is now complete, and we are working on restoring service to PBX-02 and SIP-02.
- monitoring Sep 06, 2017, 07:08 PM UTC
We are continuing to monitor the service. We do not expect any further outages. Full report to follow.
- resolved Sep 06, 2017, 07:56 PM UTC
Please see the report detailing the issue today at https://www.it-communicationsltd.co.uk/Outage-Report.pdf