I.T Communications Limited incident

Service issue

Major Resolved

I.T Communications Limited experienced a major incident on September 6, 2017, lasting 9h 10m. The incident has been resolved; the full update timeline is below.

Started
Sep 06, 2017, 10:46 AM UTC
Resolved
Sep 06, 2017, 07:56 PM UTC
Duration
9h 10m
Detected by Pingoru
Sep 06, 2017, 10:46 AM UTC

Update timeline

  1. identified Sep 06, 2017, 10:46 AM UTC

    We are aware of an issue preventing PBX-02 on CL2 from working. We are currently working to resolve it.

  2. identified Sep 06, 2017, 10:55 AM UTC

    We have discovered a large increase in disk space usage, which caused the VMware node to run out of space. We are currently migrating this server to another VMware node with more storage, which will resolve the issue.

  3. identified Sep 06, 2017, 11:06 AM UTC

    We have made a change to the database server to ensure PBX-03 is working as it should, and most phones should be back up and running. PBX-02 is still migrating and is at 29%. We will issue a full report ASAP.

  4. identified Sep 06, 2017, 11:21 AM UTC

    The issue is also affecting SIP-02. We are waiting for a large amount of data to move to another server node; however, due to its size, this is taking some time.

  5. identified Sep 06, 2017, 11:28 AM UTC

    Data migration is at 33%. Once it is complete, we can restore full service.

  6. identified Sep 06, 2017, 11:42 AM UTC

    Data migration is at 36%. Once it is complete, we can restore full service.

  7. identified Sep 06, 2017, 11:59 AM UTC

    Data migration is at 40%. Once it is complete, we can restore full service.

  8. identified Sep 06, 2017, 12:26 PM UTC

    Data migration is at 46%. Once it is complete, we can restore full service.

  9. identified Sep 06, 2017, 12:56 PM UTC

    We have restored service for SIP trunks and are currently waiting for the PBX-02 server to complete the migration. We are also ordering a new 25-disk bay array to give us additional disk storage, so that we have plenty of space available.

  10. identified Sep 06, 2017, 01:28 PM UTC

    Data migration is at 57%. Once it is complete, we can restore full service to PBX-02.

  11. identified Sep 06, 2017, 01:43 PM UTC

    We are extremely sorry, but the SIP trunk service (SIP-02 / SIP-03) is down again due to disk space issues. We are trying to migrate data to free up 2TB of space, but this is taking time due to the volume of data. Put simply, space is running out quicker than we can move data off: when we free up 100GB, service is restored, but it lasts only 10 minutes before that 100GB is gone. To keep the service stable and prevent data loss, we have no option but to leave the servers powered off until the data migration is complete. This means downtime; however, we are doing all we can to resolve this issue, and we will address the cause so that it never happens again.

  12. identified Sep 06, 2017, 01:49 PM UTC

    Data migration is at 62%. Once it is complete, we can restore full service.

  13. identified Sep 06, 2017, 02:59 PM UTC

    Data migration is at 77%. Once it is complete, we can restore full service.

  14. identified Sep 06, 2017, 05:22 PM UTC

    Migration is now complete, and we are working on restoring service to PBX-02 and SIP-02.

  15. monitoring Sep 06, 2017, 07:08 PM UTC

    We are continuing to monitor the service. We do not expect any further outages. A full report will follow.

  16. resolved Sep 06, 2017, 07:56 PM UTC

    Please see the report detailing today's issue at https://www.it-communicationsltd.co.uk/Outage-Report.pdf