Calibre incident

Temporary test unavailability and 1.5 hrs of downtime

Major Resolved

Calibre experienced a major incident on December 2, 2018. The incident has been resolved; the full update timeline is below.

Started
Dec 02, 2018, 05:15 PM UTC
Resolved
Dec 02, 2018, 05:15 PM UTC
Detected by Pingoru
Dec 02, 2018, 05:15 PM UTC

Update timeline

  1. resolved Dec 10, 2018, 09:58 AM UTC

    This incident has been resolved.

  2. postmortem Dec 10, 2018, 09:58 AM UTC

    On December 3rd, [Calibreapp.com](http://calibreapp.com) suffered approximately **1 hour 30 minutes of downtime following difficulties during a routine data migration**, followed by a period of degraded performance. During the data migration, tests recorded prior to December 3rd were temporarily unavailable to view. New tests were still being conducted, but their aggregation was delayed by the ongoing migration, so they were also temporarily unavailable. **No data was lost.**

    ## **Monday 3rd December, 4:25pm AEST**

    45 minutes into the data migration, we noticed drastically degraded Postgres database performance, which brought [Calibreapp.com](http://calibreapp.com) down for almost an hour.

    ## **Monday 3rd December, 5:30pm AEST**

    [Calibreapp.com](http://calibreapp.com) was brought back up while still experiencing degraded performance due to the migration load.

    ## **Monday 3rd December, 9:30pm AEST**

    A routine vacuum and the automatic daily database backups started running against the same table that was being migrated, which caused further issues.

    ## **Tuesday 4th December, 8:00am AEST**

    By Tuesday, the migration had processed data back to September 2018. Timeline metrics were 100% available, but detailed reports of those tests were still unavailable to view. We continued to monitor the database.

    ## **Wednesday 5th December, 7:27pm AEST**

    Following numerous process efficiency fixes and the replacement of a database replica, the remaining queue backlog was processed smoothly and the service returned to full availability.