Assignar incident

Storage issue with RDS

Minor Resolved

Assignar experienced a minor incident on April 9, 2019. The incident has been resolved; the full update timeline is below.

Started
Apr 09, 2019, 05:30 AM UTC
Resolved
Apr 09, 2019, 05:30 AM UTC
Duration
Detected by Pingoru
Apr 09, 2019, 05:30 AM UTC

Update timeline

  1. resolved Apr 11, 2019, 06:07 AM UTC

    We had an issue with one of our database instances that prevented some clients from logging into the system. We identified that the instance was running low on FreeLocalStorage because of some complex SQL queries that were being executed. Our extensive audit logs also contributed to the issue. We have provisioned more storage for this database instance and have optimised our logging to consume less local storage on it.

  2. postmortem Apr 11, 2019, 06:09 AM UTC

    ## We apologise for intermittent issues

    At Assignar, we do our very best to ensure that our customers don’t experience any service interruptions. Unfortunately, we had some issues with one of our RDS instances that prevented some users from logging into Assignar and viewing certain pages in the dashboard and the mobile app. For that, we are sincerely sorry.

    ## What Happened?

    After taking a deep look at the intermittent database issues, we identified that one of our RDS instances was running low on FreeLocalStorage.

    Instances in our database clusters have two types of storage:

    - Storage for persistent data (called the cluster volume). This storage type increases automatically when more space is required.
    - Local storage for each instance in the cluster, based on the instance class. The type and size of this storage are bound to the instance class and can be changed only by moving to a larger DB instance class.

    Our database clusters use local storage for storing error logs, general logs, slow query logs, audit logs, and non-InnoDB temporary tables.

    The following error was identified:

    "The free storage capacity for DB Instance: instance-name is low at x% of the provisioned storage [Provisioned Storage: xx GB, Free Storage: xx GB]. You may want to increase the provisioned storage to address this issue."

    We increased the provisioned storage and the problem was rectified.

    ## **Remediation plan**

    We have a number of alarms in place to prevent incidents like this from occurring. Unfortunately, we didn’t have any alarms for FreeLocalStorage readings. We have now put appropriate alerts and alarms in place so that we get plenty of notice when such problems are likely to occur.

    We have also fine-tuned our audit logging, which should consume a lot less local storage on the database instance and so slow the consumption of FreeLocalStorage on the RDS instance.
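
For readers who want a similar safeguard, here is a minimal sketch of what a FreeLocalStorage alarm might look like. It is not Assignar's actual configuration: the instance identifier, threshold, evaluation settings, and SNS topic are all hypothetical placeholders. The sketch builds the arguments for the CloudWatch `put_metric_alarm` call using the `FreeLocalStorage` metric in the `AWS/RDS` namespace, which is reported in bytes.

```python
from typing import Any, Dict, Optional

GIB = 1024 ** 3  # FreeLocalStorage is reported in bytes


def free_local_storage_alarm(
    instance_id: str,
    threshold_gib: float,
    sns_topic_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    All values below are illustrative assumptions, not the
    settings used in the incident described above.
    """
    params: Dict[str, Any] = {
        "AlarmName": f"{instance_id}-free-local-storage-low",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeLocalStorage",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": instance_id}
        ],
        "Statistic": "Minimum",
        "Period": 300,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,   # consecutive breaches required (assumed)
        "Threshold": threshold_gib * GIB,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",
    }
    if sns_topic_arn:
        # Notify this topic when the alarm fires.
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the alarm definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(free_local_storage_alarm("example-db-instance", 5))` would request an alarm that fires when free local storage stays below 5 GiB for three consecutive five-minute periods. Keeping the parameter builder separate from the API call makes the thresholds easy to review and test without touching AWS.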