Integrate.io incident

Intermittent Job Failures and Clusters Stuck on Pending

Notice Resolved

Integrate.io experienced a notice-level incident on September 1, 2022. The incident has been resolved; the full update timeline is below.

Started
Sep 01, 2022, 04:57 AM UTC
Resolved
Sep 01, 2022, 04:57 AM UTC
Detected by Pingoru
Sep 01, 2022, 04:57 AM UTC

Update timeline

  1. resolved Sep 01, 2022, 04:57 AM UTC

    From approximately 12:40 AM to 2:30 AM UTC, an issue in one of our caching infrastructure components affected clusters and jobs provisioned during that period. Our engineers have now fixed the issue.

  2. postmortem Sep 01, 2022, 04:58 AM UTC

    ### **Issue Summary**

    From 7:11 AM to 8:58 AM UTC, jobs and clusters were intermittently stuck on pending or failing with errors.

    ### **Root Cause**

    The root cause of this outage was our Redis component reaching 100% memory, which caused the intermittent issues. Redis is used as a caching mechanism for our application.

    ### **Resolution and recovery**

    Here are the steps we are taking to ensure that the incident does not happen again:

    * Vertically scaled up Redis for more memory.
    * Improving monitoring so we can quickly detect Redis-related memory issues.

    We appreciate your patience and again apologize for the impact to you, your users, and your organization. We thank you for your business and continued support.

    Sincerely,
    [Integrate.io](http://integrate.io/) Engineering
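
The monitoring improvement described in the postmortem could look something like the sketch below. It is not Integrate.io's actual tooling: the function names and the 80% / 95% thresholds are hypothetical, chosen only for illustration. The one thing taken from Redis itself is the pair of `used_memory` and `maxmemory` fields, which Redis reports in the memory section of its `INFO` output.

```python
from typing import Mapping, Optional


def redis_memory_usage(info: Mapping[str, int]) -> Optional[float]:
    """Return used_memory / maxmemory, or None when no limit is configured."""
    max_memory = int(info.get("maxmemory", 0))
    if max_memory <= 0:
        # maxmemory = 0 means Redis has no configured memory limit,
        # so a usage ratio cannot be computed from these two fields.
        return None
    return int(info["used_memory"]) / max_memory


def memory_alert_level(
    info: Mapping[str, int],
    warn_at: float = 0.80,      # hypothetical warning threshold
    critical_at: float = 0.95,  # hypothetical critical threshold
) -> str:
    """Classify memory pressure as 'ok', 'warning', 'critical', or 'unknown'."""
    usage = redis_memory_usage(info)
    if usage is None:
        return "unknown"
    if usage >= critical_at:
        return "critical"
    if usage >= warn_at:
        return "warning"
    return "ok"
```

With the redis-py client, the input mapping could come from `Redis.info("memory")`. A scheduled check could then alert on `"warning"` well before memory reaches the 100% level that caused this incident.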