Applicaster incident

Processing actions fail (layout changes)

Applicaster experienced a notice incident on November 15, 2023 affecting Studio, lasting 1h 52m. The incident has been resolved; the full update timeline is below.

Started: Nov 15, 2023, 02:23 PM UTC
Resolved: Nov 15, 2023, 04:16 PM UTC
Duration: 1h 52m
Detected by Pingoru: Nov 15, 2023, 02:23 PM UTC

Affected components

Studio

Update timeline

investigating Nov 15, 2023, 02:23 PM UTC

We are currently investigating this issue.
identified Nov 15, 2023, 02:55 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Nov 15, 2023, 03:07 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Nov 15, 2023, 04:16 PM UTC

This incident has been resolved.
postmortem Nov 17, 2023, 03:06 PM UTC

There was a leak between the newly created (private) cluster and the other clusters: Sidekiq (background job processing) is connected to the same Redis DB on all clusters, so jobs could be picked up on the new cluster. The new cluster didn't have the Zapp app running properly (pods), and therefore those jobs weren't processed. Once we found the issue, we scaled the new private cluster down to 0, increased the node size in the AWS console, and restarted the pods on the prod-us1 cluster (the running production cluster).

To prevent this in the future, we need to make sure there are no running nodes on redundant clusters, only on the active one. In addition, we should consider separating the Redis DB between clusters, although this could cause some of the running processes to be lost.
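The failure mode above can be illustrated with a small simulation. This is a minimal Python sketch, not Applicaster's setup: it assumes Sidekiq-style workers that pop jobs from a single shared queue, and the `Cluster` class, the round-robin scheduling, and the job count are all hypothetical. The broken Zapp pods are modeled as a flag, and a job popped by the unhealthy cluster is simply recorded as unprocessed. The sketch compares the shared queue with the two mitigations from the postmortem: scaling the private cluster to 0 and giving each cluster its own Redis DB.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster running Sidekiq-style background workers."""
    name: str
    app_healthy: bool  # False models Zapp pods that are not running properly
    processed: list = field(default_factory=list)
    unprocessed: list = field(default_factory=list)

    def work(self, queue: deque) -> bool:
        """Pop one job from `queue`; return False if there was nothing to pop."""
        if not queue:
            return False
        job = queue.popleft()
        # A job popped by an unhealthy cluster leaves the queue but never runs.
        (self.processed if self.app_healthy else self.unprocessed).append(job)
        return True


def drain(queues: dict, clusters: list) -> None:
    """Let each cluster take turns (round-robin) until every queue is empty."""
    busy = True
    while busy:
        busy = False
        for cluster in clusters:
            if cluster.work(queues[cluster.name]):
                busy = True


def simulate(shared_redis: bool, private_replicas: int, jobs: int = 10):
    """Enqueue `jobs` for prod-us1 and record which cluster consumes each one."""
    prod = Cluster("prod-us1", app_healthy=True)
    private = Cluster("private", app_healthy=False)
    prod_queue = deque(range(jobs))
    # With a shared Redis DB, both clusters read the same queue object.
    private_queue = prod_queue if shared_redis else deque()
    queues = {prod.name: prod_queue, private.name: private_queue}
    # Scaling the private cluster to 0 means it runs no workers at all.
    clusters = [prod] + ([private] if private_replicas > 0 else [])
    drain(queues, clusters)
    return prod, private


if __name__ == "__main__":
    for shared, replicas in [(True, 1), (True, 0), (False, 1)]:
        prod, private = simulate(shared_redis=shared, private_replicas=replicas)
        print(f"shared_redis={shared}, private_replicas={replicas}: "
              f"processed={prod.processed}, unprocessed={private.unprocessed}")
```

With the shared queue and the private cluster running, every other job lands on the cluster that cannot process it. Either mitigation lets prod-us1 process all ten jobs. The sketch does not model the caveat noted above, that separating the Redis DB could cause some running processes to be lost.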