CleverTap incident
Some of our nodes are experiencing issues. Some customers may see delays or 5xx errors in the dashboard. The API endpoint is also affected.
CleverTap experienced a major incident on February 3, 2021, affecting API Ingestion, Dashboard Reports, and 1 more component, lasting 13h 52m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Feb 03, 2021, 05:42 PM UTC
We are currently seeing degraded performance on one of the dashboard nodes. Some customers may experience delays or 5xx errors while accessing the EU dashboard.
- investigating Feb 03, 2021, 06:01 PM UTC
We are continuing to investigate this issue.
- identified Feb 03, 2021, 06:18 PM UTC
The issue has been identified, and we are fixing it by recycling nodes and connections. A few users may still face issues while logging in.
- identified Feb 03, 2021, 08:26 PM UTC
Update: We are still working on fixing the issues and are experiencing a partial or major outage on some of the nodes. We will post another update here shortly.
- identified Feb 03, 2021, 09:09 PM UTC
Update: There is no change from the previous status yet. All hands remain on deck, and we are continuing to work on getting the services back online. We will continue to share updates every 30 minutes until the issue is resolved.
- identified Feb 03, 2021, 09:52 PM UTC
Update: We are deploying a quick fix to the affected nodes and relevant modules. This will take some time, as the new deployment has to be rolled out across the cluster. We will continue to share updates here.
- identified Feb 03, 2021, 11:07 PM UTC
Update: We are still working on the deployment across clusters and recycling the nodes. We will continue to share updates here.
- identified Feb 04, 2021, 01:41 AM UTC
Update: We deployed the hotfix on all instances and recycled them. However, the fix has not worked as expected. We are investigating further and treating this as an AHOD (all hands on deck) incident. We will continue to post progress updates here.
- identified Feb 04, 2021, 04:12 AM UTC
Update: We are bringing the components back online in a staggered manner. We are monitoring the entire process for any anomalies. We expect service to return to normal in the next couple of hours. We will keep posting progress updates here.
- monitoring Feb 04, 2021, 06:43 AM UTC
We really appreciate your patience and are very sorry for the disruption. All services are back online, and we are currently monitoring them. Accounts and services are catching up on our platform. A few campaigns and reports might take some time to start. We will continue to monitor the situation and will post an update if we find anything still missing.
- resolved Feb 04, 2021, 07:35 AM UTC
This incident has been resolved, following a period of monitoring. Sincere apologies for the disruption in service.