Qubole experienced a minor incident on March 9, 2022, affecting Site Availability, Command Processing, and 1 more component, and lasting 4d 6h. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Mar 09, 2022, 10:05 PM UTC
api.qubole.com is currently seeing degraded performance in command processing and the UI. At this time, the issue appears to be intermittent.
- identified Mar 10, 2022, 01:19 AM UTC
DevOps has identified an issue with memcache and Redis in api.qubole.com. The DevOps team is investigating further.
- identified Mar 10, 2022, 11:26 AM UTC
Our internal team has resolved the issue with the worker nodes. The team is now working on autoscaling of the nodes under the scheduler ELB.
- identified Mar 10, 2022, 04:18 PM UTC
The scheduler tier on api.qubole.com is not allowing new instances to be created. The DevOps team has created a new non-VPC ASG and is trying to add and scale up scheduler nodes. Once that is done, we can switch from the existing ASG to the new ASG.
- identified Mar 10, 2022, 08:31 PM UTC
A new ASG has been created under classic, and a couple of nodes have been added and are serving traffic. The scheduler nodes and the connectivity issues between the workers and memcache are fixed. DevOps is now running sample jobs and observing stability.
- identified Mar 11, 2022, 03:56 AM UTC
A new ASG has been created under classic, and a couple of nodes have been added and are serving traffic. DevOps is now working on the Redis connection issue and is still investigating the root cause in order to resolve it.
- monitoring Mar 11, 2022, 01:47 PM UTC
Overall, the api.qubole.com environment seems to be stabilizing. The DevOps team is continuously monitoring the environment.
- monitoring Mar 11, 2022, 05:23 PM UTC
Overall, the api.qubole.com environment seems to be stabilizing. The DevOps team is continuing to resolve issues for specific individual customers.
- monitoring Mar 11, 2022, 08:42 PM UTC
Overall, the api.qubole.com environment seems to be stabilizing. The DevOps team is continuing to resolve issues for specific individual customers.
- monitoring Mar 11, 2022, 11:32 PM UTC
Overall, the api.qubole.com environment seems to be stabilizing. The DevOps team is continuing to resolve issues for specific individual customers.
- monitoring Mar 12, 2022, 04:59 AM UTC
The DevOps team is working to resolve this issue as soon as possible and is continuing to resolve issues for specific individual customers.
- monitoring Mar 12, 2022, 08:45 AM UTC
The DevOps team is continuing to work on this issue and is working with individual customers to resolve it as soon as possible.
- monitoring Mar 12, 2022, 11:58 AM UTC
The DevOps team is continuing to work on this issue and is working with individual customers to resolve it as soon as possible.
- identified Mar 12, 2022, 03:35 PM UTC
DevOps has identified a secondary issue with scheduler autoscaling that is contributing to the remaining intermittent issues. They are currently working to resolve the autoscaling issue.
- identified Mar 12, 2022, 06:32 PM UTC
The DevOps team is actively working on the issue and has identified a secondary issue with scheduler autoscaling that is contributing to the remaining intermittent issues. They are currently working to resolve the autoscaling issue.
- identified Mar 12, 2022, 09:34 PM UTC
The DevOps team is actively working on the issue and has identified a secondary issue with scheduler autoscaling that is contributing to the remaining intermittent issues. They are currently working to resolve the autoscaling issue.
- identified Mar 13, 2022, 12:43 AM UTC
The DevOps team has identified scheduler autoscaling as the cause of the remaining intermittent issues. They are currently working to resolve it.
- identified Mar 13, 2022, 04:53 AM UTC
The DevOps team is still working with individual customers on the scheduler issue and is trying to resolve it.
- identified Mar 13, 2022, 08:08 AM UTC
The DevOps team is actively working with individual customers on the clusters and scheduler issue and is trying to resolve it soon.
- identified Mar 13, 2022, 12:11 PM UTC
The DevOps team is actively working with individual customers on the issue and is trying to resolve it as soon as possible.
- identified Mar 13, 2022, 05:11 PM UTC
The DevOps team is actively working on the issue, checking with customers, and trying to resolve it as soon as possible.
- identified Mar 13, 2022, 08:34 PM UTC
The DevOps team is actively working on the issue, checking with customers, and trying to resolve it as soon as possible.
- identified Mar 13, 2022, 11:30 PM UTC
The DevOps team is actively working on the issue, checking with customers, and trying to resolve it as soon as possible.
- identified Mar 14, 2022, 02:04 AM UTC
The DevOps team is actively working on the issue, checking with customers, and trying to resolve it as soon as possible.
- resolved Mar 14, 2022, 04:56 AM UTC
The issue has been resolved.