Elastic.io incident

Investigating Issues with Serverless Project Creation and Scaling in GCP Region us-central1

Elastic.io experienced a major incident on July 27, 2025 that affected three components: Deployment orchestration (Create/Edit/Restart/Delete): GCP europe-west1; Deployment orchestration (Create/Edit/Restart/Delete): GCP us-central1; and Elastic Cloud Serverless. The incident lasted 4d 15h and has been resolved; the full update timeline is below.

Started: Jul 27, 2025, 04:49 PM UTC
Resolved: Aug 01, 2025, 08:11 AM UTC
Duration: 4d 15h
Detected by Pingoru: Jul 27, 2025, 04:49 PM UTC
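
The reported duration can be cross-checked against the Started and Resolved timestamps above. The following is a minimal sketch, not part of the original report; the helper names `parse_utc` and `format_duration` are hypothetical, and it assumes the timestamps follow the "Mon DD, YYYY, HH:MM AM/PM UTC" layout used on this page and that the page truncates the duration to whole hours.

```python
from datetime import datetime, timezone

# Timestamp layout assumed from this page, e.g. "Jul 27, 2025, 04:49 PM UTC".
FMT = "%b %d, %Y, %I:%M %p"


def parse_utc(stamp: str) -> datetime:
    """Parse a status-page timestamp and mark it as UTC (hypothetical helper)."""
    naive = datetime.strptime(stamp.replace(" UTC", ""), FMT)
    return naive.replace(tzinfo=timezone.utc)


def format_duration(start: datetime, end: datetime) -> str:
    """Render the elapsed time as days and whole hours, dropping minutes."""
    delta = end - start
    hours = delta.seconds // 3600
    return f"{delta.days}d {hours}h"


started = parse_utc("Jul 27, 2025, 04:49 PM UTC")
resolved = parse_utc("Aug 01, 2025, 08:11 AM UTC")
duration_text = format_duration(started, resolved)
print(duration_text)  # 4d 15h
```

The exact elapsed time is 4 days, 15 hours, and 22 minutes, which matches the listed 4d 15h once minutes are dropped.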

Affected components

Deployment orchestration (Create/Edit/Restart/Delete): GCP europe-west1
Deployment orchestration (Create/Edit/Restart/Delete): GCP us-central1
Elastic Cloud Serverless

Update timeline

investigating Jul 27, 2025, 06:32 AM UTC

We are currently investigating an issue that is primarily affecting new deployment creations for some customers on Google Cloud Platform. Existing deployments remain operational; however, there is a potential for impact to scaling operations which we are actively investigating. Our engineering teams are working to identify the root cause and restore full functionality. We will provide a further update within 60 minutes, or sooner if we have significant new information.
investigating Jul 27, 2025, 07:53 AM UTC

We have identified the root cause of the issue affecting deployment creation and scaling on Google Cloud Platform, which is related to a recent automated infrastructure update from our cloud provider. A mitigation plan has been identified, and our teams are coordinating to apply it across the affected clusters. We have also taken steps to prevent further impact. We will provide another update within 60 minutes with more details on the timeline for resolution.
investigating Jul 27, 2025, 09:35 AM UTC

We have tested a candidate fix and successfully applied it to the first set of affected clusters, where we have confirmed that project creation and scaling are now functioning. The rollout of the fix to the remaining impacted environments is ongoing, and we will continue to monitor the situation closely. We will provide our next update within 60 minutes, or as soon as we have a significant development.
monitoring Jul 27, 2025, 10:05 AM UTC

A fix has been implemented across all affected clusters, and we are monitoring the situation closely. Initial observations indicate that services related to project creation and scaling are returning to normal operation. We will continue to monitor for stability and provide an update if there are any new developments.
monitoring Jul 27, 2025, 04:49 PM UTC

A fix has been implemented across all affected clusters, and we are monitoring the situation closely. Initial observations indicate that services related to project creation and scaling are returning to normal operation. We will continue to monitor for stability and provide an update if there are any new developments.
monitoring Jul 27, 2025, 05:45 PM UTC

A fix has been implemented across all affected clusters, and we are monitoring the situation closely. Initial observations indicate that services related to project creation and scaling are returning to normal operation. We will continue to monitor for stability and provide an update if there are any new developments.
investigating Jul 27, 2025, 11:04 PM UTC

We have identified further impact from this issue, affecting Serverless project creation and scaling in additional GCP regions. Our engineering team is working to implement and roll out a fix to mitigate it. We will provide another update in 60 minutes, or sooner if there is a significant development.
monitoring Jul 28, 2025, 12:15 AM UTC

The customer impact of this issue has been mitigated. Our engineering team is actively monitoring the situation and deploying safeguards to prevent further impact. We will provide another update in 60 minutes, or sooner if there is a significant development.
monitoring Jul 28, 2025, 01:02 AM UTC

The customer impact of this issue remains mitigated. Safeguards against further impact have been deployed. We are continuing to monitor the situation and will provide an update if there are any status changes.
resolved Aug 01, 2025, 08:11 AM UTC

This incident has been resolved.