CloudAMQP Outage History
CloudAMQP is up right now. CloudAMQP had 47 outages in the last 2 years totaling 3201h 7m of downtime, averaging 1.9 incidents per month.
There were 47 CloudAMQP outages since December 26, 2024 totaling 3201h 7m of downtime. Each is summarised below — incident details, duration, and resolution information.
Monitoring Systems — False Alarms
Timeline · 2 updates
- monitoring May 30, 2026, 10:25 PM UTC
Between 17:30 and 20:00 UTC on 2026-05-30, our server monitoring system generated a large number of false "server_unreachable" alerts. Clusters were operational and healthy throughout this period — no customer data or message delivery was affected. The alerts were caused by a rolling restart of our monitoring service, during which individual monitoring workers went offline mid-cycle and lost state. This caused the workers to report connection timeouts for servers they could no longer reach — even though those servers were fully healthy. We apologize for any concern this may have caused.
- resolved May 31, 2026, 11:29 AM UTC
This incident has been resolved.
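One common way to keep a worker with lost state from raising false "server_unreachable" alerts is to require agreement between several independent workers before alerting. The sketch below illustrates that idea only. It is not CloudAMQP's monitoring code, and the function name, the worker report format, and the quorum value are all assumptions.

```python
def should_alert(reports, quorum=0.5):
    """Decide whether to raise a "server_unreachable" alert (hypothetical helper).

    reports maps a worker id to "ok", "timeout", or None.
    None means the worker has no usable data, e.g. it just restarted.
    """
    # Ignore workers that have no state; they cannot vote either way.
    votes = [status for status in reports.values() if status is not None]
    if not votes:
        # No worker has data: the server state is unknown, not unreachable.
        return False
    timeouts = sum(1 for status in votes if status == "timeout")
    # Alert only when more than the quorum fraction of voting workers time out.
    return timeouts / len(votes) > quorum


# One restarted worker timing out is outvoted by two healthy observations.
print(should_alert({"w1": "timeout", "w2": "ok", "w3": "ok"}))
```

With this rule, a single worker that times out during a rolling restart cannot trigger an alert on its own.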
Connectivity issues in Azure West Europe
Timeline · 2 updates
- investigating May 23, 2026, 04:21 PM UTC
We are investigating connectivity issues affecting some CloudAMQP instances in Azure West Europe. Affected instances may see connection failures or unexpected restarts. The root cause is an ongoing Microsoft Azure platform incident in the West Europe region (started 14:31 UTC, 23 May 2026). Microsoft is investigating.
- resolved May 23, 2026, 05:23 PM UTC
Resolved — Microsoft has confirmed that the Azure West Europe Virtual Machines incident is resolved. The impact window was 14:09–14:13 UTC on 23 May 2026, during which a limited number of customers may have experienced connection failures or unexpected Virtual Machine restarts. The Azure environment self-healed and the service has been confirmed restored. Affected CloudAMQP instances should now be operating normally.
System metrics not forwarded to legacy integration
Timeline · 4 updates
Backend slow
Timeline · 4 updates
- investigating May 19, 2026, 01:21 PM UTC
We are currently experiencing issues with our backend services handling account and server creation. This does not affect any running customer servers but might delay provisioning new servers.
- identified May 19, 2026, 01:36 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring May 19, 2026, 01:41 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved May 19, 2026, 01:50 PM UTC
This incident has been resolved.
Connecting issues in Azure West Europe
Timeline · 3 updates
- investigating May 03, 2026, 07:11 PM UTC
We are currently investigating this issue.
- identified May 03, 2026, 07:14 PM UTC
Servers in Azure West Europe can experience connection issues due to underlying issues at the provider. Latest update from Azure: "Current Status: We continue to investigate an availability issue caused by a subset of unhealthy backend storage infrastructure supporting Azure Virtual Machines and Azure Direct Drive in West Europe."
- resolved May 04, 2026, 01:44 AM UTC
This incident has been resolved.
Backend slow/unresponsive
Timeline · 4 updates
- investigating Apr 28, 2026, 10:27 AM UTC
We are currently experiencing issues with our backend services handling account and server creation. This does not affect any running customer servers but might delay provisioning new servers.
- identified Apr 28, 2026, 10:49 AM UTC
The issue has been identified and a fix is being implemented.
- monitoring Apr 28, 2026, 10:53 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Apr 28, 2026, 11:05 AM UTC
We had an abnormal increase in traffic that caused timeouts and slow responses for customer.cloudamqp.com. We increased capacity and are back to normal. This did not affect any running clusters, only provisioning and configuration updates.
Connection issues in Azure region westus
Timeline · 2 updates
- investigating Apr 24, 2026, 09:36 AM UTC
We are getting indications of instances being down in Azure region westus. We are investigating the issue.
- resolved Apr 24, 2026, 11:12 AM UTC
This incident has been resolved.
Issues with Azure West US 3 region
Timeline · 2 updates
- investigating Apr 23, 2026, 10:10 AM UTC
Servers in Azure West US 3 can experience connection issues due to underlying issues at the provider. We are monitoring and will update when we know more.
- resolved Apr 23, 2026, 10:56 AM UTC
This incident has been resolved.
Metrics integration issues
Timeline · 4 updates
Monitoring Systems Short Outage and "Connection issues" Alarms
Timeline · 1 update
- resolved Apr 15, 2026, 03:32 PM UTC
One of our backend components suffered a short outage due to an unexpected restart of all its services. Connectivity with the servers was broken and the service triggered alarms for many clusters. We apologize for the inconvenience. All clusters were operational and healthy throughout the incident and alarm timespan.
Management Interface Unavailable - 502 Bad Gateway Response
Timeline · 4 updates
- identified Apr 02, 2026, 02:41 PM UTC
Customers are experiencing the Management UI being unavailable with 502 Bad Gateway. Timeline:
  14:16 UTC -> The issue was identified.
  14:30 UTC -> Recent code changes were reverted. No relationship with the incident noted.
- identified Apr 02, 2026, 02:52 PM UTC
The Management UI and brokers are serving traffic normally. You can still use the Management UI by opening the Broker URL and entering your User and Password. The impact is limited to how our backend service logs into the Management UI via CloudAMQP SSO.
- identified Apr 02, 2026, 03:07 PM UTC
We are continuing to work on a fix for this issue.
- resolved Apr 02, 2026, 03:12 PM UTC
The Management UI SSO shortcut is back in service.
Impacted Clusters: All customers with clusters created more than 6 months ago.
Cause: Some code changes unexpectedly changed the communication schema between our backend services and the SSO component on the clusters. The proxy configuration conflicted with those changes, causing the 502 Bad Gateway error to show up.
Impact: Only SSO via the CloudAMQP Console was impacted, since it requires communication with our backend components. The AMQP clusters were healthy and serving traffic via both AMQP and HTTP on the Management UI and API. Logging into the Management UI via credentials and all authentication backends was functioning normally.
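The credential-based access described above can also be checked from a script. The sketch below builds a request to the RabbitMQ management HTTP API's `/api/overview` endpoint using HTTP basic authentication. The host name and credentials are placeholders, the helper name is hypothetical, and it assumes the management API is served over HTTPS on the broker host.

```python
import base64
import urllib.request


def build_overview_request(broker_host, username, password):
    """Build a basic-auth request for the management API (hypothetical helper)."""
    # HTTP basic auth: base64("username:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(f"https://{broker_host}/api/overview")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder host and credentials; building the request makes no network call.
request = build_overview_request("broker.example.com", "user", "secret")
print(request.full_url)

# To actually send it (requires a reachable broker):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

A successful response here would indicate that the broker's management API accepts the credentials directly, independently of the console SSO path.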
Metrics delivery delay for 12% of the fleet
Timeline · 2 updates
- identified Mar 25, 2026, 05:45 PM UTC
We're currently experiencing issues with metrics delivery. This affects 12% of the API-based v1/v2 integrations. We've added additional capacity and expect to have handled the backlog within 20 minutes.
- resolved Mar 25, 2026, 06:03 PM UTC
This incident has been resolved.
Queue metrics for legacy integrations delayed
Timeline · 3 updates
- identified Mar 18, 2026, 09:25 PM UTC
We noticed an issue with our queue metrics being sent to metrics integrations. There is currently a delay in processing, but we have identified the issue and expect the delay to be fixed shortly.
- monitoring Mar 18, 2026, 10:16 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Mar 19, 2026, 03:50 AM UTC
This incident has been resolved.
Coralogix log delivery performance issues
Timeline · 3 updates
- investigating Mar 03, 2026, 09:11 PM UTC
We're currently experiencing issues with Coralogix log delivery.
- monitoring Mar 04, 2026, 11:16 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Mar 04, 2026, 02:14 PM UTC
This incident has been resolved.
Metrics delivery outage for v3 Prometheus based metrics.
Timeline · 5 updates
- investigating Mar 03, 2026, 06:28 PM UTC
We're currently experiencing issues with metrics delivery. This affects v3 (Prometheus-based) metrics.
- identified Mar 03, 2026, 06:47 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Mar 03, 2026, 07:30 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Mar 03, 2026, 08:51 PM UTC
This incident has been resolved.
- postmortem Mar 04, 2026, 02:14 PM UTC
### Summary
On March 3, an internal monitoring metric stopped being reported for approximately 14 hours. No customer metrics delivery was affected; all integrations continued operating normally. The missing metric caused our dashboards to display false failure rates, which was incorrectly reported as a metrics delivery outage.
### Impact
None. Metrics continued to be delivered to all customer endpoints throughout the incident.
### Resolution
The monitoring metric was restored and we have updated our alerting queries to be resilient to missing data, preventing false positives in the future.
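The postmortem does not say how the alerting queries were changed. As a general illustration of "resilient to missing data", the sketch below treats a missing metric as unknown rather than as a failure. The function names, inputs, and the 5% threshold are assumptions made for the example.

```python
def failure_rate(failed, total):
    """Return the failure rate in [0, 1], or None when data is missing."""
    # A missing series or an empty window is "unknown", not "100% failing".
    if failed is None or total is None or total == 0:
        return None
    return failed / total


def should_fire(failed, total, threshold=0.05):
    """Fire a delivery-failure alert only on a real, measured failure rate."""
    rate = failure_rate(failed, total)
    return rate is not None and rate > threshold


# The metric is absent: no delivery alert fires.
print(should_fire(None, 100))
# A measured 10% failure rate does fire.
print(should_fire(10, 100))
```

A separate "metric is absent" alert would normally accompany a rule like this, so that missing data is still noticed without being reported as a delivery outage.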
AWS region me-south-1 outage
Timeline · 3 updates
- monitoring Mar 02, 2026, 05:35 AM UTC
AWS has an ongoing issue with multiple services in region me-south-1. The shared server hummingbird-01 is affected.
- monitoring Mar 02, 2026, 08:19 PM UTC
We are still experiencing issues with this region. If your services are impacted, we recommend you create a new cluster in a different region. When creating it, use the copy config option, but do not copy plugin settings, as this requires the source cluster to be running. Instead, enable the plugins once the new cluster is configured.
- resolved May 05, 2026, 06:55 AM UTC
We are suspending active updates on this status page regarding the AWS Middle East (Bahrain/UAE) outages. We will start supporting those regions again when they have been fully recovered. Reach out to [email protected] if you have any questions.
AWS region me-central-1 outage
Timeline · 6 updates
- monitoring Mar 01, 2026, 01:20 PM UTC
AWS has an ongoing issue with multiple services in region me-central-1.
- monitoring Mar 01, 2026, 01:22 PM UTC
We are continuing to monitor for any further issues.
- monitoring Mar 01, 2026, 01:23 PM UTC
We are continuing to monitor for any further issues.
- monitoring Mar 02, 2026, 01:46 AM UTC
We are continuing to monitor. The outage concerns the AZ mec1-az2.
- monitoring Mar 02, 2026, 08:19 PM UTC
We are still experiencing issues with all AZs in this region. If your services are impacted, we recommend you create a new cluster in a different region. When creating it, use the copy config option, but do not copy plugin settings, as this requires the source cluster to be running. Instead, enable the plugins once the new cluster is configured.
- resolved May 05, 2026, 06:55 AM UTC
We are suspending active updates on this status page regarding the AWS Middle East (Bahrain/UAE) outages. We will start supporting those regions again when they have been fully recovered. Reach out to [email protected] if you have any questions.
Issues with shared server "chameleon"
Timeline · 3 updates
- investigating Feb 17, 2026, 07:31 PM UTC
We are currently investigating this issue.
- monitoring Feb 17, 2026, 09:44 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Feb 18, 2026, 03:30 PM UTC
This incident has been resolved.
The shared server 'porpoise' is overloaded. We are investigating this!
Timeline · 3 updates
- investigating Feb 03, 2026, 02:47 PM UTC
We are currently investigating this issue.
- monitoring Feb 03, 2026, 03:11 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Feb 03, 2026, 03:28 PM UTC
This incident has been resolved.
Azure outage
Timeline · 2 updates
- monitoring Feb 03, 2026, 02:53 AM UTC
A known Azure outage is causing an error that currently prevents instances from being created.
- resolved Feb 03, 2026, 08:17 AM UTC
This incident has been resolved by Azure: https://azure.status.microsoft/en-us/status
Datadog metrics issue
Timeline · 4 updates
- investigating Jan 14, 2026, 03:54 PM UTC
We are currently investigating this issue.
- identified Jan 14, 2026, 03:55 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Jan 14, 2026, 03:56 PM UTC
We are monitoring now.
- resolved Jan 14, 2026, 04:58 PM UTC
This incident has been resolved.
Email Delays
Timeline · 2 updates
- investigating Dec 11, 2025, 12:16 PM UTC
Emails and replies are delayed due to Help Scout connection issues: https://status.helpscout.com/incidents/kmd031b5fm16
- resolved Dec 11, 2025, 01:12 PM UTC
This incident has been resolved by third party.
Connectivity issues with Azure westeurope
Timeline · 2 updates
- investigating Dec 10, 2025, 10:13 AM UTC
Servers in West Europe can experience connection issues due to Virtual Machine issues at the provider. We will monitor and update when we know more.
- resolved Dec 10, 2025, 01:13 PM UTC
This incident has been resolved.
Alarms/webhook notification issue
Timeline · 4 updates
- investigating Nov 19, 2025, 03:07 PM UTC
We're currently experiencing issues with alarm notifications and webhooks.
- identified Nov 19, 2025, 03:19 PM UTC
The issue has been identified and a fix is being implemented.
- identified Nov 19, 2025, 03:49 PM UTC
We are continuing to work on a fix for this issue.
- resolved Nov 19, 2025, 04:03 PM UTC
This incident has been resolved.