Amazee Outage History

Amazee degraded · 2 active incidents

Amazee had 39 outages in the last 2 years (since September 10, 2024), totaling 3609h 23m of downtime and averaging 1.6 incidents per month. Each is summarised below with incident details, duration, and resolution information.
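
The summary figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the "last 2 years" window is counted as 24 months, which the page does not state explicitly.

```python
# Hypothetical helpers for sanity-checking the summary figures.
# Assumption: the "last 2 years" window is treated as 24 months.

def parse_duration_minutes(text: str) -> int:
    """Convert a duration such as '3609h 23m' or '13d 4h' to minutes."""
    units = {"d": 24 * 60, "h": 60, "m": 1}
    total = 0
    for part in text.split():
        value, unit = int(part[:-1]), part[-1]
        total += value * units[unit]
    return total


def incidents_per_month(incidents: int, months: int = 24) -> float:
    """Average number of incidents per month over the window."""
    return incidents / months


total_minutes = parse_duration_minutes("3609h 23m")
mean_minutes = total_minutes / 39  # mean downtime per incident

print(f"incidents/month: {incidents_per_month(39):.1f}")
print(f"total downtime:  {total_minutes / (60 * 24):.1f} days")
print(f"mean incident:   {int(mean_minutes // 60)}h {int(mean_minutes % 60)}m")
```

Long-running entries, such as the 49d 1h HTTP 425 notice, account for a large share of the total, so the mean says little about a typical incident.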

Source: https://status.amazee.io

Notice June 1, 2026

Delayed log ingestion

Detected by Pingoru
Jun 01, 2026, 02:07 PM UTC
Resolved
Jun 01, 2026, 05:33 PM UTC
Duration
3h 25m
Affected: Lagoon Logs (Kibana)
Timeline · 2 updates
  1. monitoring Jun 01, 2026, 02:07 PM UTC

    There is a delay in log processing, and recent logs for events generated since June 1st may not be available. Real-time logs can be accessed through the Lagoon CLI: https://docs.amazee.io/cloud/logging/#real-time-container-logs-via-lagoon-cli

  2. resolved Jun 01, 2026, 05:33 PM UTC

    This incident has been resolved.

Minor May 7, 2026

Degraded: cert-manager unavailable - TLS certificate issuance affected

Detected by Pingoru
May 07, 2026, 06:08 PM UTC
Resolved
May 07, 2026, 09:14 PM UTC
Duration
3h 6m
Affected: jp1.lagoon, ca1.lagoon, ie1.lagoon, Deployment Infrastructure, de3.lagoon, fi2.lagoon, au2.lagoon, us2.lagoon, uk3.lagoon, us3.lagoon, AMER, APAC, EMEA, ch4.lagoon, amazee.ai infrastructure: au103, amazee.ai infrastructure: de-eu101, amazee.ai infrastructure: de103, amazee.ai infrastructure: uk103, amazee.ai infrastructure: us103, amazee.ai infrastructure: us104, amazee.ai infrastructure: ch103, amazee.ai infrastructure
Timeline · 3 updates
  1. identified May 07, 2026, 06:08 PM UTC

    We are currently experiencing an issue with cert-manager in affected clusters. The cert-manager controller pod is failing to start due to an image pull failure from quay.io (502 Bad Gateway), which is an external registry. Impact: New TLS certificates cannot be issued or renewed while cert-manager is unavailable. Existing certificates remain valid until their expiry. Services relying on automatic certificate provisioning may be affected if certificates expire during this window. https://status.redhat.com/

  2. monitoring May 07, 2026, 06:24 PM UTC

    All clusters are able to pull the cert-manager image again. We are currently monitoring the upstream status page: https://status.redhat.com/

  3. resolved May 07, 2026, 09:14 PM UTC

    This incident has been resolved.
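
Because existing certificates stay valid until they expire, the practical risk during an outage like this is limited to certificates that expire before issuance recovers. The following is a minimal sketch, using only the Python standard library, of how one might check how close a served certificate is to expiry. The function names are hypothetical and this is not part of any amazee.io tooling.

```python
import socket
import ssl
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Seconds remaining before a certificate's notAfter time.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example 'Jan  5 09:34:43 2018 GMT'.
    """
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(not_after) - now


def expires_within(not_after: str, hours: float, now: Optional[float] = None) -> bool:
    """True if the certificate expires within the given number of hours."""
    return seconds_until_expiry(not_after, now) <= hours * 3600


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate served by host:port.

    Opens a real TLS connection, so it needs network access.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `expires_within(fetch_not_after("example.com"), hours=72)` would flag a certificate that is due to expire within three days; the hostname and threshold are placeholders.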

Minor April 30, 2026

CVE-2026-31431 / Copy Fail

Detected by Pingoru
Apr 30, 2026, 01:33 AM UTC
Resolved
May 13, 2026, 06:01 AM UTC
Duration
13d 4h
Affected: jp1.lagoon, ca1.lagoon, ie1.lagoon, de3.lagoon, fi2.lagoon, au2.lagoon, us2.lagoon, uk3.lagoon, us3.lagoon, AMER, APAC, EMEA, ch4.lagoon
Timeline · 2 updates
  1. identified Apr 30, 2026, 01:33 AM UTC

    We are aware of CVE-2026-31431 / "Copy Fail" and have completed an initial assessment. We have not identified evidence of customer impact, service compromise, or active exploitation at this stage. We are prioritizing Linux kernel upgrades and will provide further updates if our assessment changes or if maintenance becomes necessary.

  2. resolved May 13, 2026, 06:01 AM UTC

    Mitigations at the Linux kernel level have been rolled out across all regions.

Minor April 29, 2026

High error rate on Fastly Management API

Detected by Pingoru
Apr 29, 2026, 09:19 AM UTC
Resolved
Apr 29, 2026, 12:51 PM UTC
Duration
3h 31m
Affected: CDN
Timeline · 4 updates
  1. investigating Apr 29, 2026, 09:19 AM UTC

    We are currently investigating a high error rate on the Fastly management API. This impacts deployments of new domains that are routed through Fastly. Existing setups on Fastly or traffic routing are not impacted.

  2. investigating Apr 29, 2026, 09:58 AM UTC

    Fastly is investigating the API issues and updating their status page at: https://www.fastlystatus.com/incident/378511 Impact: Deployments with new domains that are routed through Fastly will continue to work, but the actual domain setup in Fastly will be delayed.

  3. monitoring Apr 29, 2026, 11:15 AM UTC

    Fastly resolved the issue with their management API and we'll keep monitoring the situation.

  4. resolved Apr 29, 2026, 12:51 PM UTC

    This incident has been resolved.

Notice April 21, 2026

Degraded Performance

Detected by Pingoru
Apr 21, 2026, 12:09 PM UTC
Resolved
Apr 21, 2026, 01:03 PM UTC
Duration
53m
Affected: de3.lagoon, uk3.lagoon
Timeline · 2 updates
  1. investigating Apr 21, 2026, 12:09 PM UTC

    We are currently investigating this issue.

  2. resolved Apr 21, 2026, 01:03 PM UTC

    This incident has been resolved.

Notice April 15, 2026

Post-maintenance Rollout

Detected by Pingoru
Apr 15, 2026, 04:19 AM UTC
Resolved
Apr 15, 2026, 07:24 AM UTC
Duration
3h 5m
Affected: de3.lagoon
Timeline · 3 updates
  1. identified Apr 15, 2026, 04:19 AM UTC

    There was an issue with an EKS rollout during maintenance. We are actively addressing it.

  2. monitoring Apr 15, 2026, 06:16 AM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Apr 15, 2026, 07:24 AM UTC

    This incident has been resolved.

Notice April 7, 2026

Increased Build Failures

Detected by Pingoru
Apr 07, 2026, 04:18 PM UTC
Resolved
Apr 08, 2026, 09:54 PM UTC
Duration
1d 5h
Affected: Deployment Infrastructure
Timeline · 3 updates
  1. identified Apr 07, 2026, 04:18 PM UTC

    We have identified an issue causing Lagoon builds to fail and are actively working to resolve it.

  2. monitoring Apr 07, 2026, 05:27 PM UTC

    We have applied a fix for this issue and are monitoring deployments.

  3. resolved Apr 08, 2026, 09:54 PM UTC

    The incident has been resolved.

Minor March 18, 2026

Increased Lagoon API error rate

Detected by Pingoru
Mar 18, 2026, 01:41 PM UTC
Resolved
Mar 19, 2026, 02:52 PM UTC
Duration
1d 1h
Affected: Lagoon API, Lagoon Dashboard
Timeline · 3 updates
  1. investigating Mar 18, 2026, 01:41 PM UTC

    We are investigating increased error rates on the Lagoon API.

  2. identified Mar 19, 2026, 03:27 AM UTC

    The issue has been identified and a fix is being implemented.

  3. resolved Mar 19, 2026, 02:52 PM UTC

    This incident has been resolved.

Critical March 17, 2026

Lagoon API slow or degraded

Detected by Pingoru
Mar 17, 2026, 06:42 AM UTC
Resolved
Mar 17, 2026, 08:48 PM UTC
Duration
14h 5m
Affected: Lagoon API
Timeline · 4 updates
  1. investigating Mar 17, 2026, 06:42 AM UTC

    We are currently investigating this issue.

  2. identified Mar 17, 2026, 07:45 AM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Mar 17, 2026, 07:45 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Mar 17, 2026, 08:48 PM UTC

    This incident has been resolved.

Notice February 18, 2026

Changed processing of HTTP 425 responses in Google Chrome

Detected by Pingoru
Feb 18, 2026, 07:01 AM UTC
Resolved
Apr 08, 2026, 08:10 AM UTC
Duration
49d 1h
Affected: CDN
Timeline · 2 updates
  1. monitoring Feb 18, 2026, 07:01 AM UTC

    In the latest release of Google Chrome (v145) the handling of HTTP 425 responses was changed. Should you see this affecting your sites, please reach out to our support and we'll look into applying mitigations for your specific use-case. Related status page from Fastly: https://www.fastlystatus.com/incident/378300#

  2. resolved Apr 08, 2026, 08:10 AM UTC

    This incident has been resolved.
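
For background, HTTP 425 (Too Early) is defined in RFC 8470: a server returns it when it is unwilling to process a request that arrived in TLS early data and could be replayed, and a client may retry once the handshake has completed. The sketch below illustrates that retry pattern in general terms. It is not a description of what changed in Chrome, nor of any mitigation applied by amazee.io or Fastly; the `send` callable and its `allow_early_data` flag are hypothetical stand-ins for a real HTTP client.

```python
from typing import Callable, Tuple

TOO_EARLY = 425  # "Too Early", defined in RFC 8470

Response = Tuple[int, bytes]  # (status code, body)


def send_with_too_early_retry(
    send: Callable[[bool], Response],
    max_retries: int = 1,
) -> Response:
    """Call send(allow_early_data) and retry without early data on a 425.

    `send` is a hypothetical callable: it performs one request and
    returns (status, body). The first attempt allows early data; any
    retry disables it so the request is sent after the handshake.
    """
    status, body = send(True)
    retries = 0
    while status == TOO_EARLY and retries < max_retries:
        status, body = send(False)
        retries += 1
    return status, body
```

Injecting `send` keeps the retry logic independent of any particular HTTP library, so it can be exercised with a fake transport.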

Minor February 9, 2026

Elevated Errors on Fastly level

Detected by Pingoru
Feb 09, 2026, 07:38 PM UTC
Resolved
Feb 10, 2026, 07:52 AM UTC
Duration
12h 13m
Affected: CDN
Timeline · 2 updates
  1. monitoring Feb 09, 2026, 07:38 PM UTC

    Fastly has reported elevated errors affecting Fastly Applications. Existing certificates do not appear to be impacted; however, some services are experiencing issues when adding new domains. We are monitoring the Fastly status page: https://www.fastlystatus.com/

  2. resolved Feb 10, 2026, 07:52 AM UTC

    This incident has been resolved.

Notice January 30, 2026

Intermittent SSH access issues

Detected by Pingoru
Jan 30, 2026, 02:59 PM UTC
Resolved
Feb 02, 2026, 02:29 PM UTC
Duration
2d 23h
Affected: Lagoon API
Timeline · 3 updates
  1. investigating Jan 30, 2026, 02:59 PM UTC

    We're investigating intermittent SSH access issues on cloud clusters.

  2. monitoring Jan 30, 2026, 03:58 PM UTC

    We have identified and mitigated the cause for the SSH access issues, and are currently monitoring the portal.

  3. resolved Feb 02, 2026, 02:29 PM UTC

    This incident has been resolved.

Notice January 28, 2026

HTTP Connection Errors

Detected by Pingoru
Jan 28, 2026, 08:25 PM UTC
Resolved
Jan 29, 2026, 06:45 AM UTC
Duration
10h 20m
Affected: us2.lagoon
Timeline · 2 updates
  1. monitoring Jan 28, 2026, 08:25 PM UTC

    We identified an issue causing HTTP connection errors. The underlying issue has been resolved, and we are actively monitoring the status.

  2. resolved Jan 29, 2026, 06:45 AM UTC

    This incident has been resolved.

Minor January 12, 2026

Increase in failed deployments

Detected by Pingoru
Jan 12, 2026, 11:05 AM UTC
Resolved
Jan 13, 2026, 07:08 AM UTC
Duration
20h 2m
Affected: Deployment Infrastructure
Timeline · 3 updates
  1. investigating Jan 12, 2026, 11:05 AM UTC

    We have seen an increase in failed deployments and are currently investigating the cause for this.

  2. identified Jan 12, 2026, 11:33 AM UTC

    The issue has been identified and a fix is being implemented.

  3. resolved Jan 13, 2026, 07:08 AM UTC

    This incident has been resolved.

Major December 1, 2025

Logging partially unavailable

Detected by Pingoru
Dec 01, 2025, 07:06 AM UTC
Resolved
Dec 03, 2025, 06:22 AM UTC
Duration
1d 23h
Affected: Lagoon Logs (Kibana)
Timeline · 3 updates
  1. identified Dec 01, 2025, 07:06 AM UTC

    The logging infrastructure is rebalancing because a new month has started, which causes some requests to fail.

  2. monitoring Dec 02, 2025, 08:19 AM UTC

    There is a backlog of logs to be ingested into the system. While this is happening in the background, some recent logs might not be visible in the logging system right away.

  3. resolved Dec 03, 2025, 06:22 AM UTC

    The backlog has been processed and the logging system is fully available again.

Major November 16, 2025

CH4: Workloads unable to start

Detected by Pingoru
Nov 16, 2025, 05:32 PM UTC
Resolved
Nov 16, 2025, 11:21 PM UTC
Duration
5h 49m
Affected: ch4.lagoon
Timeline · 4 updates
  1. investigating Nov 16, 2025, 05:32 PM UTC

    New workloads on CH4 are unable to start. Deployments are expected to fail currently.

  2. investigating Nov 16, 2025, 06:54 PM UTC

    We have escalated the issue to the infrastructure provider and are working with them to resolve the issue.

  3. monitoring Nov 16, 2025, 09:16 PM UTC

    We applied a fix that restored all functionality for starting workloads and running deployments.

  4. resolved Nov 16, 2025, 11:21 PM UTC

    This incident has been resolved.

Major November 14, 2025

Logging inaccessible

Detected by Pingoru
Nov 14, 2025, 07:56 AM UTC
Resolved
Nov 14, 2025, 11:40 AM UTC
Duration
3h 44m
Affected: Lagoon Logs (Kibana)
Timeline · 3 updates
  1. investigating Nov 14, 2025, 07:56 AM UTC

    We are currently investigating this issue.

  2. identified Nov 14, 2025, 08:18 AM UTC

    There was an unexpected rise in load on the logging system. We are now stabilizing the different components.

  3. resolved Nov 14, 2025, 11:40 AM UTC

    This incident has been resolved.

Minor November 3, 2025

Connection interruption for CH4 database

Detected by Pingoru
Nov 03, 2025, 11:27 AM UTC
Resolved
Nov 03, 2025, 02:11 PM UTC
Duration
2h 44m
Affected: ch4.lagoon
Timeline · 3 updates
  1. investigating Nov 03, 2025, 11:27 AM UTC

    We have received multiple reports of connection interruptions when using the production databases on CH4. We are investigating the cause of these interruptions.

  2. monitoring Nov 03, 2025, 11:44 AM UTC

    Because these issues are related to elevated resource usage, we are performing an emergency resource increase. We expect a short interruption for existing connections; the increase should help to prevent further disruptions.

  3. resolved Nov 03, 2025, 02:11 PM UTC

    This incident has been resolved.

Major November 3, 2025

Logs access degraded

Detected by Pingoru
Nov 03, 2025, 06:24 AM UTC
Resolved
Nov 04, 2025, 06:46 AM UTC
Duration
1d
Affected: Lagoon Logs (Kibana)
Timeline · 3 updates
  1. investigating Nov 03, 2025, 06:24 AM UTC

    We have identified a problem with log ingestion and access. Engineers are investigating. We will provide an update within the next 90 minutes.

  2. investigating Nov 03, 2025, 07:52 AM UTC

    The logging system is working through a backlog of ingested logs that need to be allocated and indexed. More data will become available over time as the backlog is processed. Because indexing is a resource-intensive task, some requests may occasionally result in an error.

  3. resolved Nov 04, 2025, 06:46 AM UTC

    Due to the large backlog of logs, some log messages from 2025-11-01 to 2025-11-03 could not be loaded into the new logging system and had to be dropped. However, these logs are still available in the old logging system at https://kibana-legacy.amazeeio.cloud.

Minor October 20, 2025

Limited workload scaling and deployments

Detected by Pingoru
Oct 20, 2025, 09:34 AM UTC
Resolved
Oct 20, 2025, 02:02 PM UTC
Duration
4h 27m
Affected: jp1.lagoon, ca1.lagoon, ie1.lagoon, Deployment Infrastructure, de3.lagoon, fi2.lagoon, au2.lagoon, us2.lagoon, uk3.lagoon, us3.lagoon, AMER, APAC, EMEA, ch4.lagoon
Timeline · 3 updates
  1. monitoring Oct 20, 2025, 09:34 AM UTC

    Due to an upstream issue within AWS, some workloads could not be scaled or deployed. The issue also affects our support system, which is only partially available.

  2. monitoring Oct 20, 2025, 11:17 AM UTC

    Some workloads in us2.lagoon are blocked from starting up due to the upstream issue. Other regions are no longer affected by this.

  3. resolved Oct 20, 2025, 02:02 PM UTC

    We are no longer seeing workloads affected by this issue.

Critical September 25, 2025

Lagoon API Outage

Detected by Pingoru
Sep 25, 2025, 12:28 AM UTC
Resolved
Sep 25, 2025, 01:34 AM UTC
Duration
1h 6m
Affected: Lagoon API, Lagoon Dashboard
Timeline · 3 updates
  1. investigating Sep 25, 2025, 12:28 AM UTC

    The Lagoon UI and API are currently experiencing an issue. We're investigating.

  2. monitoring Sep 25, 2025, 01:05 AM UTC

    We experienced an issue with image pulls during maintenance that resulted in the prolonged outage. This has been resolved now and we will continue to monitor.

  3. resolved Sep 25, 2025, 01:34 AM UTC

    This incident has been resolved.

Major September 22, 2025

Unresponsive workloads on UK3

Detected by Pingoru
Sep 22, 2025, 11:50 AM UTC
Resolved
Sep 22, 2025, 03:05 PM UTC
Duration
3h 15m
Affected: uk3.lagoon
Timeline · 3 updates
  1. investigating Sep 22, 2025, 11:50 AM UTC

    Some workloads on UK3 are not responding as expected due to high load across a part of the compute infrastructure.

  2. monitoring Sep 22, 2025, 12:01 PM UTC

    The situation has been stabilized and we continue to monitor the system closely.

  3. resolved Sep 22, 2025, 03:05 PM UTC

    This incident has been resolved.

Major July 16, 2025

Delayed log processing

Detected by Pingoru
Jul 16, 2025, 02:55 PM UTC
Resolved
Jul 17, 2025, 08:52 AM UTC
Duration
17h 56m
Affected: Lagoon API, Lagoon Logs (Kibana)
Timeline · 5 updates
  1. investigating Jul 16, 2025, 02:55 PM UTC

    There is a delay in log processing, and recent logs are not available in the logging system. Real-time logs can be accessed through the Lagoon CLI: https://docs.amazee.io/cloud/logging/#real-time-container-logs-via-lagoon-cli

  2. identified Jul 16, 2025, 04:02 PM UTC

    We are rolling out a potential solution which can cause connectivity issues with the Lagoon API.

  3. monitoring Jul 16, 2025, 04:14 PM UTC

    While the logs are being processed, the logging system may be unavailable at times.

  4. monitoring Jul 17, 2025, 04:59 AM UTC

    The processing of some older logs is still ongoing.

  5. resolved Jul 17, 2025, 08:52 AM UTC

    The historical logs have been processed and are available in the logging stack.

Notice July 8, 2025

DNS issues

Detected by Pingoru
Jul 08, 2025, 11:31 AM UTC
Resolved
Jul 08, 2025, 01:41 PM UTC
Duration
2h 10m
Affected: Deployment Infrastructure, Nameservers
Timeline · 3 updates
  1. identified Jul 08, 2025, 11:31 AM UTC

    The issue has been identified and a fix is being implemented.

  2. monitoring Jul 08, 2025, 12:24 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Jul 08, 2025, 01:41 PM UTC

    During the rollout of enabling DNS-01 challenges in our infrastructure, the image registry was unavailable due to temporary DNS misconfiguration.
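
For readers unfamiliar with DNS-01: under ACME (RFC 8555), domain control is proven by publishing a TXT record at `_acme-challenge.<domain>` whose value is the base64url-encoded SHA-256 digest of the key authorization (the challenge token joined to the account key thumbprint with a dot). The sketch below shows that computation only. The function names and the sample inputs are hypothetical, and it says nothing about how amazee.io's infrastructure implements the challenge.

```python
import base64
import hashlib
from typing import Tuple


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return the (record name, TXT value) pair for a DNS-01 challenge.

    `token` comes from the ACME server; `account_thumbprint` is the
    base64url JWK thumbprint of the ACME account key. Both are supplied
    by the caller here rather than derived.
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)


# Placeholder inputs for illustration only.
name, value = dns01_record("example.com", "sample-token", "sample-thumbprint")
print(name, value)
```

Since validation depends on the ACME server resolving this record, a DNS misconfiguration during rollout can affect anything else that relies on the same resolvers.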
