Sauce Labs Outage History

Sauce Labs is up right now

Sauce Labs had 50 outages in the last 2 years (since October 30, 2024), totaling 24h 20m of downtime and averaging 2.1 incidents per month. Each is summarised below with incident details, duration, and resolution information.
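
The 2.1 incidents per month figure can be checked with a short calculation. This is a minimal sketch that assumes the average is taken over a 24-month window ("the last 2 years"). The per-incident average is an extra illustration, not a figure from the status page, and it is only a rough value because many incidents below carry no recorded duration.

```python
from datetime import timedelta

# Figures quoted in the summary above.
incidents = 50
total_downtime = timedelta(hours=24, minutes=20)

# Assumption: "last 2 years" means a 24-month averaging window.
window_months = 24

per_month = incidents / window_months
avg_minutes = total_downtime.total_seconds() / 60 / incidents

print(f"{per_month:.1f} incidents per month")                  # 2.1
print(f"{avg_minutes:.1f} minutes of downtime per incident")   # 29.2
```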

Source: https://status.saucelabs.com

Notice May 21, 2026

2026-May-13 Resolved Service Incident

Detected by Pingoru
May 21, 2026, 02:53 PM UTC
Resolved
May 21, 2026, 02:53 PM UTC
Duration
Timeline · 1 update
  1. resolved May 21, 2026, 02:53 PM UTC

    Between May 13th 18:47 UTC and May 15th 09:23 UTC, test details did not appear in the dashboard in the US-East Data Center. This issue has been resolved. All services are fully operational.


Major April 24, 2026

2026-April-23 Resolved Service Incident

Detected by Pingoru
Apr 24, 2026, 04:33 PM UTC
Resolved
Apr 24, 2026, 04:33 PM UTC
Duration
Timeline · 2 updates
  1. resolved Apr 24, 2026, 04:33 PM UTC

    Between April 23rd 22:44 and April 24th 15:25 UTC, there was a technical issue that affected video recordings for tests running on macOS 15 and iOS within our EU and US-West Data Centers. We identified the issue and deployed a fix. All systems are now fully operational.

  2. postmortem May 13, 2026, 03:05 PM UTC

    ### **Dates:**
    Thursday, April 23rd 2026, 22:43 UTC - Friday, April 24th 2026, 15:29 UTC
    ### **What happened:**
    Video assets were missing for virtual iOS simulator tests on ARM and macOS ARM desktop tests in the US-West and EU data centers.
    ### **Why it happened:**
    A product defect was introduced resulting in a screen capture failure.
    ### **How we fixed it:**
    We performed a rollback to a stable version.
    ### **What we are doing to prevent it from happening again:**
    We are improving monitoring & alerting to enhance our post deployment validation.


Notice April 16, 2026

2026-April-16 Resolved Service Incident

Detected by Pingoru
Apr 16, 2026, 10:10 AM UTC
Resolved
Apr 16, 2026, 10:10 AM UTC
Duration
Timeline · 2 updates
  1. resolved Apr 16, 2026, 10:10 AM UTC

    Between 02:00 and 11:15 CEST, live and automated tests on iOS 17.0 simulators were failing to start in the EU and US-West Data Centers. We executed a deployment rollback, which restored services. All systems are now fully operational.

  2. postmortem May 08, 2026, 06:42 PM UTC

    ### **Dates:**
    Thursday, April 16th 2026, 00:00 UTC – 09:15 UTC
    ### **What happened:**
    Live and automated tests on iOS 17.0 simulators failed to start in both the EU and US-West data centers. Customers running tests on iOS 17.0 Intel-based simulators were unable to execute their tests for approximately 9 hours.
    ### **Why it happened:**
    A deployment introduced an incompatibility affecting iOS 17.0 on Intel-based infrastructure. The issue was not caught prior to release due to insufficient post-deployment test coverage for that specific simulator configuration.
    ### **How we fixed it:**
    We performed a rollback to the previous deployment, which restored full iOS 17.0 simulator functionality.
    ### **What we are doing to prevent it from happening again:**
    We are reviving and expanding automated post-deployment tests to cover a broader range of simulator configurations, including legacy Intel-based iOS versions, to catch incompatibilities before they reach production.


Major April 7, 2026

2026-April-07 Service Incident

Detected by Pingoru
Apr 07, 2026, 03:02 PM UTC
Resolved
Apr 07, 2026, 04:13 PM UTC
Duration
1h 10m
Affected: US-West, EU-Central
Timeline · 3 updates
  1. investigating Apr 07, 2026, 03:02 PM UTC

    We are investigating reports of test failures affecting users running tests with saucectl in our US-West-1 and EU-Central-1 Data Centers.

  2. resolved Apr 07, 2026, 04:13 PM UTC

    We have identified the root cause and have deployed a fix for this issue. All services are fully operational.

  3. postmortem Apr 10, 2026, 09:56 PM UTC

    ### **Dates:**
    Tuesday, April 7th 2026, ~11:00 – 15:55 UTC
    ### **What happened:**
    Some customers experienced 503 errors when running tests via saucectl. The test-composer service was intermittently unavailable, preventing framework-based test execution.
    ### **Why it happened:**
    A stale Docker image was deployed to the test-composer service due to a packaging issue that arose during an internal container registry migration. This caused service pods to crash.
    ### **How we fixed it:**
    We identified the stale image and redeployed the correct version, restoring the service.
    ### **What we are doing to prevent it from happening again:**
    We are hardening our image deployment pipeline and adding validation checks to ensure container registry migrations do not result in stale or incorrect images being deployed to production.


Notice March 24, 2026

2026-March-24 Resolved Service Incident

Detected by Pingoru
Mar 24, 2026, 05:36 PM UTC
Resolved
Mar 24, 2026, 05:36 PM UTC
Duration
Timeline · 2 updates
  1. resolved Mar 24, 2026, 05:36 PM UTC

    Between 09:32 and 15:13 UTC, we identified a technical issue affecting iOS tests when running with network capture enabled. We've resolved the underlying cause and tests are working as expected. All services are fully operational.

  2. postmortem Apr 10, 2026, 11:06 PM UTC

    ### **Dates:**
    Tuesday, March 24th 2026, 09:32 UTC – 15:13 UTC
    ### **What happened:**
    Network calls failed on iOS devices during Real Device Cloud sessions where network capture was enabled. Approximately 12-13% of iOS sessions were affected. Android was not impacted.
    ### **Why it happened:**
    A deployment introduced a DNS resolution change that was incompatible with the iOS platform, causing network capture to break.
    ### **How we fixed it:**
    Rolled back the deployment to restore service.
    ### **What we are doing to prevent it from happening again:**
    Adding synthetic tests to catch network capture regressions before production, and implementing monitoring alerts for faster detection after deployments.


Major March 19, 2026

2026-March-19 Service Incident

Detected by Pingoru
Mar 19, 2026, 09:51 AM UTC
Resolved
Mar 19, 2026, 10:54 AM UTC
Duration
1h 2m
Affected: US-West
Timeline · 3 updates
  1. investigating Mar 19, 2026, 09:51 AM UTC

    Around 4:45 AM UTC we started experiencing lower iOS device availability in the US-West data center. Our team is actively investigating the root cause and working toward a resolution.

  2. resolved Mar 19, 2026, 10:54 AM UTC

    This incident has been resolved and our services are fully operational.

  3. postmortem Apr 10, 2026, 10:59 PM UTC

    ### **Dates:**
    Thursday, March 19 2026, 04:45 UTC - 10:47 UTC.
    ### **What happened:**
    Approximately 15% of iOS devices in our US-West data center were temporarily unavailable for customer test sessions due to failed internet connectivity checks.
    ### **Why it happened:**
    An automated wireless network optimization feature adjusted transmit power levels on access points serving the affected devices, degrading wireless connectivity and causing devices to fail their availability checks.
    ### **How we fixed it:**
    The affected access points were identified and restarted, restoring normal wireless connectivity.
    ### **What we are doing to prevent it from happening again:**
    Evaluation of the automated optimization tools and a monitoring improvement.


Notice March 18, 2026

2026-March-13 Resolved Service Incident

Detected by Pingoru
Mar 18, 2026, 05:15 PM UTC
Resolved
Mar 13, 2026, 02:30 PM UTC
Duration
Timeline · 2 updates
  1. resolved Mar 18, 2026, 05:15 PM UTC

    Between 14:43 and 15:11 UTC on March 13, a small subset of Real Devices (iOS and Android) became unavailable across all our data centers. After taking remedial action, the issue was identified and resolved. All services are fully operational.

  2. postmortem Apr 10, 2026, 10:57 PM UTC

    ### **Dates:**
    Friday, March 13th 2026, 14:43 UTC - 15:11 UTC.
    ### **What happened:**
    Real Devices (iOS and Android) availability gradually decreased across all data centers.
    ### **Why it happened:**
    A product defect was introduced resulting in a small subset of Real Devices (~10%) failing to maintain required connectivity.
    ### **How we fixed it:**
    Rollback to a stable version.
    ### **What we are doing to prevent it from happening again:**
    Improve monitoring & alerting, enhance post deployment validation.


Major March 10, 2026

2026-March-10 Service Incident

Detected by Pingoru
Mar 10, 2026, 06:46 PM UTC
Resolved
Mar 10, 2026, 11:34 PM UTC
Duration
4h 48m
Affected: US-West, EU-Central, US-East
Timeline · 3 updates
  1. investigating Mar 10, 2026, 06:46 PM UTC

    We are experiencing device unavailability in the US West 1, EU Central 1, and US East 4 data centers and have found that the issue is caused by a 3rd party service disruption. We are investigating.

  2. resolved Mar 10, 2026, 11:34 PM UTC

    This incident has been resolved.

  3. postmortem Apr 09, 2026, 09:07 PM UTC

    ### **Dates:**
    Tuesday, March 10th 2026, 17:52 - 23:34 UTC
    ### **What happened:**
    The majority of iOS devices across all regions became unavailable.
    ### **Why it happened:**
    Apple's [ppq.apple.com](http://ppq.apple.com) app verification endpoint was down, causing internal device monitoring checks to fail, bringing devices offline.
    ### **How we fixed it:**
    We temporarily disabled these device monitoring checks.
    ### **What we are doing to prevent it from happening again:**
    Improved external monitoring to catch outages of Apple’s [ppq.apple.com](http://ppq.apple.com) endpoint, loosened device monitoring to not take down live iOS devices if [ppq.apple.com](http://ppq.apple.com) is down.


Notice March 6, 2026

2026-March-6 Resolved Service Incident

Detected by Pingoru
Mar 06, 2026, 11:28 PM UTC
Resolved
Mar 06, 2026, 09:38 PM UTC
Duration
Timeline · 2 updates
  1. resolved Mar 06, 2026, 11:28 PM UTC

    Between 21:38 UTC and 23:11 UTC, our virtual iOS and macOS live and automated device tests were failing to start in the EU Data Center. We executed a deployment rollback, which restored services. All systems are now fully operational.

  2. postmortem Apr 10, 2026, 10:53 PM UTC

    ### **Dates:**
    Friday, March 6th 2026, 21:38 UTC - 23:11 UTC
    ### **What happened:**
    During the incident timeline, customers running virtual iOS simulator tests on ARM or macOS ARM desktop tests in the EU Data Center were unable to start new sessions for either live or automated testing.
    ### **Why it happened:**
    There was a sequencing issue on the release of the ARM side disk images in the EU.
    ### **How we fixed it:**
    The image reference for the ARM side disk was rolled back to the previous reference to restore service.
    ### **What we are doing to prevent it from happening again:**
    The tests that run to validate the image syncing have been completed in each region.


Minor February 27, 2026

2026-February-27 Service Incident

Detected by Pingoru
Feb 27, 2026, 06:42 PM UTC
Resolved
Feb 27, 2026, 08:15 PM UTC
Duration
1h 33m
Affected: US-West
Timeline · 3 updates
  1. investigating Feb 27, 2026, 06:42 PM UTC

    Live and automated real device test results are not being displayed on the test results page in the US-West-1 data center. We are investigating.

  2. resolved Feb 27, 2026, 08:15 PM UTC

    After taking remedial action, we are now seeing real device test results display in the US-West-1 data center. All services are fully operational.

  3. postmortem Apr 15, 2026, 07:06 PM UTC

    ### **Dates:**
    Friday, February 27th 2026, 16:15 UTC - 19:50 UTC
    ### **What happened:**
    Requests made using API client authentication would return 500 errors.
    ### **Why it happened:**
    An internal data structure became corrupted due to a race condition.
    ### **How we fixed it:**
    The affected service was restarted, and a long-term fix was applied.
    ### **What we are doing to prevent it from happening again:**
    Thread locking has been applied to the affected service.


Major February 23, 2026

2026-February-23 Service Incident

Detected by Pingoru
Feb 23, 2026, 05:22 PM UTC
Resolved
Feb 23, 2026, 10:31 PM UTC
Duration
5h 9m
Affected: US-West
Timeline · 4 updates
  1. investigating Feb 23, 2026, 05:22 PM UTC

    We are seeing an increase in automated tests using Sauce Connect failing with “Misconfigured -- No active tunnel found for provided identifier” errors in the US-West-1 data center. We are actively investigating.

  2. investigating Feb 23, 2026, 07:40 PM UTC

    We have identified the root cause and are working on implementing a fix. We are continuing to investigate.

  3. resolved Feb 23, 2026, 10:31 PM UTC

    We have identified the root cause and deployed a fix for this issue. All services are fully operational.

  4. postmortem Apr 15, 2026, 06:58 PM UTC

    ### **Dates:**
    Monday, February 23rd 2026, 17:22 UTC - 22:31 UTC
    ### **What happened:**
    WDIO-based tests run by customers using Sauce Connect 4 could not be started.
    ### **Why it happened:**
    Misconfiguration caused failing health checks in some cases, causing customer tunnels to shut down.
    ### **How we fixed it:**
    Misconfiguration was corrected.
    ### **What we are doing to prevent it from happening again:**
    Evaluation of the underlying software stack. Customers still using SC4 who can migrate to SC5 should do so at their earliest convenience.


Notice January 30, 2026

2026-January-29 Resolved Service Incident

Detected by Pingoru
Jan 30, 2026, 12:51 AM UTC
Resolved
Jan 29, 2026, 05:30 PM UTC
Duration
Timeline · 2 updates
  1. resolved Jan 30, 2026, 12:51 AM UTC

    Application uploads and automated tests on our virtual and real device platforms in the US West Datacenter experienced intermittent errors today between 5:43 pm (UTC) and 9:45 pm (UTC). These issues were caused by an underlying infrastructure error within our storage services. We took remedial action to stabilize the environment, and the problem has been resolved.

  2. postmortem Mar 06, 2026, 01:44 PM UTC

    ### **Dates:**
    Thursday, January 29th 2026, 19:30 UTC - 22:25 UTC.
    ### **What happened:**
    Application uploads and automated tests in the US-West-1 data center experienced intermittent errors between 5:43 pm (UTC) and 9:45 pm (UTC).
    ### **Why it happened:**
    Due to a high number of concurrent uploads and a temporary change to the network topology schema, the App Storage Service experienced increased connection latency and delays in acquiring connections to the backend.
    ### **How we fixed it:**
    App Storage Service network topology schema was restored to its original state.
    ### **What we are doing to prevent it from happening again:**
    We are improving monitoring, alerting and validations for the App Storage Service backend connectivity. We are working to implement processes to handle periods of increased load and proactively detect performance degradation to implement automated remediation and self-healing capabilities.


Notice January 29, 2026

2026-January-29 Resolved Service Incident

Detected by Pingoru
Jan 29, 2026, 11:11 AM UTC
Resolved
Jan 29, 2026, 11:11 AM UTC
Duration
Timeline · 2 updates
  1. resolved Jan 29, 2026, 11:11 AM UTC

    Between 8:00 CET and 9:45 CET, we were seeing elevated error rates and intermittent issues with test assets not being retained for tests on iOS and Mac virtual devices in our US-West-1 data center. The issue has now been resolved, but some customers might see missing test assets.

  2. postmortem Feb 03, 2026, 02:01 PM UTC

    ### **Dates:**
    Thursday, January 29th 2026, 07:04 - 08:46 UTC.
    ### **What happened:**
    Between 07:04 UTC and 08:46 UTC on January 29, 2026, customers using our US-West-1 region experienced instability with Mac resources. Specifically:
    * Approximately 20% of iOS simulator jobs failed to start due to infrastructure errors.
    * 10% of Virtual macOS/iOS jobs failed to upload assets to S3. As a result, logs, videos, and screenshots for these specific jobs were lost and remain unavailable for download.
    ### **Why it happened:**
    Scheduled maintenance on a primary network circuit coincided with high traffic volume, which led to congestion on the backup circuit.
    ### **How we fixed it:**
    Full connectivity was restored when traffic was routed back to the primary circuit following the completion of maintenance.
    ### **What we are doing to prevent it from happening again:**
    Review network redundancy strategies and consider upgrading backup circuit capacity to ensure it can support peak traffic loads.


Major January 13, 2026

2026-January-13 Service Incident

Detected by Pingoru
Jan 13, 2026, 10:43 AM UTC
Resolved
Jan 13, 2026, 10:50 AM UTC
Duration
7m
Affected: US-West
Timeline · 3 updates
  1. investigating Jan 13, 2026, 10:43 AM UTC

    We are currently experiencing issues with our app storage service in the US-West-1 Data Center. We are investigating the issue.

  2. resolved Jan 13, 2026, 10:50 AM UTC

    After taking remedial action, the app storage service has been restored in the US-West-1 Data Center. All services are fully operational.

  3. postmortem Feb 03, 2026, 01:56 PM UTC

    ### **Dates:**
    Tuesday, January 13th 2026, 9:10 UTC - 10:37 UTC.
    ### **What happened:**
    The App Storage Service experienced high response times in the US-West-1 datacenter.
    ### **Why it happened:**
    Due to a high number of concurrent uploads, the App Storage Service experienced increased connection latency and delays in acquiring connections to the backend.
    ### **How we fixed it:**
    Service was restored by restarting the application.
    ### **What we are doing to prevent it from happening again:**
    We are improving monitoring, alerting and validations for the App Storage Service to backend connectivity. We are also evaluating how to handle periods of increased load and proactively detect performance degradation to implement automated remediation and self-healing capabilities.


Major January 5, 2026

2026-January-5 Service Incident

Detected by Pingoru
Jan 05, 2026, 09:15 PM UTC
Resolved
Jan 05, 2026, 10:24 PM UTC
Duration
1h 8m
Affected: US-West
Timeline · 3 updates
  1. monitoring Jan 05, 2026, 09:15 PM UTC

    We have been experiencing device unavailability in our private device cloud in the US data center and have identified the root cause. We have deployed a fix and we are monitoring.

  2. resolved Jan 05, 2026, 10:24 PM UTC

    We were experiencing device unavailability in our private device cloud in the US data center and we have deployed a fix. This issue has been resolved. All services are fully operational.

  3. postmortem Feb 20, 2026, 11:03 AM UTC

    ### **Dates:**
    Monday, January 5th 2026, 20:40 UTC - 21:04 UTC.
    ### **What happened:**
    There was a decrease in Real Device availability in the US-West-1 datacenter.
    ### **Why it happened:**
    During a planned traffic flow migration, a network policy applied to the new traffic path was more restrictive than intended and this resulted in a subset of Real Devices (~40%) failing to maintain required connectivity. Although the change followed standard change management and peer review processes, the issue was not identified prior to activation.
    ### **How we fixed it:**
    The network policy was corrected and affected services recovered.
    ### **What we are doing to prevent it from happening again:**
    We are reviewing our network change processes to provide earlier detection of unintended behaviour during planned changes.


Notice December 18, 2025

2025-December-17 Resolved Service Incident

Detected by Pingoru
Dec 18, 2025, 03:52 PM UTC
Resolved
Dec 17, 2025, 10:30 AM UTC
Duration
Timeline · 2 updates
  1. resolved Dec 18, 2025, 03:52 PM UTC

    Between 10:37 UTC December 17th and 11:59 UTC December 18th, we were experiencing issues downloading applications via Mobile App Distribution. This issue has been resolved. All services are fully operational.

  2. postmortem Jan 05, 2026, 04:04 PM UTC

    ### **When it happened:**
    Wednesday, December 17th 2025, 10:37 UTC - Thursday, December 18th 2025, 11:59 UTC
    ### **What happened:**
    Applications were unable to be downloaded via Mobile App Distribution for a subset of customers.
    ### **Why it happened:**
    A product change combined with an increased and sustained load resulted in higher resource usage than anticipated, exceeding established limits for certain clients.
    ### **How we fixed it:**
    The service was rolled back to a stable version.
    ### **What we are doing to prevent it from happening again:**
    There are multiple improvements planned.
    1. Improved monitoring & alerting
    2. Enhanced post-deployment validation
    3. Capacity planning forecasting
    4. More advanced communication where possible ahead of major changes
    5. Adding canary to the roll out process


Notice December 18, 2025

2025-December-18 Resolved Service Incident

Detected by Pingoru
Dec 18, 2025, 08:16 AM UTC
Resolved
Dec 18, 2025, 05:30 AM UTC
Duration
Timeline · 2 updates
  1. resolved Dec 18, 2025, 08:16 AM UTC

    Between 5:40 UTC and 7:20 UTC, we experienced a high job error rate for all tests (Manual and Automated, Real Devices and Virtual Devices) in the US-West-1 data center. This issue has been resolved. All services are fully operational.

  2. postmortem Feb 20, 2026, 10:15 AM UTC

    ### **Dates:**
    Thursday, December 18th 2025, 5:32 UTC - 7:09 UTC.
    ### **What happened:**
    The App Storage Service experienced high response times in the US-West-1 datacenter.
    ### **Why it happened:**
    Due to a high number of concurrent uploads, the App Storage Service experienced increased connection latency and delays in acquiring connections to the backend.
    ### **How we fixed it:**
    The service was restored by restarting the application.
    ### **What we are doing to prevent it from happening again:**
    We are implementing improved monitoring, alerting and validations for App Storage Service backend connectivity. We will also look to enhance processes to handle periods of increased load and detect performance degradation proactively, in order to implement automated remediation and self-healing capabilities.


Minor November 4, 2025

2025-November-04 Service Incident

Detected by Pingoru
Nov 04, 2025, 08:02 AM UTC
Resolved
Nov 04, 2025, 11:10 AM UTC
Duration
3h 7m
Affected: EU-Central
Timeline · 3 updates
  1. investigating Nov 04, 2025, 08:02 AM UTC

    We are currently seeing issues accessing Sauce Labs Home, Insights, Test Results, and the API in EU-Central-1. We are investigating.

  2. resolved Nov 04, 2025, 11:10 AM UTC

    Access to Sauce Labs Home, Test Results, Insights, and the API has been restored. All services are fully operational.

  3. postmortem Jan 06, 2026, 02:40 PM UTC

    ### **Dates:**
    Tuesday, November 4 2025, 04:35 UTC - 08:20 UTC.
    ### **What happened:**
    The web UI for Insights and Test Results was intermittently unresponsive.
    ### **Why it happened:**
    A data store for components of the UI experienced unexpectedly high load, resulting in an unresponsive service.
    ### **How we fixed it:**
    The issue was resolved without human intervention.
    ### **What we are doing to prevent it from happening again:**
    We have improved support for the 3rd party vendor's logging and metrics tools.


Minor November 3, 2025

2025-November-3 Resolved Service Incident

Detected by Pingoru
Nov 03, 2025, 04:19 PM UTC
Resolved
Nov 03, 2025, 04:00 PM UTC
Duration
Timeline · 2 updates
  1. resolved Nov 03, 2025, 04:19 PM UTC

    Access to test results was interrupted in our EU Data Center from 12:40 UTC to 14:25 UTC. The issue has been identified and resolved.

  2. postmortem Dec 11, 2025, 05:16 PM UTC

    ### **Dates:**
    Monday, November 3 2025, 12:40 UTC - 14:25 UTC.
    ### **What happened:**
    The Sauce Labs dashboard for Insights and Test Results was unresponsive, causing issues showing test results and related information.
    ### **Why it happened:**
    A data store for some components of the dashboard experienced unusually high load, resulting in an unresponsive service.
    ### **How we fixed it:**
    The issue was resolved by increasing the resources for the affected data store service.
    ### **What we are doing to prevent it from happening again:**
    Support for the 3rd party vendor's logging and metrics tools has been improved.


Critical October 29, 2025

2025-October-29 Service Incident

Detected by Pingoru
Oct 29, 2025, 09:27 AM UTC
Resolved
Oct 29, 2025, 10:47 AM UTC
Duration
1h 20m
Affected: US-West
Timeline · 3 updates
  1. investigating Oct 29, 2025, 09:27 AM UTC

    We are currently experiencing issues with live test execution on Real Devices for iOS and Android in the US-West-1 datacenter. We are investigating.

  2. resolved Oct 29, 2025, 10:47 AM UTC

    After taking remedial action, live tests on Real Devices are now running normally in the US-West-1 Data Center. All services are fully operational.

  3. postmortem Dec 11, 2025, 04:51 PM UTC

    ### **Dates:**
    Monday, October 27th 2025, 15:27 UTC - Wednesday, October 29th 2025, 11:34 UTC.
    ### **What happened:**
    Real Device Live Testing availability gradually decreased in the US-West-1 data center.
    ### **Why it happened:**
    A product defect was introduced which caused Live Testing sessions on Real Devices to fail to start.
    ### **How we fixed it:**
    We performed a rollback to the previous working release.
    ### **What we are doing to prevent it from happening again:**
    We have improved monitoring & alerting and enhanced our post-deployment validation processes.


Notice October 28, 2025

2025-October-28 Resolved Service Incident 1

Detected by Pingoru
Oct 28, 2025, 05:20 PM UTC
Resolved
Oct 28, 2025, 11:30 AM UTC
Duration
Timeline · 2 updates
  1. resolved Oct 28, 2025, 05:20 PM UTC

    Between 11:20 UTC and 12:10 UTC, we experienced a high error rate for virtual Android tests in the US-West-1 data center. This issue has been resolved. All services are fully operational.

  2. postmortem Dec 11, 2025, 04:33 PM UTC

    ### **Dates:**
    Thursday, October 9th 2025, 10:28 UTC - Tuesday, November 11th 2025, 11:34 UTC.
    ### **What happened:**
    Virtual Android testing availability gradually decreased in the US-West-1 data center, which led to increased error rates when starting sessions. The error rates remained high between November 4th and November 11th.
    ### **Why it happened:**
    An increased and sustained load on our service led to reduced availability of Android virtual VMs, increasing wait times to start sessions, which in turn led to an elevated number of failed sessions.
    ### **How we fixed it:**
    This was resolved by increasing infrastructure capacity and optimizing resource utilization.
    ### **What we are doing to prevent it from happening again:**
    We have improved monitoring & alerting, enhanced post-deployment validation and improved capacity planning.


Notice October 28, 2025

2025-October-28 Resolved Service Incident

Detected by Pingoru
Oct 28, 2025, 04:13 PM UTC
Resolved
Oct 28, 2025, 04:13 PM UTC
Duration
Timeline · 2 updates
  1. resolved Oct 28, 2025, 04:13 PM UTC

    Between 14:07 UTC and 15:25 UTC, we experienced a high error rate for automated and manual Real Device tests in the US-West, US-East, and EU data centers. This issue has been resolved. All services are fully operational.

  2. postmortem Dec 11, 2025, 04:24 PM UTC

    ### **Dates:**
    Tuesday, October 28th 2025, 14:03 UTC - 15:18 UTC (primary incident)
    Tuesday, October 28th 2025, 15:18 UTC - Thursday, October 30th 2025, 8:25 UTC (~1% of devices impacted window)
    ### **What happened:**
    Test executions using the 'stable' Appium version gradually decreased in the US-West-1, US-East-4, and EU-Central-1 data centers.
    ### **Why it happened:**
    A product defect was introduced which caused tests using the stable Appium version to fail immediately upon creation.
    ### **How we fixed it:**
    We performed a rollback to a previous version on the majority of devices on Tuesday, October 28th 2025, 15:18 UTC, with full service restoration across all devices completed on Thursday, October 30th 2025, 8:25 UTC.
    ### **What we are doing to prevent it from happening again:**
    We have improved monitoring & alerting and enhanced our post-deployment validation processes.


Major October 28, 2025

2025-October-28 Service Incident

Detected by Pingoru
Oct 28, 2025, 12:53 PM UTC
Resolved
Oct 28, 2025, 02:22 PM UTC
Duration
1h 29m
Affected: EU-Central
Timeline · 3 updates
  1. investigating Oct 28, 2025, 12:53 PM UTC

    We are currently seeing elevated error rates for browser tests running on Windows 8 & 10 in EU-Central-1. We are investigating.

  2. resolved Oct 28, 2025, 02:22 PM UTC

    After taking remedial action, Windows browser tests are now running normally. This incident is resolved.

  3. postmortem Nov 06, 2025, 04:00 PM UTC

    ### **Dates:**
    Tuesday, October 28th 2025, 11:40 UTC - 14:05 UTC.
    ### **What happened:**
    Windows-based tests in the EU data center were failing on startup, which led to a decrease in availability of Windows-based Virtual Machines.
    ### **Why it happened:**
    A product defect was introduced which caused an issue with starting Windows-based VMs.
    ### **How we fixed it:**
    The deployment was rolled back to a stable version.
    ### **What we are doing to prevent it from happening again:**
    We have improved monitoring & alerting and enhanced our post-deployment validation.


Notice October 7, 2025

2025-October-1 Resolved Service Incident

Detected by Pingoru
Oct 07, 2025, 08:19 AM UTC
Resolved
Oct 01, 2025, 01:00 AM UTC
Duration
Timeline · 2 updates
  1. resolved Oct 07, 2025, 08:19 AM UTC

    Between 13:54 UTC on October 1, 2025, and 10:22 UTC on October 2, 2025, some automated Android Emulator tests failed intermittently in our US West and EU Central data centers. This was caused by a screen resolution change. We fixed the issue and all services are now fully operational.

  2. postmortem Dec 11, 2025, 04:16 PM UTC

    ### **Dates:**
    Wednesday, October 1st 2025, 10:34 UTC - Thursday, October 2nd 2025, 22:22 UTC
    ### **What happened:**
    Virtual Android and Visual tests using the “Android GoogleAPI Emulator” were failing if they contained conditions that required the exact app dimensions.
    ### **Why it happened:**
    A new default emulator was released including a setting that did not account for customer-defined virtual screen resolutions.
    ### **How we fixed it:**
    The issue was resolved by rolling back the deployment.
    ### **What we are doing to prevent it from happening again:**
    A new release was scheduled with an update to accommodate tests with customer-defined screen resolution.
