Exalate incident

Some Exalate nodes unavailable

Major Resolved

Exalate experienced a major incident on September 24, 2025, affecting Exalate Console, Synchronisation node, and 1 more component, lasting 1d 3h. The incident has been resolved; the full update timeline is below.

Started
Sep 24, 2025, 10:56 AM UTC
Resolved
Sep 25, 2025, 02:21 PM UTC
Duration
1d 3h
Detected by Pingoru
Sep 24, 2025, 10:56 AM UTC

Affected components

Exalate Console, Synchronisation node, Exalate for Azure DevOps, Exalate for ServiceNow in Exalate Cloud, Exalate for GitHub, Exalate for SalesForce, Freshdesk, Freshservice

Update timeline

  1. investigating Sep 24, 2025, 10:56 AM UTC

    There is an outage on one of the clusters in Exalate cloud that might affect access to the nodes temporarily. We are currently investigating the issue to ensure that service is restored as soon as possible.

  2. identified Sep 24, 2025, 11:18 AM UTC

    We have identified the root cause of the issue, and the restoration process is already underway. The next update will be provided in 30 minutes.

  3. identified Sep 24, 2025, 11:47 AM UTC

    The recovery process is still ongoing. We will provide an update within an hour.

  4. monitoring Sep 24, 2025, 01:00 PM UTC

    All nodes are fully recovered. We continue to closely monitor the situation.

  5. identified Sep 24, 2025, 02:50 PM UTC

    Our monitoring uncovered a problem still lingering with the cluster. We are working to restore full functionality as soon as possible.

  6. identified Sep 24, 2025, 04:56 PM UTC

    Node recovery has presented some challenges and is taking longer than expected. We continue to strive to bring all nodes back online as soon as possible. Next update will be provided within 2 hours.

  7. identified Sep 24, 2025, 06:47 PM UTC

    Infrastructure component failures have been addressed by our Engineering team. Next step: We are restarting the individual Exalate nodes impacted by the failure. More information will be provided in an hour.

  8. identified Sep 24, 2025, 07:54 PM UTC

    We continue restarting the individual Exalate nodes impacted by the failure. 5% of the affected nodes have been brought back online. Further updates are to be expected in an hour.

  9. identified Sep 24, 2025, 08:51 PM UTC

    The team is actively working through the restart sequence for all individual Exalate nodes. This is a deliberate, multi-step process to ensure stability upon full restoration. Further updates to be expected in an hour.

  10. identified Sep 24, 2025, 09:45 PM UTC

    Following the deliberate, multi-step sequence, the engineering team has now moved into the main restoration phase. We are actively scaling up the restart of the affected Exalate nodes. We'll continue to monitor stability closely. We will provide the next update in one hour.

  11. identified Sep 24, 2025, 10:49 PM UTC

    The engineering team continues working on the main restoration phase. We'll continue to monitor stability closely. Further updates are to be expected in an hour.

  12. identified Sep 24, 2025, 11:50 PM UTC

    During the scale-up phase, the restoration process has presented some challenges that are taking longer than initially expected to resolve. The engineering team is actively working to address these stability issues. We will provide the next update within one hour.

  13. identified Sep 25, 2025, 12:48 AM UTC

    The previous technical challenges have been addressed. We have re-initiated the scale-up of the Exalate nodes and are closely monitoring the environment for stability. We will provide the next update within one hour.

  14. identified Sep 25, 2025, 01:44 AM UTC

    We are continuing the scale-up of the Exalate nodes. The team is maintaining a close monitoring posture to ensure stability throughout this process. We will provide the next update in one hour.

  15. monitoring Sep 25, 2025, 03:23 AM UTC

    All nodes are fully restored and functioning normally now. We continue to monitor the cluster carefully to ensure stability.

  16. resolved Sep 25, 2025, 02:21 PM UTC

    We have extensively monitored the cluster health and found no outstanding issues.

  17. postmortem Oct 10, 2025, 10:41 AM UTC

    ## Executive Summary

    On September 24, 2025, Exalate experienced a service interruption lasting 17 hours and 28 minutes that affected our cloud-hosted integration nodes. During this time, customers were unable to synchronize data between their integrated systems. **No customer data was lost.** We sincerely apologize for the inconvenience and want to share what happened and how we're preventing future issues.

    ## What Happened

    **Timeline:**

    * **10:02 UTC (Sept 24):** Infrastructure issue detected through customer reports
    * **13:00 UTC:** Partial service restoration achieved
    * **13:45 UTC:** Secondary technical issue caused complete service unavailability
    * **17:24 UTC:** Core infrastructure restored
    * **21:00 UTC:** Priority customer services online
    * **03:30 UTC (Sept 25):** Full service restoration completed

    **Root Cause:** A network connectivity issue on our hosting platform triggered a cascading infrastructure failure. The recovery process was complex due to database resilience challenges and infrastructure management system complications.

    ## Customer Impact

    **During the outage:**

    * Data synchronization between systems (Jira, ServiceNow, etc.) was unavailable
    * Automated workflow processes were temporarily halted

    **What was NOT affected:**

    * **No customer data was lost or corrupted**
    * All existing synchronized data remained intact
    * Customer configurations and sync histories were preserved

    ## Our Response

    We immediately activated our 24/7 incident response team, maintained continuous status page updates, directly contacted Enterprise customers, and coordinated with infrastructure providers throughout the recovery.

    ## Prevention Measures

    We're implementing comprehensive improvements on an accelerated timeline:

    **Immediate (October 2025):**

    * Enhanced infrastructure monitoring and alerting systems
    * Comprehensive disaster recovery documentation

    **Short-term (November 2025):**

    * Infrastructure resilience improvements
    * Automated recovery procedures
    * Regular disaster recovery testing

    **Medium-term (Q1 2026):**

    * Multi-cloud architecture implementation
    * Advanced predictive monitoring

    ## Customer Support

    If you need assistance related to this outage:

    * **Enterprise Customers:** Use your dedicated support channels
    * **Standard Support:** Submit tickets through our support portal
    * **Status Updates:** Monitor our status page for ongoing information

    We apologize for this service disruption and appreciate your patience. Your trust is essential to our business, and we're committed to earning it through reliable service delivery and continuous improvement.
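
This report quotes two different durations: 1d 3h in the status page header and 17 hours 28 minutes in the postmortem. The sketch below is a minimal check of that arithmetic, not part of the vendor's report. It assumes the header figure spans the first status update to the "resolved" update, and the postmortem figure spans the 10:02 UTC detection to the 03:30 UTC full restoration. The helper names `utc` and `fmt` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc(year, month, day, hour, minute):
    """Build a timezone-aware UTC timestamp (hypothetical helper)."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


def fmt(delta: timedelta) -> str:
    """Render a timedelta as 'Nd Nh Nm' (hypothetical helper)."""
    total_minutes = int(delta.total_seconds()) // 60
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days}d {hours}h {minutes}m"


# Status page window: first "investigating" update to the "resolved" update.
status_page = utc(2025, 9, 25, 14, 21) - utc(2025, 9, 24, 10, 56)

# Postmortem window: issue detected to full service restoration.
postmortem = utc(2025, 9, 25, 3, 30) - utc(2025, 9, 24, 10, 2)

print(fmt(status_page))  # 1d 3h 25m
print(fmt(postmortem))   # 0d 17h 28m
```

Under these assumptions the two figures do not conflict. The status page window includes the monitoring period before the incident was marked resolved, while the postmortem counts only the service interruption itself.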