Coalesce Outage History

Coalesce is up right now

There have been 5 Coalesce outages since February 26, 2026, totaling 375h 43m of downtime. Each is summarized below with incident details, duration, and resolution information.

Source: https://status.coalesce.io

Minor April 30, 2026

Transform North America: Job Execution Service Interruption

Detected by Pingoru
Apr 30, 2026, 09:15 PM UTC
Resolved
May 01, 2026, 02:23 PM UTC
Duration
17h 8m
Affected: us-central1
Timeline · 3 updates
  1. identified Apr 30, 2026, 09:15 PM UTC

    Coalesce Transform for customers in our North America region experienced a service degradation starting on Wednesday, April 29th with the release of 7.33 that is affecting job execution.
    - Root cause: A change in 7.33 made our node selector matching case-insensitive, which in rare circumstances made it possible for newly created jobs to enter a recursive job-launch loop.
    - Current state: We have applied a concurrency limit that prevents runaway job launches and are working on a hotfix to resolve the underlying issue.
    - Fix inbound: The hotfix reverting selector matching to case-sensitive is approved and in our release pipeline now. We will update this incident once resolved.

  2. monitoring Apr 30, 2026, 09:45 PM UTC

    The hotfix has been delivered in version 7.33.2.

  3. resolved May 01, 2026, 02:23 PM UTC

    The incident has been resolved.

Read the full incident report →
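The failure mode described in this incident's first update (case-insensitive selector matching enabling a recursive job-launch loop, capped by a concurrency limit) can be sketched in a few lines. Everything below is a hypothetical illustration; the function and rule names are invented and do not reflect Coalesce's actual scheduler.

```python
def matches(selector: str, node: str, case_sensitive: bool) -> bool:
    """Illustrative node-selector match: exact or case-folded comparison."""
    if case_sensitive:
        return selector == node
    return selector.lower() == node.lower()

def launch_jobs(trigger_node, rules, case_sensitive, limit, launched=None):
    """Launch jobs whose selector matches the trigger node.

    Each launched job produces an output node that may itself trigger more
    jobs. A concurrency limit (the mitigation from the incident) caps the
    total number of launches.
    """
    launched = [] if launched is None else launched
    for selector, output_node in rules:
        if matches(selector, trigger_node, case_sensitive):
            if len(launched) >= limit:  # concurrency guard stops the runaway
                return launched
            launched.append(output_node)
            # The new job's output node may re-trigger the same rule.
            launch_jobs(output_node, rules, case_sensitive, limit, launched)
    return launched

# A rule whose selector "orders" launches a job that writes node "Orders":
# case-sensitive matching terminates after one launch, while case-insensitive
# matching lets the job re-trigger itself until the limit is hit.
rules = [("orders", "Orders")]
print(len(launch_jobs("orders", rules, case_sensitive=True, limit=100)))   # 1
print(len(launch_jobs("orders", rules, case_sensitive=False, limit=100)))  # 100
```

With case-sensitive matching the output node `"Orders"` no longer matches the selector `"orders"`, which is why reverting the matching behavior (the shipped hotfix) closes the loop at its source rather than merely capping it.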

Major April 23, 2026

Job Scheduler

Detected by Pingoru
Apr 23, 2026, 04:37 PM UTC
Resolved
Apr 23, 2026, 08:39 PM UTC
Duration
4h 2m
Affected: Scheduler
Timeline · 3 updates
  1. investigating Apr 23, 2026, 04:37 PM UTC

    We are investigating the impact of a current GitHub outage that is affecting our Job Scheduler's ability to execute scheduled jobs.

  2. monitoring Apr 23, 2026, 05:31 PM UTC

    The GitHub outage is recovering and Coalesce scheduled jobs have resumed processing.

  3. resolved Apr 23, 2026, 08:39 PM UTC

    This incident is resolved.

Read the full incident report →

Minor April 2, 2026

Notification service outage

Detected by Pingoru
Apr 02, 2026, 01:16 AM UTC
Resolved
Apr 02, 2026, 04:49 AM UTC
Duration
3h 33m
Affected: Scheduler
Timeline · 3 updates
  1. identified Apr 02, 2026, 01:16 AM UTC

    We are resolving an issue with our Notifications service that sends job update email notifications. The service was inadvertently taken offline due to an infrastructure configuration change. Jobs are continuing to be processed and should not be impacted by this outage. We estimate the service should be restored in 30 minutes or less and will update this incident with more details.

  2. monitoring Apr 02, 2026, 01:26 AM UTC

    The service is online and we are monitoring for any issues.

  3. resolved Apr 02, 2026, 04:49 AM UTC

    Incident Summary: Notifications Service Outage
    Date: April 1, 2026
    Duration: approximately 8 hours
    Severity: Email notifications only - no impact to job processing

    What happened: On April 1st, 2026, our Notifications service, which is responsible for sending job update email notifications, went offline. The service was unable to start due to a configuration change that removed the service's image from our image registry. Jobs continued to run and complete normally throughout the incident, and no data was lost. The impact was limited to the loss of email notifications for job status updates during the outage.

    Root cause: As part of a planned infrastructure cost optimization effort, we activated a storage cleanup policy on our container image registry to remove old, unused images. The Notifications service uses an independent release cycle from our core platform and had not been rebuilt recently. Its deployed image version fell outside the retention window and was removed by the cleanup policy. When the service attempted to restart, it could not pull the required image.

    Resolution: We identified the issue, deployed the latest version of the Notifications service, and confirmed full functionality was restored.

    Steps taken to prevent recurrence:
    1. Added an additional image version retention policy, ensuring that infrequently built services are never pruned while in service.
    2. Upgraded monitoring and alerting - we are adding additional monitors and alerts that will page our On Call Engineering team to ensure faster response times.
    3. Audited all deployed services - we have verified that all currently deployed image versions across all environments are present in the registry and covered by the updated retention policy.

    We apologize for the inconvenience and are committed to ensuring this does not happen again. If you have any questions, please reach out to our support team.
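The retention audit described in the prevention steps can be sketched as a pre-flight check: a cleanup policy should only prune image tags that are neither currently deployed nor inside the retention window. This is a hypothetical illustration; `safe_to_prune` and the tag names are invented and are not Coalesce's actual tooling.

```python
def safe_to_prune(registry_tags, deployed_tags, retention_keep):
    """Return tags that can be pruned safely.

    A tag is protected if it is currently deployed anywhere, or if it is
    among the most recent `retention_keep` tags (registry_tags is assumed
    sorted oldest-first). Everything else is a pruning candidate.
    """
    protected = set(deployed_tags) | set(registry_tags[-retention_keep:])
    return [tag for tag in registry_tags if tag not in protected]

# An independent release cycle means the deployed tag can be old: here the
# service still runs "notify:1.1" even though 1.2 and 1.3 exist.
registry = ["notify:1.0", "notify:1.1", "notify:1.2", "notify:1.3"]
deployed = ["notify:1.1"]
print(safe_to_prune(registry, deployed, retention_keep=2))  # ['notify:1.0']
```

Cross-checking candidates against deployed versions is what a pure age-based retention window misses, which is how the in-service image was pruned in this incident.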

Read the full incident report →

Critical March 9, 2026

Catalog features unavailable

Detected by Pingoru
Mar 09, 2026, 12:15 AM UTC
Resolved
Mar 09, 2026, 09:00 AM UTC
Duration
8h 45m
Affected: Web App
Timeline · 3 updates
  1. investigating Mar 09, 2026, 12:15 AM UTC

    We are currently investigating an incident with Catalog functionality.

  2. identified Mar 09, 2026, 02:59 AM UTC

    We have identified the cause of this outage and are working towards a resolution.

  3. resolved Mar 09, 2026, 09:00 AM UTC

    This incident has been resolved. A report will be published shortly.

Read the full incident report →

Major February 26, 2026

Intermittent Job Timeouts

Detected by Pingoru
Feb 26, 2026, 04:43 PM UTC
Resolved
Mar 12, 2026, 10:58 PM UTC
Duration
14d 6h
Affected: Scheduler
Timeline · 18 updates
  1. investigating Feb 25, 2026, 11:21 AM UTC

    We are investigating reports of jobs intermittently timing out. Our team is actively working to identify the root cause and will provide updates as soon as more information becomes available.

  2. identified Feb 25, 2026, 01:13 PM UTC

    We have identified a bottleneck in our job scheduling queue in the US region. We are currently recovering our infrastructure and monitoring the backlog of stale jobs. Customers may continue to see some older jobs fail, but new jobs are starting to process. We will provide further updates as the queue returns to normal levels.

  3. monitoring Feb 25, 2026, 02:07 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. identified Feb 25, 2026, 04:10 PM UTC

    Job scheduling issues in the US region have recurred. We have identified that certain jobs are becoming unresponsive and blocking the rest of the queue. We are intervening to clear the backlog and are investigating the underlying cause of these stalled processes. Customers should expect intermittent delays or timeouts in the interim.

  5. identified Feb 25, 2026, 05:15 PM UTC

    We are continuing to investigate. We will post updates at the top of each hour, or sooner if new information becomes available.

  6. monitoring Feb 25, 2026, 05:37 PM UTC

    We are no longer observing job timeouts at this time. Our team continues to monitor the system closely to ensure stability. We will provide further updates if anything changes.

  7. monitoring Feb 26, 2026, 12:06 AM UTC

    Jobs are no longer timing out and the Scheduler is operational again. We will continue to monitor over the next 12-24 hours and provide an update tomorrow with more details on the cause of the service interruption.

  8. monitoring Feb 26, 2026, 04:43 PM UTC

    Between 2 and 5 AM Pacific Time (US), we discovered jobs were again backing up and not processing as expected. Engineering has developed a patch to resolve this issue, which we are targeting for release by 12 PM Pacific. We will continue to monitor and resolve any issues with job processing, and we will update this incident when the patch has shipped.

  9. monitoring Feb 26, 2026, 08:53 PM UTC

    We have released a new version, 7.29.4, that mitigates the performance issues our customers are experiencing, and we expect it to reduce customer impact from this issue. We recommend that any customers using the coa CLI upgrade. We will continue to update this incident as we make progress towards the root cause.

  10. monitoring Feb 26, 2026, 11:39 PM UTC

    Clients can expect to continue to experience delays and timeouts. We have developed an additional reliability optimization that is targeted for release within the next 2 hours. Expect our next update here when it is available.

  11. monitoring Feb 27, 2026, 01:06 AM UTC

    Clients may still see issues with Deploy and Refresh operations, which we are continuing to monitor. We have released an additional update, version 7.29.5, that includes follow-on improvements to the reliability of our Refresh and Deploy operations. The update is also included in version 7.29.5 of our coa CLI and is a recommended upgrade for all customers. We will continue to monitor and provide updates as this incident progresses.

  12. monitoring Feb 27, 2026, 05:48 PM UTC

    The update we released yesterday has resolved the ongoing Deploy and Refresh failures our clients were reporting. We are continuing to monitor and will provide details on the root cause of the issue as soon as they are available. A small number of our clients are seeing delays running Deploy operations that are unrelated to this week's ongoing issue; we have identified a likely root cause, and once it is confirmed and resolved we will transition this incident to Operational.

  13. monitoring Feb 28, 2026, 12:09 AM UTC

    We are no longer seeing general or widespread issues with our operations and will continue to monitor this incident over the weekend as we work towards a final solution and root cause analysis.

  14. monitoring Mar 03, 2026, 11:54 AM UTC

    Clients are reporting errors running refresh and deploy operations. We are investigating and will provide updates.

  15. monitoring Mar 03, 2026, 03:07 PM UTC

    We identified a resource that needed to be restarted, and the DEADLINE_EXCEEDED errors clients were seeing should cease. This was unrelated to the overall incident we are tracking here, and work continues towards a full resolution.

  16. monitoring Mar 04, 2026, 09:47 PM UTC

    Starting at approximately 10 PM Pacific time on Tuesday, March 3rd, a single resource in our Scheduler API resource pool degraded in performance and did not recover. This caused a performance impact across the product due to the reliance on the Scheduler as our core API service. At 9 AM Pacific time on Wednesday, March 4th, this resource was restarted and operations returned to normal. We are continuing to investigate the root cause of the similar scenario we experienced on the morning of March 3rd. We have added additional monitoring and alerting for this specific behavior to reduce impact if it recurs before we deliver a full resolution. Updates to follow with a full timeline and root cause of this entire incident.

  17. monitoring Mar 06, 2026, 03:28 AM UTC

    Incident Update — March 5, 2026

    On the afternoon of March 4th, we confirmed the core root cause of this incident. We have been monitoring our infrastructure continuously since then and are not seeing ongoing impact to our clients.

    Root Cause: This incident was caused by a series of compounding issues, but the primary root cause is a performance flaw in a third-party SQL parsing library used throughout our platform. We identified the problematic behavior early in our investigation, but because SQL parsing is a centralized function critical to Deploy, Refresh, and API operations, we could not simply disable it. A fix required coordination with the third-party vendor, who delivered us a resolution this morning.

    What We Did While Awaiting the Vendor Fix: Rather than wait, we invested heavily in hardening the platform against the impact of this flaw:
    - Reliability improvements: Added timeout and retry mechanisms to prevent jobs from hanging indefinitely on unresponsive connections
    - Resource isolation: Separated heavy SQL parsing workloads onto dedicated infrastructure so they no longer block critical scheduler operations
    - Performance fixes: Identified and resolved a code-level regression that was amplifying the parsing cost per request
    - Monitoring & alerting: Deployed new observability tooling and health checks to detect and automatically remediate the application when a degraded state is detected

    Today's Release: Today we released version 7.30.6, which includes additional reliability and performance improvements that further reduce the impact of the underlying parsing flaw. We will continue to monitor closely and will ship the vendor's fix as soon as it is available.

    Next Steps:
    - We are preparing a detailed Incident Report that will be available upon request, including a full timeline and additional details.
    - We are testing an updated fix from our vendor and expect to release an update early next week, pending our quality checks.

    We appreciate your patience and collaboration during this ongoing impact to your operations.

  18. resolved Mar 12, 2026, 10:58 PM UTC

    We have released version 7.30.7, which includes the performance improvement from the third-party library we outlined in the prior update. All systems are operating as expected. Please reach out to support if you need any assistance.
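The "timeout and retry mechanisms" mentioned in the March 5 update are a standard hardening pattern: treat a hung call as a timeout, back off, and retry a bounded number of times so the caller fails fast instead of blocking indefinitely. The sketch below is a generic, hypothetical illustration; the deadline enforcement that turns a hang into a `TimeoutError` is left abstract, and `flaky_parse` is an invented stand-in, not Coalesce's implementation.

```python
import time

def with_retries(fn, retries=3, base_delay=0.01):
    """Call fn(); on TimeoutError, back off exponentially and retry.

    Re-raises the final TimeoutError once the retry budget is exhausted,
    so callers get a fast, explicit failure rather than a hung job.
    """
    for attempt in range(retries):
        try:
            return fn()
        except TimeoutError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 10ms, 20ms, 40ms, ...

# Demo: a call that times out twice before succeeding on the third attempt.
attempts = {"n": 0}

def flaky_parse():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("parser exceeded its deadline")
    return "parsed"

print(with_retries(flaky_parse))  # parsed
```

Bounded retries plus backoff complements the other mitigations listed in that update (resource isolation, health checks): each stalled call is contained instead of occupying a scheduler slot until the queue backs up.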

Read the full incident report →

Looking to track Coalesce downtime and outages?

Pingoru polls Coalesce's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.

  • Real-time alerts when Coalesce reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track Coalesce alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring Coalesce for free

5 free monitors · No credit card required