Blacksmith Outage History

Blacksmith had 71 outages in the last 2 years, totaling 58h 51m of downtime and averaging 2.9 incidents per month.

There have been 71 Blacksmith outages since August 6, 2025, totaling 58h 51m of downtime. Each is summarised below with incident details, duration, and resolution information.

Source: https://status.blacksmith.sh

Minor August 2, 2026

US-West storage cluster degraded

Detected by Pingoru: Aug 02, 2026, 06:16 AM UTC
Resolved: Aug 02, 2026, 06:37 AM UTC
Duration: 20m

Affected: Incremental Docker Builders (us-west Storage Cluster), Docker Container Cache (us-west Storage Cluster), Stickydisks (us-west Storage Cluster)

Timeline · 3 updates

investigating Aug 02, 2026, 06:16 AM UTC

We are seeing some failures with sticky disk availability and write latency with the US-West storage cluster.
monitoring Aug 02, 2026, 06:25 AM UTC

We've applied a fix and are seeing failure rates and latency starting to come down.
resolved Aug 02, 2026, 06:37 AM UTC

This incident has been resolved.

Major July 29, 2026

Increased Error Rate on Codesmith Sandbox Startup

Detected by Pingoru: Jul 29, 2026, 04:05 PM UTC
Resolved: Jul 29, 2026, 04:31 PM UTC
Duration: 26m

Affected: Codesmith

Timeline · 3 updates

identified Jul 29, 2026, 04:05 PM UTC

We're seeing increased error rates from an upstream provider when spinning up sandbox environments in Codesmith. We are looking into remediation options.
monitoring Jul 29, 2026, 04:08 PM UTC

The upstream issue is resolved; we are monitoring the sandbox creation rate as it recovers.
resolved Jul 29, 2026, 04:31 PM UTC

This incident has been resolved.

Minor July 24, 2026

Jobs not being adopted in all regions

Detected by Pingoru: Jul 24, 2026, 07:17 PM UTC
Resolved: Jul 24, 2026, 07:51 PM UTC
Duration: 33m

Timeline · 4 updates

investigating Jul 24, 2026, 07:17 PM UTC

We are seeing webhooks not being accepted by our control plane resulting in jobs not being adopted. We are currently investigating.
monitoring Jul 24, 2026, 07:20 PM UTC

We have deployed a fix and are monitoring recovery, and will provide another update within the next 30 minutes.
monitoring Jul 24, 2026, 07:23 PM UTC

We have requeued the webhooks that were not processed during the incident, and any affected jobs should start shortly.
resolved Jul 24, 2026, 07:51 PM UTC

This incident has been resolved, and job pickup has returned to normal.

Minor July 23, 2026

Sticky disk degradation in EU-West

Detected by Pingoru: Jul 23, 2026, 03:00 PM UTC
Resolved: Jul 23, 2026, 03:44 PM UTC
Duration: 43m

Affected: Stickydisks (eu-west Storage Cluster), Docker Container Cache (eu-west Storage Cluster)

Timeline · 3 updates

investigating Jul 23, 2026, 03:00 PM UTC

We're seeing large volumes of traffic in our storage cluster in EU-West, which is causing a number of sticky disk related requests to time out. We are actively investigating.
monitoring Jul 23, 2026, 03:18 PM UTC

We implemented a fix and are currently monitoring the result.
resolved Jul 23, 2026, 03:44 PM UTC

This incident has been resolved.

Minor July 22, 2026

Extended Queueing in Action Runners

Detected by Pingoru: Jul 22, 2026, 05:45 PM UTC
Resolved: Jul 22, 2026, 06:05 PM UTC
Duration: 20m

Affected: Blacksmith Managed Runners (us-west ARM), Blacksmith Managed Runners (us-west x86), Blacksmith Managed Runners (eu-west x86)

Timeline · 2 updates

investigating Jul 22, 2026, 05:45 PM UTC

We are currently experiencing elevated tail latencies for certain customers due to job prioritization. We are noticing jobs are taking upwards of 10m to adopt in a few cases. We are looking at options to alleviate queueing.
resolved Jul 22, 2026, 06:05 PM UTC

We have implemented some mitigations and queue times are back to normal. The incident has been resolved.

Minor July 22, 2026

Degraded actions cache performance

Detected by Pingoru: Jul 22, 2026, 01:00 PM UTC
Resolved: Jul 22, 2026, 02:14 PM UTC
Duration: 1h 14m

Affected: Actions Cache

Timeline · 3 updates

monitoring Jul 22, 2026, 01:00 PM UTC

We identified an issue which was leading to actions/cache requests experiencing an elevated failure rate. This has now been mitigated and we are monitoring.
monitoring Jul 22, 2026, 01:18 PM UTC

We implemented a fix and are currently monitoring the result.
resolved Jul 22, 2026, 02:14 PM UTC

This incident has been resolved.

Major July 21, 2026

Delays in job adoption

Detected by Pingoru: Jul 21, 2026, 02:25 PM UTC
Resolved: Jul 21, 2026, 11:31 PM UTC
Duration: 9h 5m

Affected: Blacksmith Managed Runners (eu-central ARM), Blacksmith Managed Runners (eu-central x86), Blacksmith Managed Runners (us-west ARM), Blacksmith Managed Runners (us-west x86), Blacksmith Managed Runners (eu-west x86), Actions Cache, Blacksmith Managed Runners (us-central MacOS), Incremental Docker Builders (eu-central Storage Cluster), Incremental Docker Builders (us-west Storage Cluster), Docker Container Cache (eu-central Storage Cluster), Docker Container Cache (us-west Storage Cluster), Stickydisks (eu-central Storage Cluster), Stickydisks (us-west Storage Cluster), Stickydisks (eu-west Storage Cluster), Incremental Docker Builders (eu-west Storage Cluster), Docker Container Cache (eu-west Storage Cluster)

Timeline · 12 updates

Major July 19, 2026

GitHub Outage Affecting Jobs

Detected by Pingoru: Jul 19, 2026, 11:38 PM UTC
Resolved: Jul 20, 2026, 03:54 AM UTC
Duration: 4h 15m

Affected: Github → Actions

Timeline · 6 updates

identified Jul 19, 2026, 11:38 PM UTC

GitHub has opened an incident that we are seeing affect job pickup.
identified Jul 20, 2026, 12:28 AM UTC

We are continuing to see errors from GitHub when trying to provision CI jobs. They are continuing to investigate their incident.
identified Jul 20, 2026, 02:21 AM UTC

We are starting to see signs of recovery of upstream API errors and jobs have resumed running. We are continuing to monitor.
identified Jul 20, 2026, 03:08 AM UTC

Requests to the GitHub API are now succeeding. There may be some delays in job adoption as we work through the large backlog of jobs that did not run during the incident.
monitoring Jul 20, 2026, 03:35 AM UTC

Job executions have returned to normal. We are monitoring the recovery.
resolved Jul 20, 2026, 03:54 AM UTC

This incident has been resolved.

Minor July 16, 2026

Intermittent dashboard loading and CI job errors due to GitHub outage

Detected by Pingoru: Jul 16, 2026, 10:34 PM UTC
Resolved: Jul 17, 2026, 12:03 AM UTC
Duration: 1h 29m

Affected: Website

Timeline · 4 updates

investigating Jul 16, 2026, 10:34 PM UTC

We are currently investigating this incident.
identified Jul 16, 2026, 10:36 PM UTC

We are seeing errors affecting the ability to log into the Blacksmith dashboard due to upstream calls to GitHub. We are not currently seeing any impact to the execution of CI jobs.
monitoring Jul 16, 2026, 10:55 PM UTC

We are seeing errors affecting the ability to log into the Blacksmith dashboard due to upstream calls to GitHub. There are widespread reports of GitHub API failures. We are seeing this impact CI jobs that interact with GitHub APIs, such as downloading actions, uploading artifacts, etc.
resolved Jul 17, 2026, 12:03 AM UTC

The upstream GitHub incident has been resolved. We will continue to monitor GitHub API stability.

Minor July 9, 2026

Sticky disk attach failures in us-west

Detected by Pingoru: Jul 09, 2026, 09:09 PM UTC
Resolved: Jul 09, 2026, 09:21 PM UTC
Duration: 12m

Affected: Stickydisks (us-west Storage Cluster)

Timeline · 2 updates

investigating Jul 09, 2026, 09:09 PM UTC

We are currently experiencing an issue in our us-west region where sticky disks are failing to attach. Sticky disks back several of our caching features, so customers running jobs in us-west may be affected across Docker container caching, Docker build caching, and Git repository caching. Affected jobs will still run, but will fall back to uncached behavior and may take longer than usual. Runners in other regions are unaffected. We are actively investigating and will post updates as we make progress.
resolved Jul 09, 2026, 09:21 PM UTC

This incident is resolved. The issue was identified and fixed. Sticky disk attachments in us-west were affected between 20:40 and 21:15 UTC.

Minor July 7, 2026

GitHub Actions and Codespaces APIs experiencing partial failures

Detected by Pingoru: Jul 07, 2026, 02:36 PM UTC
Resolved: Jul 07, 2026, 04:31 PM UTC
Duration: 1h 55m

Affected: Github → Actions, Github → API Requests

Timeline · 2 updates

investigating Jul 07, 2026, 02:36 PM UTC

GitHub is investigating reports of degraded performance for Actions and Codespaces.
resolved Jul 07, 2026, 04:31 PM UTC

This incident has been resolved.

Minor July 6, 2026

Delays in job adoption in us-west

Detected by Pingoru: Jul 06, 2026, 08:25 PM UTC
Resolved: Jul 06, 2026, 10:18 PM UTC
Duration: 1h 53m

Affected: Blacksmith Managed Runners (us-west x86), Incremental Docker Builders (us-west Storage Cluster), Docker Container Cache (us-west Storage Cluster)

Timeline · 2 updates

investigating Jul 06, 2026, 08:25 PM UTC

We're seeing a spike of jobs in the us-west region leading to a temporary capacity shortage. We're looking into re-balancing to reduce the delays. We are currently investigating this incident.
resolved Jul 06, 2026, 10:18 PM UTC

This incident has been resolved.

Minor July 6, 2026

Actions Cache degradation in us-west

Detected by Pingoru: Jul 06, 2026, 03:23 PM UTC
Resolved: Jul 06, 2026, 09:55 PM UTC
Duration: 6h 31m

Affected: Actions Cache, Incremental Docker Builders (us-west Storage Cluster), Docker Container Cache (us-west Storage Cluster)

Timeline · 5 updates

investigating Jul 06, 2026, 03:23 PM UTC

We are currently investigating this incident.
investigating Jul 06, 2026, 03:40 PM UTC

We are continuing to investigate a network issue with a storage server in our US West region since around 14:50 UTC on July 6. Customers may experience slowness in US West. We are working with our upstream network provider on a fix and will post updates here.
identified Jul 06, 2026, 07:21 PM UTC

We are running a maintenance operation on our cache storage cluster to resolve the degradation. For roughly 15 to 20 minutes, caching will fall back to the GitHub Actions cache and may be slower; jobs will otherwise run normally. We will update once complete.
monitoring Jul 06, 2026, 09:08 PM UTC

We implemented a fix and are currently monitoring the result.
resolved Jul 06, 2026, 09:55 PM UTC

This incident has been resolved.

Minor July 2, 2026

Missing job history in the Blacksmith dashboard for runs before 3:30pm EDT today

Detected by Pingoru: Jul 02, 2026, 09:03 PM UTC
Resolved: Jul 03, 2026, 11:01 AM UTC
Duration: 13h 57m

Affected: API

Timeline · 3 updates

identified Jul 02, 2026, 09:03 PM UTC

All job history rows from before 3:30pm EDT today were lost. Data is being restored from a backup.
monitoring Jul 03, 2026, 12:41 AM UTC

We are in the process of backfilling the data, and customers should begin seeing data from 12am and earlier on their observability pages. We are still working on a resolution for today's gap (12AM - 3:30PM) and will update as soon as that fix is put in place.
resolved Jul 03, 2026, 11:01 AM UTC

This incident has been resolved. We will continue to backfill certain impacted time periods but observability data for customers is now accurately represented in the workflow and job dashboards.

Major July 2, 2026

Actions Cache degradation

Detected by Pingoru: Jul 02, 2026, 01:17 PM UTC
Resolved: Jul 02, 2026, 04:16 PM UTC
Duration: 2h 58m

Affected: Actions Cache

Timeline · 3 updates

investigating Jul 02, 2026, 01:17 PM UTC

We are seeing some instances of actions cache degradation post-incident and are actively investigating.
monitoring Jul 02, 2026, 02:49 PM UTC

We implemented a fix and are currently monitoring the result.
resolved Jul 02, 2026, 04:16 PM UTC

This incident has been resolved.

Major July 2, 2026

Actions Cache outage

Detected by Pingoru: Jul 02, 2026, 02:52 AM UTC
Resolved: Jul 02, 2026, 04:03 AM UTC
Duration: 1h 11m

Affected: Actions Cache

Timeline · 3 updates

investigating Jul 02, 2026, 02:52 AM UTC

We are currently investigating this incident.
monitoring Jul 02, 2026, 03:38 AM UTC

We implemented a fix and are currently monitoring the result.
resolved Jul 02, 2026, 04:03 AM UTC

This incident has been resolved.

Major June 30, 2026

Runners hitting GitHub rate limiting

Detected by Pingoru: Jun 30, 2026, 05:36 PM UTC
Resolved: Jun 30, 2026, 06:47 PM UTC
Duration: 1h 11m

Timeline · 4 updates

investigating Jun 30, 2026, 05:36 PM UTC

We are currently investigating an issue where customers' GitHub Actions are not being picked up by Runners across our regions.
identified Jun 30, 2026, 06:28 PM UTC

We have mitigated the issue on our end, but customers may experience some queueing until we catch up.
monitoring Jun 30, 2026, 06:35 PM UTC

Queueing is improving and we are currently monitoring.
resolved Jun 30, 2026, 06:47 PM UTC

This incident has been resolved.

Minor June 27, 2026

EU Central Storage Cluster

Detected by Pingoru: Jun 27, 2026, 09:21 PM UTC
Resolved: Jun 27, 2026, 10:16 PM UTC
Duration: 54m

Affected: Incremental Docker Builders (eu-central Storage Cluster), Docker Container Cache (eu-central Storage Cluster), Stickydisks (eu-central Storage Cluster)

Timeline · 3 updates

investigating Jun 27, 2026, 09:21 PM UTC

We are currently investigating some service degradation with the eu-central storage cluster. You may see cache misses and slower disk speeds.
monitoring Jun 27, 2026, 09:51 PM UTC

We implemented a fix and are currently monitoring the result.
resolved Jun 27, 2026, 10:16 PM UTC

This incident has been resolved.

Minor June 26, 2026

Delays in MacOS job adoption

Detected by Pingoru: Jun 26, 2026, 12:21 PM UTC
Resolved: Jun 26, 2026, 01:15 PM UTC
Duration: 53m

Affected: Blacksmith Managed Runners (us-central MacOS)

Timeline · 3 updates

investigating Jun 26, 2026, 12:21 PM UTC

We're seeing a spike of MacOS jobs leading to a temporary capacity shortage. We're looking into re-balancing to reduce the delays.
monitoring Jun 26, 2026, 12:50 PM UTC

Our team have mitigated this and are now seeing job pickup times return to normal as we monitor.
resolved Jun 26, 2026, 01:15 PM UTC

This incident has been resolved.

Minor June 25, 2026

GitHub are reporting degraded performance for Actions and Webhooks

Detected by Pingoru: Jun 25, 2026, 06:29 PM UTC
Resolved: Jun 25, 2026, 06:36 PM UTC
Duration: 7m

Affected: Blacksmith Managed Runners (eu-central x86), Blacksmith Managed Runners (us-west x86), Blacksmith Managed Runners (eu-west x86)

Timeline · 2 updates

identified Jun 25, 2026, 06:29 PM UTC

GitHub has reported degraded performance for Actions, Pull Requests and Webhooks. Jobs may take a moment to be adopted. We are monitoring this incident.
resolved Jun 25, 2026, 06:36 PM UTC

This incident has been resolved.

Major June 17, 2026

Job adoption delays

Detected by Pingoru: Jun 17, 2026, 12:45 PM UTC
Resolved: Jun 17, 2026, 01:48 PM UTC
Duration: 1h 3m

Timeline · 6 updates

investigating Jun 17, 2026, 12:45 PM UTC

We are receiving reports of job adoption delays and are currently investigating.
investigating Jun 17, 2026, 01:01 PM UTC

We believe this is related to an issue with GitHub Webhooks and are still investigating.
identified Jun 17, 2026, 01:14 PM UTC

We have identified the issue as GitHub sending us a high volume of malformed webhooks that are missing critical pieces of information in their payloads. We are working on a patch to work around this as we wait for GitHub to fix the upstream issue.
monitoring Jun 17, 2026, 01:27 PM UTC

Our engineers are implementing a mitigation which should take 20 minutes to deploy. Thank you for your patience.
monitoring Jun 17, 2026, 01:48 PM UTC

We implemented a fix and are seeing recovery while we monitor the result. Any existing queued jobs won't recover on their own, so please be sure to cancel and rerun those jobs.
resolved Jun 17, 2026, 01:48 PM UTC

This incident has been resolved.

Major June 15, 2026

GitHub are reporting degraded performance for Webhooks

Detected by Pingoru: Jun 15, 2026, 03:45 PM UTC
Resolved: Jun 15, 2026, 04:44 PM UTC
Duration: 59m

Affected: Github → Webhooks

Timeline · 2 updates

monitoring Jun 15, 2026, 03:45 PM UTC

GitHub are reporting degraded performance with Webhooks. This may have an impact on job adoption. We are monitoring this incident.
resolved Jun 15, 2026, 04:44 PM UTC

GitHub have reported this incident as resolved.

Minor June 11, 2026

Currently investigating our degraded us-west storage cluster

Detected by Pingoru: Jun 11, 2026, 08:54 PM UTC
Resolved: Jun 11, 2026, 11:02 PM UTC
Duration: 2h 8m

Affected: Incremental Docker Builders (us-west Storage Cluster), Docker Container Cache (us-west Storage Cluster), Stickydisks (us-west Storage Cluster)

Timeline · 2 updates

investigating Jun 11, 2026, 08:54 PM UTC

We are currently investigating this incident.
resolved Jun 11, 2026, 11:02 PM UTC

Speeds have returned to normal and this incident has been resolved.

Major June 11, 2026

Delays In Job Adoption

Detected by Pingoru: Jun 11, 2026, 04:35 PM UTC
Resolved: Jun 11, 2026, 07:38 PM UTC
Duration: 3h 2m

Timeline · 4 updates

investigating Jun 11, 2026, 04:35 PM UTC

We're seeing a spike in jobs in some regions leading to a temporary capacity shortage. We're looking into re-balancing to reduce the delays.
identified Jun 11, 2026, 05:19 PM UTC

We are continuing to work on a fix for this incident. eu-central and us-west are still experiencing some queueing; eu-west has recovered.
monitoring Jun 11, 2026, 06:41 PM UTC

us-west is still experiencing some mild queueing; eu-central and eu-west have recovered.
resolved Jun 11, 2026, 07:38 PM UTC

This incident has been resolved.

Major June 10, 2026

GitHub reporting API request degradation

Detected by Pingoru: Jun 10, 2026, 03:29 PM UTC
Resolved: Jun 10, 2026, 04:52 PM UTC
Duration: 1h 23m

Timeline · 3 updates

investigating Jun 10, 2026, 03:29 PM UTC

We are currently investigating some reports of delays with job adoption. We are actively looking into this.
monitoring Jun 10, 2026, 04:06 PM UTC

We are currently in the process of re-queuing jobs.
resolved Jun 10, 2026, 04:52 PM UTC

GitHub have reported this incident as resolved.