HetrixTools Outage History

HetrixTools is up right now

HetrixTools had 12 outages in the last 2 years, totaling 288h 14m of downtime and averaging 0.5 incidents per month.

The 12 outages date back to July 14, 2025. Each is summarised below with incident details, duration, and resolution information.
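The headline figures can be roughly cross-checked from the detected and resolved timestamps listed for each incident. The sketch below is only an illustration: the timestamp pairs are copied from the summaries on this page, the helper names are made up, and the 24-month window is taken from the "last 2 years" wording. Because the displayed timestamps are rounded to the minute, the sum can land a few minutes away from the 288h 14m headline.

```python
from datetime import datetime, timedelta

FMT = "%b %d, %Y, %I:%M %p"  # e.g. "May 05, 2026, 09:26 PM" (all times are UTC)

# (detected, resolved) pairs copied from the incident summaries on this page.
INCIDENTS = [
    ("May 05, 2026, 09:26 PM", "May 05, 2026, 11:12 PM"),
    ("Apr 30, 2026, 01:12 PM", "Apr 30, 2026, 01:12 PM"),
    ("Mar 30, 2026, 09:45 PM", "Mar 30, 2026, 11:43 PM"),
    ("Mar 18, 2026, 12:20 PM", "Mar 18, 2026, 03:14 PM"),
    ("Dec 24, 2025, 05:00 AM", "Dec 24, 2025, 05:00 AM"),
    ("Dec 06, 2025, 08:17 PM", "Dec 06, 2025, 08:17 PM"),
    ("Dec 06, 2025, 03:11 PM", "Dec 16, 2025, 04:41 PM"),
    ("Nov 18, 2025, 11:34 AM", "Nov 18, 2025, 04:34 PM"),
    ("Nov 02, 2025, 06:00 AM", "Nov 02, 2025, 12:00 PM"),
    ("Oct 20, 2025, 05:32 PM", "Oct 20, 2025, 06:36 PM"),
    ("Oct 20, 2025, 07:03 AM", "Oct 20, 2025, 10:36 AM"),
    ("Jul 14, 2025, 10:17 PM", "Jul 15, 2025, 10:50 PM"),
]


def total_downtime(incidents):
    """Sum resolved-minus-detected over all incidents."""
    total = timedelta()
    for detected, resolved in incidents:
        total += datetime.strptime(resolved, FMT) - datetime.strptime(detected, FMT)
    return total


def format_hm(delta):
    """Render a timedelta as whole hours and minutes, like the page does."""
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60}m"


print(format_hm(total_downtime(INCIDENTS)))  # minute-precision total
print(len(INCIDENTS) / 24)                   # incidents per month over 24 months
```

Incidents whose detected and resolved timestamps are identical contribute nothing to the sum, even where the timeline text describes a real impact window.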

Source: https://status.hetrixtools.com

Notice May 5, 2026

DENIC issues with resolving .DE domains

Detected by Pingoru
May 05, 2026, 09:26 PM UTC
Resolved
May 05, 2026, 11:12 PM UTC
Duration
1h 45m
Affected: Uptime Monitoring
Timeline · 2 updates
  1. identified May 05, 2026, 09:26 PM UTC

    Our internal alerts show a higher-than-usual failure rate in our Uptime Monitoring system. The DENIC domain registry is currently having issues, causing most .DE domains to fail to resolve. Our system is correctly reporting these uptime monitors as down, since their domains are not resolving. Sources: https://status.denic.de/ https://news.ycombinator.com/item?id=48027897 https://www.reddit.com/r/de_EDV/comments/1t4qlrg/psa_die_dezone_l%C3%B6st_gerade_gro%C3%9Ffl%C3%A4chig_nicht_mehr/ This issue is out of our hands, and our system is working as expected. We're posting this so that our users know what is happening with their .DE uptime monitors. We'll continue to monitor the situation as it progresses.

  2. resolved May 05, 2026, 11:12 PM UTC

    This incident has been resolved.
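The behaviour described in this incident, where a monitor is reported as down because its hostname does not resolve, can be sketched as follows. This is a minimal illustration and not HetrixTools' actual check logic: `dns_precheck` and the injectable `resolver` parameter are hypothetical names, and the only real API used is `socket.getaddrinfo`.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved (e.g. a registry-level failure).
        return False


def dns_precheck(hostname, resolver=socket.getaddrinfo):
    """Hypothetical first step of an uptime check: fail fast on DNS errors."""
    if resolves(hostname, resolver):
        return "resolved"
    # The web server may be healthy, but visitors cannot reach it either.
    return "down (DNS resolution failed)"


def broken_registry(hostname, port):
    """Stand-in resolver that simulates a zone that does not resolve."""
    raise socket.gaierror("simulated resolution failure")


print(dns_precheck("example.de", resolver=broken_registry))
```

Injecting the resolver keeps the example runnable without network access; a real check would use the default and go on to make the HTTP request when resolution succeeds.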

Minor April 30, 2026

Server Monitoring Metrics API Partially Impaired

Detected by Pingoru
Apr 30, 2026, 01:12 PM UTC
Resolved
Apr 30, 2026, 01:12 PM UTC
Duration
Timeline · 1 update
  1. resolved Apr 30, 2026, 02:11 PM UTC

    At around 13:12 UTC, we noticed an increase in errors with our Server Monitoring Metrics API. After further investigation, our techs discovered that one of our back-end nodes was experiencing issues and not ingesting all monitoring metrics as expected. The issue was fully resolved by 14:00 UTC. It affected about 2.7% of our Server Monitoring Agents, which may have had issues posting their metrics to our platform.

Minor March 30, 2026

Blacklist Monitoring Queue Issues

Detected by Pingoru
Mar 30, 2026, 09:45 PM UTC
Resolved
Mar 30, 2026, 11:43 PM UTC
Duration
1h 57m
Affected: Blacklist Monitoring
Timeline · 6 updates
  1. investigating Mar 30, 2026, 07:39 PM UTC

    We're currently investigating issues that are affecting our Blacklist Monitoring Queue.

  2. identified Mar 30, 2026, 09:04 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Mar 30, 2026, 09:45 PM UTC

    A fix has now been implemented, and we're slowly restarting the microservices powering the Blacklist Monitoring Queue.

  4. monitoring Mar 30, 2026, 09:47 PM UTC

    The queue has a big backlog to process, so it will take some time for it to catch up and process monitors in real time, as it does when healthy. We'll provide further updates here as this process advances.

  5. monitoring Mar 30, 2026, 10:48 PM UTC

    The queue backlog is now at 50% of its level when processing was resumed. We'll follow up with another update once it's fully caught up.

  6. resolved Mar 30, 2026, 11:43 PM UTC

    The backlog has now been fully processed, and the Blacklist Monitoring Queue is now running healthy as expected.

Notice March 18, 2026

Increased error rate for websites behind Cloudflare

Detected by Pingoru
Mar 18, 2026, 12:20 PM UTC
Resolved
Mar 18, 2026, 03:14 PM UTC
Duration
2h 53m
Affected: Uptime Monitoring
Timeline · 2 updates
  1. identified Mar 18, 2026, 12:20 PM UTC

    We've been observing an increased rate of 520 HTTP errors from HEAD requests to websites behind Cloudflare. Cloudflare has recently acknowledged the issue on their end and is working on a fix: https://www.cloudflarestatus.com/incidents/kxvggpg7kwx5 In the meantime, you can work around the issue right away in your HetrixTools account by changing the "HTTP Method" from "HEAD" to "GET" for any of your website monitors (by editing the monitor's settings): https://docs.hetrixtools.com/wp-content/uploads/2022/10/image.png To check whether you're affected by this Cloudflare issue, look for 520 HTTP errors in your Location Fail Log (https://docs.hetrixtools.com/location-fail-log/).

  2. resolved Mar 18, 2026, 03:14 PM UTC

    Cloudflare has updated the status of their incident to resolved. We're noticing that the number of 520 HTTP errors from HEAD requests has gone down. This incident is now resolved.
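To see whether a given site shows the pattern described in this incident, one option is to send both a HEAD and a GET request and compare the status codes. The sketch below uses only Python's standard `urllib`; the function names `probe` and `head_only_520` are illustrative, and treating "HEAD returns 520 while GET succeeds" as the signature of this issue is an assumption drawn from the workaround above.

```python
import urllib.error
import urllib.request


def probe(url, method, timeout=10):
    """Return the HTTP status code for one request using the given method."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return error.code


def head_only_520(head_status, get_status):
    """True when HEAD fails with 520 but GET succeeds (assumed issue signature)."""
    return head_status == 520 and 200 <= get_status < 400


# Example usage (requires network access, so it is left commented out):
# url = "https://example.com/"
# print(head_only_520(probe(url, "HEAD"), probe(url, "GET")))
```

If both methods return 520, the problem is more likely on the origin or elsewhere, and switching the monitor to GET would not be expected to help.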

Notice December 24, 2025

Documentation Outage

Detected by Pingoru
Dec 24, 2025, 05:00 AM UTC
Resolved
Dec 24, 2025, 05:00 AM UTC
Duration
Timeline · 1 update
  1. resolved Dec 24, 2025, 10:04 AM UTC

    The server hosting our Documentation suffered a critical disk failure. Recovery took a bit longer than initially expected. The functionality of our Documentation was impaired for approximately 4 hours and 9 minutes. It has now been fully restored and recovered.

Notice December 6, 2025

access.redhawk.org - Unresponsive & Removed

Detected by Pingoru
Dec 06, 2025, 08:17 PM UTC
Resolved
Dec 06, 2025, 08:17 PM UTC
Duration
Affected: Blacklist Monitoring
Timeline · 1 update
  1. resolved Dec 06, 2025, 08:17 PM UTC

    This RBL has been unresponsive for a few days. We have now removed it from our system. If your IPs or domains were blacklisted here, they will be marked as delisted after the next automatic checkup. We'll reactivate the RBL if/when it comes back online.

Notice December 6, 2025

Microsoft SNDS API Increased Errors

Detected by Pingoru
Dec 06, 2025, 03:11 PM UTC
Resolved
Dec 16, 2025, 04:41 PM UTC
Duration
10d 1h
Affected: Blacklist Monitoring
Timeline · 4 updates
  1. identified Dec 06, 2025, 03:11 PM UTC

    We're observing a significant increase in API failures when fetching blacklisting data from the Microsoft SNDS platform. They recently (4 Dec 2025) made some changes to how their API works, and we believe these errors might be related to that. Our system has been configured for such scenarios and will fall back on older/stale data if it cannot fetch fresh blacklisting data from the SNDS API. This may cause delayed blacklisting/delisting notices on the Microsoft SNDS blacklists. Unfortunately, this issue is out of our hands, and we hope they'll fix their API soon so our system can resume regular monitoring of the Microsoft SNDS blacklists. In the meantime, we're keeping an eye on the situation as it progresses.

  2. identified Dec 09, 2025, 08:38 AM UTC

    We're still seeing very high 400/500 HTTP error and timeout rates from the Microsoft SNDS API. Our system is currently able to update the SNDS blacklisting data only once a day, compared with the usual once per hour under normal circumstances. Our tech department is making ongoing efforts to shorten this update interval. We'll keep you posted here with further updates as we have them.

  3. identified Dec 10, 2025, 10:08 PM UTC

    Our techs have now shortened the update interval to every 3 hours. We're still seeing an unusually high number of errors from the Microsoft SNDS API.

  4. resolved Dec 16, 2025, 04:41 PM UTC

    Based on tests we've run over the past few days, we've concluded that updating the SNDS data more often than once every 12 hours will result in too many errors from the SNDS API. We're currently sticking to this update frequency until these errors are addressed on their end.
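The fallback behaviour described in the first update, serving older data when a fresh fetch fails, can be sketched as a small cache wrapper. This is a hedged illustration only: `StaleFallbackCache`, its `fetch` callable, and the injectable `clock` are hypothetical names and do not reflect HetrixTools' implementation or the SNDS API.

```python
import time


class StaleFallbackCache:
    """Keep the last successfully fetched data and serve it when a fetch fails."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch      # callable that returns fresh data or raises
        self._clock = clock      # injectable so the example can be tested
        self._data = None
        self._fetched_at = None

    def get(self):
        """Return (data, age_in_seconds), falling back to stale data on error."""
        try:
            self._data = self._fetch()
            self._fetched_at = self._clock()
        except Exception:
            # A real system would catch narrower errors (timeouts, HTTP 4xx/5xx).
            if self._fetched_at is None:
                raise  # nothing cached yet, so there is nothing to fall back on
        return self._data, self._clock() - self._fetched_at
```

Returning the age alongside the data lets callers decide how to treat stale results, for example by flagging that blacklisting and delisting notices may be delayed.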

Critical November 18, 2025

Cloudflare Outage

Detected by Pingoru
Nov 18, 2025, 11:34 AM UTC
Resolved
Nov 18, 2025, 04:34 PM UTC
Duration
4h 59m
Affected: Website, Dashboard, Blacklist Monitoring, Uptime Monitoring, Server Agent Metrics, White Label Pages, Notifications, API, Documentation
Timeline · 6 updates
  1. identified Nov 18, 2025, 02:10 PM UTC

    We're aware that some of our components behind Cloudflare are experiencing issues at this time: https://www.cloudflarestatus.com/incidents/8gmgl950y3h7 This currently affects our services as follows: Our Blacklist Monitoring service remains unaffected. Our Uptime Monitoring service continues to process a high number of detected outages, so notifications may be delayed as it processes through its backlog. The Server Agent Metrics system is impaired, as it is behind Cloudflare.

  2. identified Nov 18, 2025, 02:13 PM UTC

    Unfortunately, the outage is still ongoing at this time. We'll continue to monitor the situation as it progresses. Further info about this outage can be found on Cloudflare's status page: https://www.cloudflarestatus.com/incidents/8gmgl950y3h7

  3. monitoring Nov 18, 2025, 02:43 PM UTC

    We're seeing signs that the Cloudflare service is almost fully functional again, with most of the websites behind Cloudflare being seen as UP again by our Uptime Monitoring system. We'll continue to monitor the situation.

  4. monitoring Nov 18, 2025, 02:49 PM UTC

    The Server Agent Metrics endpoint behind Cloudflare is still experiencing a partial loss (~15%) of incoming data.

  5. monitoring Nov 18, 2025, 04:11 PM UTC

    We are still observing a relatively small number of monitoring agents (~5%) unable to access our Cloudflare API endpoint to post their metric data. We'll continue to monitor the situation as it progresses.

  6. resolved Nov 18, 2025, 04:34 PM UTC

    Cloudflare has now resolved the incident. We're still monitoring and hands-on in case issues resurface.

Minor November 2, 2025

Linux Server Monitoring Agent Issues

Detected by Pingoru
Nov 02, 2025, 06:00 AM UTC
Resolved
Nov 02, 2025, 12:00 PM UTC
Duration
5h 59m
Affected: Server Agent Metrics
Timeline · 2 updates
  1. identified Nov 18, 2025, 02:46 PM UTC

    We’ve identified an issue in our Linux Server Monitoring Agent that occurs when Daylight Saving Time ends (in autumn). If the server where the agent is installed is not configured to use the UTC timezone, the agent will fail to collect metrics during the repeated (backward) hour. This only affects versions 2.3.1, 2.3.2, 2.3.3, and 2.3.4 of our Linux Server Monitoring Agent. The issue stems from the agent's scheduler, which relies on the server's local time to run the agent every minute and does not fire while the clock is set back by an hour. This has been fixed in version 2.3.5 of our Linux Server Monitoring Agent. You can update your agent with the following command: wget -4 -qO- https://raw.githubusercontent.com/hetrixtools/agent/master/hetrixtools_update.sh | sudo bash

  2. resolved Nov 18, 2025, 02:47 PM UTC

    This incident has been resolved.
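The effect described above can be reproduced with a small simulation. The sketch below is not the agent's scheduler; it models a hypothetical once-a-minute job that only fires when its clock reading is later than the previous run. It assumes a server on US Eastern time, where DST ended at 06:00 UTC on Nov 2, 2025, and uses fixed offsets so it does not depend on a timezone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: on a US Eastern server, 02:00 EDT became 01:00 EST at 06:00 UTC.
DST_END_UTC = datetime(2025, 11, 2, 6, 0, tzinfo=timezone.utc)


def local_wall_time(utc_now):
    """Naive local wall-clock time for the assumed server (UTC-4, then UTC-5)."""
    offset = timedelta(hours=-4) if utc_now < DST_END_UTC else timedelta(hours=-5)
    return (utc_now + offset).replace(tzinfo=None)


def count_runs(start_utc, minutes, use_utc):
    """Count runs of a job that fires only when its clock has moved forward."""
    runs, last = 0, None
    for i in range(minutes):
        now_utc = start_utc + timedelta(minutes=i)
        stamp = now_utc if use_utc else local_wall_time(now_utc)
        if last is None or stamp > last:
            runs += 1
            last = stamp
    return runs


start = datetime(2025, 11, 2, 5, 0, tzinfo=timezone.utc)
print(count_runs(start, 180, use_utc=False))  # keyed on local wall-clock time
print(count_runs(start, 180, use_utc=True))   # keyed on UTC
```

Over the same three hours, the wall-clock variant skips the whole repeated hour while the UTC variant runs every minute, which matches the note that servers configured for UTC are unaffected.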

Minor October 20, 2025

Email Notification Issues

Detected by Pingoru
Oct 20, 2025, 05:32 PM UTC
Resolved
Oct 20, 2025, 06:36 PM UTC
Duration
1h 4m
Affected: Blacklist Monitoring, Notifications
Timeline · 4 updates
  1. investigating Nov 18, 2025, 02:57 PM UTC

    We are currently investigating an issue where some Email notifications might be sent multiple times due to ongoing AWS Issues.

  2. identified Nov 18, 2025, 02:58 PM UTC

    We have identified the issue and are currently working on a fix for it. Impact: Daily/Weekly/Monthly Reports and Blacklist Monitoring Notifications may be delayed while we fix the issue.

  3. identified Nov 18, 2025, 02:58 PM UTC

    We implemented a fix, and the Email Queue is now functioning properly and catching up on its backlog. We're monitoring it to make sure everything works as expected.

  4. resolved Nov 18, 2025, 02:59 PM UTC

    The Email Queue has caught up with its backlog and is now functioning as expected.

Minor October 20, 2025

AWS Issues

Detected by Pingoru
Oct 20, 2025, 07:03 AM UTC
Resolved
Oct 20, 2025, 10:36 AM UTC
Duration
3h 32m
Affected: Dashboard, Blacklist Monitoring, Server Agent Metrics, Notifications, API
Timeline · 4 updates
  1. investigating Nov 18, 2025, 03:01 PM UTC

    We are currently investigating connectivity issues with AWS infrastructure in North America. This can cause some parts of our platform to load up slowly or time out.

  2. identified Nov 18, 2025, 03:02 PM UTC

    The incident has been identified and confirmed by AWS, as per their status page: https://health.aws.amazon.com/health/status We're working on mitigating the impact on our end while they work on solving the root cause of the problem.

  3. monitoring Nov 18, 2025, 03:03 PM UTC

    We're seeing no more errors related to the AWS outage. We're still monitoring the situation to make sure that our systems are fully recovered.

  4. resolved Nov 18, 2025, 03:03 PM UTC

    This incident has now been fully resolved.

Minor July 14, 2025

Server Monitoring - Degraded performance for servers using Cloudflare DNS Servers

Detected by Pingoru
Jul 14, 2025, 10:17 PM UTC
Resolved
Jul 15, 2025, 10:50 PM UTC
Duration
1d
Affected: Server Agent Metrics
Timeline · 3 updates
  1. identified Nov 18, 2025, 03:08 PM UTC

    Cloudflare's DNS servers (1.1.1.1 and 1.0.0.1) seem to be having major issues: https://www.cloudflarestatus.com/incidents/28r0vbbxsh8f https://x.com/Newspicel/status/1944880481704796373 This issue affects our Server Monitoring Agent installed on servers that use these DNS servers. Since these servers cannot resolve any hostnames at this time, our Server Monitoring Agent cannot send any metrics to our platform. Resolution: Use diverse DNS servers on your servers (e.g., a mixture of Cloudflare and Google DNS servers), or wait for Cloudflare to fix its DNS servers so that your server regains full DNS resolution capabilities.

  2. monitoring Nov 18, 2025, 03:09 PM UTC

    We're seeing traffic coming back towards normal limits, as Cloudflare implemented a fix for its DNS servers. We'll continue to monitor the situation.

  3. resolved Nov 18, 2025, 03:10 PM UTC

    We're now seeing traffic back to normal, even though Cloudflare hasn't yet marked the issue as resolved on their status page.
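One way to act on the resolution advice above is to check whether a server's configured resolvers all belong to a single provider. The sketch below parses `resolv.conf`-style text; the function names are illustrative, the Cloudflare addresses come from the incident text, and the Google addresses (8.8.8.8 and 8.8.4.4) are the well-known Google Public DNS resolvers.

```python
# Resolver addresses mapped to provider names; unknown addresses map to themselves.
PROVIDERS = {
    "1.1.1.1": "Cloudflare",
    "1.0.0.1": "Cloudflare",
    "8.8.8.8": "Google",
    "8.8.4.4": "Google",
}


def nameservers(resolv_conf_text):
    """Extract the addresses from 'nameserver <ip>' lines, ignoring comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def resolver_providers(resolv_conf_text):
    """Return the set of distinct providers behind the configured resolvers."""
    return {PROVIDERS.get(ip, ip) for ip in nameservers(resolv_conf_text)}


def is_diverse(resolv_conf_text):
    """True when resolvers from more than one provider are configured."""
    return len(resolver_providers(resolv_conf_text)) > 1


sample = "nameserver 1.1.1.1\nnameserver 1.0.0.1\n"
print(is_diverse(sample))  # both entries are Cloudflare
```

On a real server the text would typically come from `/etc/resolv.conf`. Systems that use a local stub resolver list only a loopback address there, so the upstream servers would need to be read from that resolver's own configuration instead.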
