The Things Industries incident

Downlink failures

Major · Resolved

The Things Industries experienced a major incident on September 10, 2020 affecting Europe 1 (eu1), North America 1 (nam1), and Australia 1 (au1), lasting 1d 4h. The incident has been resolved; the full update timeline is below.

Started
Sep 10, 2020, 12:58 PM UTC
Resolved
Sep 11, 2020, 05:19 PM UTC
Duration
1d 4h
Detected by Pingoru
Sep 10, 2020, 12:58 PM UTC

Affected components

Europe 1 (eu1), North America 1 (nam1), Australia 1 (au1)

Update timeline

  1. investigating Sep 10, 2020, 12:58 PM UTC

    We are investigating application linking issues in the Europe (eu1) cluster of our Cloud Hosted service.

  2. monitoring Sep 10, 2020, 03:06 PM UTC

    We've identified the cause and have applied a fix. We're monitoring the system.

  3. monitoring Sep 11, 2020, 08:56 AM UTC

    After monitoring the status overnight, we see that the application links are stable and uplinks are being processed. However, we are noticing a drop in delivered downlinks. We've traced the possible cause of the issue and are working on applying a fix.

  4. monitoring Sep 11, 2020, 10:51 AM UTC

    We are deploying a patch to `eu1`, `au1` and `nam1` clusters and are monitoring results.

  5. identified Sep 11, 2020, 03:04 PM UTC

    We're investigating an issue with downlinks not being scheduled to gateways. The cluster does not correctly keep track of downlink paths to gateways, so the Network Server cannot reach the Gateway Server to which the gateway is connected. We're on it and will post progress here.

  6. monitoring Sep 11, 2020, 04:44 PM UTC

    A fix has been implemented and we are monitoring the results.

  7. monitoring Sep 11, 2020, 04:45 PM UTC

    We are continuing to monitor for any further issues.

  8. resolved Sep 11, 2020, 05:19 PM UTC

    The issue has been identified and fixed. Everything is operational again.
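
Update 5 attributes the failure to the cluster losing track of downlink paths, so the Network Server could not find the Gateway Server holding a gateway's connection. The sketch below is only an illustration of that failure mode, not The Things Industries' implementation: `DownlinkPathRegistry`, `schedule_downlink`, and the `gw-1` / `gs-eu1-a` identifiers are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DownlinkPathRegistry:
    """Hypothetical cluster-wide map: gateway ID -> Gateway Server address."""

    _paths: Dict[str, str] = field(default_factory=dict)

    def gateway_connected(self, gateway_id: str, gateway_server: str) -> None:
        # Record which Gateway Server currently holds the gateway's connection.
        self._paths[gateway_id] = gateway_server

    def gateway_disconnected(self, gateway_id: str) -> None:
        # Forget the path; a no-op if the gateway was never registered.
        self._paths.pop(gateway_id, None)

    def lookup(self, gateway_id: str) -> Optional[str]:
        return self._paths.get(gateway_id)


def schedule_downlink(
    registry: DownlinkPathRegistry, gateway_id: str, payload: bytes
) -> str:
    """Return a description of the route, or raise if no path is known."""
    server = registry.lookup(gateway_id)
    if server is None:
        # The situation described in update 5: no usable path to the gateway.
        raise LookupError(f"no downlink path known for gateway {gateway_id!r}")
    return f"scheduled {len(payload)} bytes via {server}"
```

A brief usage example under the same assumptions: scheduling succeeds while the path is tracked and fails once the entry is lost, even though uplinks from the gateway may still be arriving.

```python
registry = DownlinkPathRegistry()
registry.gateway_connected("gw-1", "gs-eu1-a")
print(schedule_downlink(registry, "gw-1", b"\x01\x02"))

registry.gateway_disconnected("gw-1")  # simulate the lost path
try:
    schedule_downlink(registry, "gw-1", b"\x01\x02")
except LookupError as err:
    print(f"downlink dropped: {err}")
```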