Screenly incident

Issue affecting some Raspberry Pi 3 devices running UC16

Notice Resolved

Screenly reported a notice-level incident on June 20, 2023, lasting 1d 15h. The incident has been resolved; the full update timeline is below.

Started
Jun 20, 2023, 06:55 PM UTC
Resolved
Jun 22, 2023, 10:52 AM UTC
Duration
1d 15h
Detected by Pingoru
Jun 20, 2023, 06:55 PM UTC

Update timeline

  1. investigating Jun 20, 2023, 03:59 PM UTC

    We are currently investigating this issue.

  2. investigating Jun 20, 2023, 04:02 PM UTC

    We're investigating an issue reported by some customers. The issue appears to be limited to Raspberry Pi 3s running Ubuntu Core 16 (UC16). We are working with Canonical (Ubuntu) to triage the issue further.

  3. investigating Jun 20, 2023, 06:55 PM UTC

    While this is still under active investigation by both our engineers and Canonical/Ubuntu's, our initial investigation indicates that this was caused by a system update in the operating system. Some of these devices have been reported to recover after being power cycled. To limit the spread of the issue to additional devices, we've issued a freeze on further updates to the affected cohort of devices that are online. Note that this issue doesn't appear to affect any newer Screenly Player (these are all Pi4-based) or Screenly Player Max, as these all use Ubuntu Core 20 (UC20).

  4. investigating Jun 21, 2023, 07:46 AM UTC

    Investigation is still ongoing. We'll provide an update as soon as we have received a new update from Canonical.

  5. investigating Jun 21, 2023, 11:14 AM UTC

    We are working closely with Canonical's engineers to narrow down the root cause and are exploring various hypotheses. In the meantime, we're seeing that a number of the affected screens have recovered by themselves.

  6. investigating Jun 21, 2023, 05:17 PM UTC

    We've spent the day working with Canonical's engineers on the issue and are fairly confident that it was caused by an update to the "core" snap (which is essentially the operating system) going from version 16-2.59.3 to 16-2.59.4. It is still unclear exactly how the upgrade broke, but we believe that we are getting closer to the root cause. If you are affected, please get in touch with [email protected].

  7. resolved Jun 22, 2023, 10:52 AM UTC

    Many of the affected screens have now recovered. If your screens are still affected and you have not yet been in touch with us, please do reach out to [email protected] for next steps.

  8. postmortem Jun 22, 2023, 10:52 AM UTC

    On 2023-06-20, we started receiving support tickets from a few clients in Asia describing screens showing a purple Screenly logo rather than the expected content. A few hours later, we started receiving similar reports from customers in the U.S. We’ve now narrowed the problem down to a memory issue in `apparmor_parser` (used by `snapd`), which in turn causes an Out Of Memory (OOM) situation in which the Out Of Memory Killer (OOM Killer) shuts down processes.

    The Ubuntu Core / Snap Team is actively working on resolving the issue, and progress is tracked [here](https://bugs.launchpad.net/snapd/+bug/2025030). In the meantime, we’ve instructed all UC16 devices that are online and reachable to hold any future updates.

    **What we are doing next**

    * We are expanding our new [Quality Control setup](https://www.screenly.io/blog/2023/05/25/updated-qc-rig/) to include combinations of older hardware/software not previously covered (i.e. Raspberry Pi 3 with UC16).
    * We are working closely with Canonical/Ubuntu to enhance their upstream testing of software updates.
    * We are working on adding manual approval processes for new upstream updates.
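
The OOM condition described in the postmortem typically leaves traces in the kernel log. As a rough illustration only, the Python sketch below counts OOM Killer victims by process name in a block of kernel log text. The `count_oom_kills` helper, the regular expression, and the sample log lines are all assumptions made for this example: the lines are invented rather than taken from an affected device, and the kernel's exact message wording varies between versions.

```python
import re
from collections import Counter

# Matches the kernel's "Killed process <pid> (<name>)" message.
# Assumption: wording differs between kernel versions, so adjust as needed.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def count_oom_kills(log_text: str) -> Counter:
    """Count OOM Killer victims by process name in kernel log text."""
    victims = Counter()
    for line in log_text.splitlines():
        match = KILLED_RE.search(line)
        if match:
            victims[match.group(2)] += 1
    return victims


# Illustrative input only: made-up lines with hypothetical process names,
# not output captured from a device involved in this incident.
SAMPLE_LOG = """\
[ 1201.113] Out of memory: Kill process 812 (example-viewer) score 412 or sacrifice child
[ 1201.114] Killed process 812 (example-viewer) total-vm:512000kB, anon-rss:204800kB, file-rss:0kB
[ 1322.870] Out of memory: Kill process 950 (example-agent) score 120 or sacrifice child
[ 1322.871] Killed process 950 (example-agent) total-vm:128000kB, anon-rss:51200kB, file-rss:0kB
[ 1400.002] usb 1-1: new high-speed USB device number 4 using dwc_otg
"""

if __name__ == "__main__":
    for name, kills in count_oom_kills(SAMPLE_LOG).most_common():
        print(f"{name}: {kills}")
```

On a real device, the input text could come from `dmesg` or `journalctl -k`, assuming shell access is available. Repeated kills shortly after a `snapd` refresh would be consistent with the behaviour described above, though not proof of it.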