Hosted Mender incident
Server-side generated deltas (beta) are failing randomly
Hosted Mender experienced a notice incident on April 24, 2024 affecting Hosted Mender EU, lasting 4h 57m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Apr 24, 2024, 03:16 PM UTC
Server-side Generated Delta, which is a beta feature, is failing for some customers in the European cluster. We're investigating the issue.
- identified Apr 24, 2024, 04:24 PM UTC
The issue has been identified: the Kubernetes worker pods that run server-side Generated Delta jobs sometimes exhaust their ephemeral storage, and the job then fails. A fix is being implemented.
- monitoring Apr 24, 2024, 04:43 PM UTC
As a workaround, we applied a pod anti-affinity policy, which prevents server-side Generated Delta pods from being scheduled on nodes that run other worker pods. This leaves them more storage available.
- resolved Apr 24, 2024, 08:14 PM UTC
The metrics indicate no further issues; for now, the situation appears stable with the applied workaround. In the coming days, however, we will implement additional updates.
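For readers unfamiliar with the workaround, the following is a minimal sketch of what a pod anti-affinity rule looks like and how it affects scheduling. It is not the configuration applied to Hosted Mender: the label `component: worker` is hypothetical, the choice of a required (rather than preferred) rule is an assumption, and `can_schedule` is a simplified stand-in for the Kubernetes scheduler that only considers a single node.

```python
# Hypothetical label; the incident report does not name the real ones.
OTHER_WORKER_LABELS = {"component": "worker"}


def anti_affinity_stanza(avoid_labels):
    """Build the `affinity` section of a pod spec, in Kubernetes manifest shape.

    Assumption: a hard ("required") rule keyed on the node hostname, so the
    pod never shares a node with pods carrying `avoid_labels`.
    """
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": dict(avoid_labels)},
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


def can_schedule(affinity, pods_on_node):
    """Simplified check: True if no pod on the node matches an anti-affinity term.

    `pods_on_node` is a list of label dicts, one per pod already on the node.
    The real scheduler evaluates topology domains; here one node is one domain.
    """
    terms = affinity["podAntiAffinity"][
        "requiredDuringSchedulingIgnoredDuringExecution"
    ]
    for term in terms:
        wanted = term["labelSelector"]["matchLabels"]
        for labels in pods_on_node:
            if all(labels.get(key) == value for key, value in wanted.items()):
                return False
    return True


if __name__ == "__main__":
    affinity = anti_affinity_stanza(OTHER_WORKER_LABELS)
    busy_node = [{"component": "worker"}]
    empty_node = []
    print("busy node allowed:", can_schedule(affinity, busy_node))
    print("empty node allowed:", can_schedule(affinity, empty_node))
```

Under these assumptions, a delta-generation pod is kept off any node that already hosts another worker pod, so it does not compete with those pods for the node's ephemeral storage.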