Octopus incident
Some Octopus Cloud customers affected by parsing issue with ConfigMap on Kubernetes deployments
Octopus experienced a minor incident on January 22, 2025 affecting Octopus Cloud, lasting 14h 1m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Jan 22, 2025, 05:47 PM UTC
We are currently investigating the issue
- identified Jan 23, 2025, 01:42 AM UTC
We have identified the issue in a version of Octopus Server recently released to Cloud customers. A recent change causes config maps with multi-line variables, created via the Kubernetes containers deployment step, to fail. Config maps created by the dedicated Kubernetes config map step, or by Raw YAML or Helm steps, are not affected. We are in the process of fixing this and will update the public issue with the fixed version: https://github.com/OctopusDeploy/Issues/issues/9221.
- monitoring Jan 23, 2025, 02:47 AM UTC
A fix has been implemented and will be rolled out to affected cloud customers in their next maintenance window.
- resolved Jan 23, 2025, 07:49 AM UTC
We are currently rolling out the fix to Cloud customers. Instances will be upgraded in their next maintenance window. If you are affected by this issue and want to expedite the upgrade, please contact [email protected]
- postmortem Feb 05, 2025, 10:04 PM UTC
# Report and learnings: Configure and apply Kubernetes resources step error parsing configmap.yml

###### Author: Kevin Tchang

## Summary

In Octopus Server `2025.1.5751`, a bug caused deployments of Kubernetes config maps containing multi-line variables to fail when the config maps were created through the _Configure and apply Kubernetes resources_ step (the built-in Kubernetes step for deploying containers). Config maps created using the dedicated Kubernetes config map step, as well as those generated with Raw YAML or Helm steps, were unaffected. This issue impacted Cloud customers, whose previously successful deployments began to fail.

The bug was a regression caused by a change supporting manifest reporting for Kubernetes deployment steps, part of an upcoming feature. This change meant that line breaks in multi-line Octopus variable values were no longer escaped when the values were substituted into the config map's key-value pairs.

The problem became apparent when customers had PEM certificates or JSON blobs that needed to be inserted into the config map. These were replaced verbatim in Calamari, leading to YAML formatting issues due to unescaped line breaks.

## Background

The [_Configure and apply Kubernetes resources_](https://octopus.com/docs/kubernetes/steps/kubernetes-resources) step deploys a combination of Kubernetes Deployment, Service, and Ingress resources. It also allows the optional configuration and deployment of an associated Kubernetes ConfigMap and Secret for reference by the Deployment.

To support Rolling Update and Blue/Green deployment strategies, ConfigMap and Secret resources must have unique names for each Deployment version. These resources are assigned [computed names](https://octopus.com/docs/kubernetes/steps/kubernetes-resources?q=configmap#configmap-and-secret), which, by default, combine the resource name with the Octopus deployment ID, and are determined only at deployment time.
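As a rough illustration of why computed names are only known at deployment time, the sketch below combines a base resource name with a deployment ID. The function name, the ID format, and the lowercasing step are assumptions made for this example and are not taken from the Octopus implementation.

```python
def computed_name(resource_name: str, deployment_id: str) -> str:
    """Combine a base resource name with a deployment ID (hypothetical format)."""
    # Kubernetes object names must be lowercase, so normalise the result.
    return f"{resource_name}-{deployment_id}".lower()


# Two deployments of the same project get distinct ConfigMap names, so the
# old and new Deployment versions can each reference their own ConfigMap.
blue = computed_name("app-config", "Deployments-101")
green = computed_name("app-config", "Deployments-102")
print(blue)
print(green)
```

Because the deployment ID does not exist until a deployment is created, the final name cannot be fixed when the step is configured.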

## Incident timeline

_(All dates and times below are shown in UTC)_

##### **22/1/2025 – 7:31 (18:31 AEDT)**

We began receiving customer reports of an increase in failing Kubernetes deployments. These failures were observed across various projects, with similar errors related to parsing config maps. Our support team worked with our customers to troubleshoot the reasons for the failures.

##### **22/1/2025 – 10:58 (21:58 AEDT)**

Our support team escalated the issue to our engineering teams.

##### **22/1/2025 – 15:51 (2:51 AEDT)**

Our internal incident response process was initiated.

##### **22/1/2025 – 21:45 (8:45 AEDT)**

Our engineers logged on and began to identify the cause of the incident.

##### **23/1/2025 – 1:42 (12:42 AEDT)**

The fix for the bug was merged, and our Status Page was updated to _Identified._

##### **23/1/2025 – 2:47 (13:47 AEDT)**

Our Status Page was updated to _Monitoring_ as we began the process of expediting the release of the fix, `2025.1.7128`, to our affected Cloud customers.

##### **23/1/2025 – 7:49 (18:49 AEDT)**

Status Page updated to _Resolved._

## Technical details

Before the change to support manifest reporting, the Kubernetes container deployment step created associated Kubernetes config maps (and secrets) using the `kubectl create` command with the `--from-file` flag, where each config map key-value pair was sent to Calamari as an individual file. This process was updated to use the more standard `kubectl apply -f` method, where Octopus now sends a single YAML manifest to Calamari representing the config map. The YAML is generated from a config map resource that we build as an in-memory C# object.

The bug was introduced when the config map object was built from raw, unevaluated Octopus variable values. The issue wasn’t identified during testing because the deployment step involves two stages of variable substitution: the first on Octopus Server, and the second inside Calamari during deployment.
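
The two-stage flow can be sketched roughly as follows. This is a minimal illustration and not the Octopus or Calamari code: the function names and variable names are hypothetical, and JSON stands in for the YAML serializer so that the example needs only the standard library (the exact failure differs in YAML, but the cause is the same: a multi-line value pasted into already-serialized text without escaping).

```python
import json

# Hypothetical variables: one multi-line value and one deploy-time value.
SERVER_VARS = {"Cert": "-----BEGIN CERTIFICATE-----\nline-one\n-----END CERTIFICATE-----"}
DEPLOY_VARS = {"DeploymentId": "deployments-123"}


def substitute(text: str, variables: dict[str, str]) -> str:
    """Replace #{Name} placeholders verbatim, with no escaping."""
    for name, value in variables.items():
        text = text.replace("#{" + name + "}", value)
    return text


def build_config_map(data: dict[str, str]) -> dict:
    """Build an in-memory config map whose name is resolved at deploy time."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "app-config-#{DeploymentId}"},
        "data": data,
    }


def render_unevaluated(server_vars: dict[str, str], deploy_vars: dict[str, str]) -> str:
    """Serialize raw placeholders, then substitute everything into the text."""
    manifest = json.dumps(build_config_map({"tls.crt": "#{Cert}"}))
    # The multi-line value lands inside serialized text with no escaping.
    return substitute(manifest, {**server_vars, **deploy_vars})


def render_evaluated(server_vars: dict[str, str], deploy_vars: dict[str, str]) -> str:
    """Evaluate data values first so the serializer escapes the line breaks."""
    data = {"tls.crt": substitute("#{Cert}", server_vars)}
    manifest = json.dumps(build_config_map(data))
    # Only the single-line computed name is left for the second pass.
    return substitute(manifest, deploy_vars)


try:
    json.loads(render_unevaluated(SERVER_VARS, DEPLOY_VARS))
    print("unevaluated manifest parsed")
except json.JSONDecodeError as error:
    print("unevaluated manifest failed to parse:", error.msg)

parsed = json.loads(render_evaluated(SERVER_VARS, DEPLOY_VARS))
print("evaluated manifest name:", parsed["metadata"]["name"])
```

In this sketch the text-level substitution is harmless for the single-line computed name, while a multi-line value only survives if it is evaluated before the object is serialized.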

The two substitution passes are necessary to support the use of computed names, ensuring that each deployment version has its own unique resources. The change didn't account for multi-line strings as potential variable values, so newline characters were not escaped before serialization. Encoding needs to happen on Octopus Server, before the object is serialized into YAML, because the second substitution in Calamari operates directly on the text of the YAML file. The bug was a regression, and the fix involved evaluating the values before serialization to ensure newline characters were handled correctly.

## Remediation and next steps

At Octopus, we take deployment reliability very seriously. After this incident, we conducted a thorough review to identify areas where we can improve our processes in light of the lessons learned.

We’ve identified a complex and unconventional area of the code, specifically script-based Kubernetes deployments, that requires further attention. Given the distinctive challenges these deployments present, we are committed to enhancing this area with additional tests to ensure better reliability.