Tonkean experienced a critical incident on October 16, 2025, affecting Workflows Runtime, User Interfaces (Forms, Item Interfaces, Workspace Apps, Business Reports), and one more component. The incident was open on the status page for 13 minutes (12:18 to 12:31 UTC) and has been resolved; the full update timeline is below.
Update timeline
- investigating Oct 16, 2025, 12:18 PM UTC
We are currently investigating this issue.
- resolved Oct 16, 2025, 12:31 PM UTC
This incident has been resolved.
- postmortem Oct 19, 2025, 12:54 PM UTC
**Root Cause Analysis: Autoscaler Permission Change Incident**
On October 16, 2025, as part of upcoming infrastructure improvements, we changed the permissions used by our autoscaling service. The change was applied at 11:52 UTC. System performance initially remained stable, but at 12:06 UTC the autoscaler unexpectedly stopped working. As a result, new capacity could not be added when needed, and some services began to degrade as demand increased.
The recent permission change was identified as the likely cause, and a rollback was initiated. By 12:15 UTC the original permissions had been restored and the autoscaler restarted. Capacity began recovering shortly afterward, and by 12:31 UTC all services were back to normal. Degraded performance lasted approximately 25 minutes in total (12:06 to 12:31 UTC). No data loss occurred.
The root cause was a mismatch in the updated permissions that prevented the autoscaler from managing capacity. Once the mismatch took effect, the autoscaler failed silently, and the problem surfaced only when workloads exceeded the available infrastructure. To resolve the issue, we reverted the change, restarted the service, and confirmed that everything was operating normally again.
Several actions are now in place to reduce the risk of recurrence: pre-change validation for permission updates, improved monitoring for autoscaler failures, and stricter review procedures for operational changes. These improvements are now part of our deployment practices.
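The postmortem notes that the autoscaler failed silently and that monitoring for autoscaler failures is being improved. A common way to catch this class of failure is to alert on the absence of recent successful activity rather than on raised errors. The following is a minimal sketch of that idea, not Tonkean's implementation; `AutoscalerStatus`, `last_successful_cycle`, `is_silently_stalled`, and the two-minute threshold are all hypothetical, and it assumes the autoscaler records a timestamp each time it completes an evaluation cycle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AutoscalerStatus:
    # Hypothetical health signal: when the autoscaler last completed an
    # evaluation cycle successfully, whether or not it changed capacity.
    last_successful_cycle: datetime


def is_silently_stalled(
    status: AutoscalerStatus,
    now: datetime,
    max_gap: timedelta = timedelta(minutes=2),  # hypothetical threshold
) -> bool:
    """Return True if no successful cycle was recorded within max_gap.

    Checking for the absence of success catches failures that never
    surface as errors, e.g. rejected calls that are swallowed internally.
    """
    return now - status.last_successful_cycle > max_gap


if __name__ == "__main__":
    utc = timezone.utc
    # Illustrative only: treat 12:06 UTC as the last successful cycle.
    status = AutoscalerStatus(
        last_successful_cycle=datetime(2025, 10, 16, 12, 6, tzinfo=utc)
    )
    print(is_silently_stalled(status, now=datetime(2025, 10, 16, 12, 9, tzinfo=utc)))  # True
```

With a last success at 12:06 UTC and a two-minute threshold, the check starts returning True just after 12:08 UTC. In practice the threshold should comfortably exceed the autoscaler's normal evaluation interval so that an ordinary gap between cycles does not trigger a false alarm.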