Worklytics incident
🛑 login is slow, MSFT SSO in particular is affected
Worklytics experienced a major incident on October 8, 2025 affecting Worklytics Platform and worklytics-web-portal, lasting 17h 31m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- resolved Oct 08, 2025, 09:53 PM UTC
The Worklytics Web Portal was degraded from approximately 14:44 PT until 17:20 PT. The root cause was increased latency in GCP Secret Manager, which left configuration information stored there slow to load or timing out entirely.

This was an unexpected and unhandled condition. The code that accesses Secret Manager handled outright failures (by proactively disabling the SSO methods that depend on the inaccessible secrets) but not extreme slowness. The default Secret Manager client timeout is 60s (specified by Google, not Worklytics); we should have lowered this for our use case (web app configuration). Because we were lazy-loading configuration information for SSO providers, and 60s is also the typical HTTP timeout of web browsers, login requests were simply timing out. What *should* have happened is that some SSO methods simply disappeared from the application.

Two fixes have been made:
  - lowering timeouts, as described above, to well within the HTTP timeout
  - moving some non-secret configuration information out of Secret Manager, to limit dependency on the service; certain values were kept there for the convenience of having all config in one place, rather than because they actually needed to be handled as secrets

Limiting use of Secret Manager to truly secret values should reduce the dependency and risk.
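
The intended behavior, where a slow configuration lookup hides the affected SSO methods instead of hanging the login request, can be sketched roughly as follows. This is a minimal Python illustration, not the Worklytics implementation: the function name, the `fetchers` mapping, and the 2-second budget are all hypothetical, and each fetcher stands in for whatever call retrieves one provider's configuration.

```python
import concurrent.futures
import time
from typing import Callable, Dict

# Hypothetical budget, chosen to sit well inside a browser's HTTP timeout.
FETCH_TIMEOUT_SECONDS = 2.0


def load_enabled_sso_providers(
    fetchers: Dict[str, Callable[[], str]],
    timeout: float = FETCH_TIMEOUT_SECONDS,
) -> Dict[str, str]:
    """Return config only for providers whose fetch finished within `timeout`.

    `fetchers` maps a provider name to a zero-argument callable that returns
    that provider's config (a stand-in for a secrets-store lookup).
    """
    enabled: Dict[str, str] = {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    try:
        # Start all lookups in parallel so one slow provider cannot delay the rest.
        futures = {name: pool.submit(fetch) for name, fetch in fetchers.items()}
        deadline = time.monotonic() + timeout
        for name, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                enabled[name] = future.result(timeout=remaining)
            except concurrent.futures.TimeoutError:
                # Too slow: treat it like a failure and hide this provider.
                continue
            except Exception:
                # Outright failure: hide this provider as well.
                continue
    finally:
        # Do not block the request on lookups that are still running.
        pool.shutdown(wait=False)
    return enabled
```

Called with one fast, one slow, and one failing fetcher, this returns only the fast provider's config within roughly the timeout budget. A wrapper like this only stops waiting; it does not cancel the underlying call, so lowering the client's own timeout is still needed to keep slow calls from piling up.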