Totango experienced a major incident on November 20, 2023, affecting the Totango Web Application and lasting 3h 3m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Nov 20, 2023, 11:26 AM UTC
We are currently investigating reports of failed login attempts.
- monitoring Nov 20, 2023, 11:57 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Nov 20, 2023, 02:30 PM UTC
This incident has been resolved.
- postmortem Nov 20, 2023, 02:43 PM UTC
**Event Description:** On November 20, 2023, a critical issue was reported concerning the Single Sign-On (SSO) mechanism using Microsoft services. Customers began experiencing difficulties logging in, and the support team received multiple complaints at approximately 13:00 UTC.

**Timeline:**

* **13:00, Nov 20, 2023**: Notification received from support about multiple customer complaints regarding SSO login issues.
* **13:40, Nov 20, 2023**: The root cause of the issue was identified.
* **13:55, Nov 20, 2023**: The issue was resolved.

**Root Cause:** The issue was traced back to the expiration of the secret keys for our SSO application in Azure. This expiration, which likely occurred around November 18th, went unnoticed because no alerts or notifications were generated for this event.

**Steps to Resolution:**

* A new secret key was generated and updated in the parameter store.

**Lessons Learned and Preventive Actions:**

1. **Engagement with Azure Support**: We will be reaching out to Azure support to investigate why no alerts were triggered for the key expiration.
2. **Implementation of New Alert System**: An alert has been added in New Relic (NR) to specifically monitor for key expiration errors. This action has already been completed.
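
The alert described above fires on key expiration errors, that is, after a key has already expired. A complementary check is to compare each secret's expiry date against the current time and flag it ahead of the deadline. The following is a minimal sketch of that idea, not Totango's implementation: the `SecretInfo` record, the `secrets_needing_rotation` function, the 30-day warning window, and the sample data are all hypothetical, and how the expiry dates are fetched from the identity provider is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SecretInfo:
    """Hypothetical record describing one client secret and its expiry."""
    name: str
    expires_at: datetime  # timezone-aware UTC timestamp


def secrets_needing_rotation(secrets, now, warn_within=timedelta(days=30)):
    """Return (secret, status) pairs for secrets that are expired or expire soon.

    status is "expired" when the expiry time is at or before `now`, and
    "expiring_soon" when it falls within the `warn_within` window.
    """
    flagged = []
    for secret in secrets:
        remaining = secret.expires_at - now
        if remaining <= timedelta(0):
            flagged.append((secret, "expired"))
        elif remaining <= warn_within:
            flagged.append((secret, "expiring_soon"))
    return flagged


if __name__ == "__main__":
    # Illustrative data only; the names and dates are made up.
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    inventory = [
        SecretInfo("sso-app-secret", datetime(2024, 1, 15, tzinfo=timezone.utc)),
        SecretInfo("other-secret", datetime(2024, 12, 1, tzinfo=timezone.utc)),
    ]
    for secret, status in secrets_needing_rotation(inventory, now):
        print(f"{secret.name}: {status} (expires {secret.expires_at:%Y-%m-%d})")
```

Run on a schedule, a check like this could feed an existing alerting channel so that a secret is rotated before logins start failing, rather than after the first errors appear.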