Teem experienced a major incident on March 12, 2024 affecting Authentication (SSO), lasting 17h 45m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Mar 12, 2024, 04:21 AM UTC
We are currently experiencing issues that appear to mainly affect O365 SSO. We are investigating and will post an update when we have gathered more information.
- investigating Mar 12, 2024, 06:26 AM UTC
We are continuing to investigate this issue. We will post an update at 8:00 AM MST.
- investigating Mar 12, 2024, 02:01 PM UTC
We are continuing to investigate this issue. We will post an update at 1:00 PM CST.
- investigating Mar 12, 2024, 06:04 PM UTC
Our Engineering team is continuing the investigation to determine the cause of the disruption. The next update will be posted at 5:00 PM CST.
- identified Mar 12, 2024, 08:43 PM UTC
The issue with O365 SSO has been identified and a fix is being implemented. We will post another update at 8:00pm CST or earlier when testing is complete.
- monitoring Mar 12, 2024, 09:02 PM UTC
Our Engineering team has found and implemented a fix. We have received some confirmation that the issue is resolved. We will be monitoring for the next hour while awaiting further confirmations.
- resolved Mar 12, 2024, 10:07 PM UTC
The issue is now resolved. We appreciate your patience while we resolved this issue. We'll be posting the RCA by March 26th.
- postmortem Mar 25, 2024, 04:48 PM UTC
**Teem by Eptura detailed Root Cause Analysis | 3/11/2024**

**S2 – O365 SSO Login Issues**

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

**Description:** **\(The Incident is logged in MST\)**

On Monday, March 11th, internal Teem members and customers started experiencing issues logging into the Teem application via O365. At around 10:15 PM, a fire alarm was pulled by Jared Collins, Teem Support Manager.

**Type of Event:** Outage

**Services/Modules impacted:** SSO/Login

**Timeline:** **The timeline is posted in MST.**

5:25 PM: We received reports of login issues, but logins still succeeded after a few tries. We confirmed that login worked over the next few hours and kept a close eye on things.
10:06 PM: A customer reported login issues, and we were now seeing issues internally as well.
10:08 PM: A Jira ticket was opened for our Engineering team to jump on the issue.
10:15 PM: A Fire Alarm was pulled.
10:21 PM: The status page was updated with the status of the issue.
10:23 PM: Engineering confirmed they were working on the issue.
11:54 PM: The status page was updated once more for the evening.
8:00 AM: The status page was updated.
12:00 PM: The status page was updated again to say the issue was still ongoing.
2:37 PM: Engineering found a fix and implemented it in the environment.
3:05 PM: Customers confirmed that the issue was resolved.
4:00 PM: The status page was taken down.

**Total Duration of Event:** 16 Hours

**Root Cause:** The issue was caused by an outdated token residing on an older server. The token was re-established, which fixed the issue.

**Remediation:** We have put Fire Alarms in place to notify the team of SSO token expiration a month before it happens, so we can update the token and ensure that SSO has no issues going forward.

**Preventative Action:** Having a system in place to notify the team of token expiration.
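
The expiration notification described under Remediation could be sketched roughly as follows. This is only an illustration, not Teem's actual alerting: the RCA does not say how the alert is implemented, so the function name `needs_renewal_alert`, the 30-day window standing in for "a month", and the example dates are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "a month before" is approximated as 30 days.
WARNING_WINDOW = timedelta(days=30)


def needs_renewal_alert(expires_at, now=None, window=WARNING_WINDOW):
    """Return True when the token expires within `window` or has already expired.

    `expires_at` and `now` must be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # A negative remaining time (already expired) also triggers the alert.
    return expires_at - now <= window


# Hypothetical expiry date, chosen only for illustration.
token_expiry = datetime(2024, 3, 12, tzinfo=timezone.utc)

# 26 days before expiry: inside the warning window, so the alert fires.
print(needs_renewal_alert(token_expiry, now=datetime(2024, 2, 15, tzinfo=timezone.utc)))

# 71 days before expiry: outside the warning window, so no alert yet.
print(needs_renewal_alert(token_expiry, now=datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

A check like this would typically run on a schedule (for example, daily) and raise the alert each time it returns True, so that a missed notification is repeated until the token is renewed.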