Teem incident

S2 - O365 SSO Logging Issues

Teem experienced a major incident on March 12, 2024, affecting Authentication (SSO) and lasting 17h 45m. The incident has been resolved; the full update timeline is below.

Started
Mar 12, 2024, 04:21 AM UTC
Resolved
Mar 12, 2024, 10:07 PM UTC
Duration
17h 45m
Detected by Pingoru
Mar 12, 2024, 04:21 AM UTC

Affected components

Authentication (SSO)

Update timeline

  1. investigating Mar 12, 2024, 04:21 AM UTC

    We are currently seeing issues that appear to affect mainly O365 SSO. We are investigating and will post an update once we have gathered more information.

  2. investigating Mar 12, 2024, 06:26 AM UTC

    We are continuing to investigate this issue and will post an update at 8:00 AM MDT.

  3. investigating Mar 12, 2024, 02:01 PM UTC

    We are continuing to investigate this issue and will post an update at 1:00 PM CDT.

  4. investigating Mar 12, 2024, 06:04 PM UTC

    Our Engineering team is continuing the investigation to determine the cause of the disruption. The next update will be posted at 5:00 PM CDT.

  5. identified Mar 12, 2024, 08:43 PM UTC

    The issue with O365 SSO has been identified and a fix is being implemented. We will post another update at 8:00 PM CDT, or earlier once testing is complete.

  6. monitoring Mar 12, 2024, 09:02 PM UTC

    Our Engineering team has found and implemented a fix, and we have received some confirmation that the issue is resolved. We will continue monitoring for the next hour while awaiting further confirmation.

  7. resolved Mar 12, 2024, 10:07 PM UTC

    The issue is now resolved. We appreciate your patience while we resolved this issue. We'll be posting the RCA by March 26th.

  8. postmortem Mar 25, 2024, 04:48 PM UTC

    **Teem by Eptura Detailed Root Cause Analysis | 3/11/2024**

    **S2 – O365 SSO Login Issues**

    We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

    **Description:** (All times in this report are Mountain Daylight Time, UTC-6.) On Monday, March 11, internal Teem members and customers started experiencing issues logging in to the Teem application via O365. At around 10:15 PM, Jared Collins, Teem Support Manager, pulled a Fire Alarm.

    **Type of Event:** Outage

    **Services/Modules impacted:** SSO/Login

    **Timeline:**

    - 5:25 PM: We received reports of login issues, but logins still succeeded after a few retries. Over the next few hours we confirmed that logins were working and kept a close eye on things.
    - 10:06 PM: A customer reports login issues; internal users are now affected as well.
    - 10:08 PM: A Jira ticket is opened for the Engineering team to take on the issue.
    - 10:15 PM: A Fire Alarm is pulled.
    - 10:21 PM: The status page is updated with the current status of the issue.
    - 10:23 PM: Engineering confirms they are working on the issue.
    - 11:54 PM: The status page is updated once more for the evening.
    - 8:00 AM (March 12): The status page is updated.
    - 12:00 PM: The status page is updated to report that the issue is still ongoing.
    - 2:37 PM: Engineering finds a fix and implements it in the environment.
    - 3:05 PM: Customers confirm that the issue is resolved.
    - 4:00 PM: The incident is closed on the status page.

    **Total Duration of Event:** 16 hours

    **Root Cause:** An outdated token stored on an older server. Once the token was re-established, the issue was fixed.

    **Remediation:** We have set up Fire Alarm alerts that notify the team one month before the SSO token expires, so the token can be updated in time and SSO continues to work without issues.

    **Preventative Action:** A system is now in place to notify the team of upcoming token expiration.
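The remediation amounts to a periodic check of each token's expiry date against an alert window. The following is a minimal Python sketch of that idea, not Teem's actual implementation. It assumes each token's expiry timestamp is known, approximates "a month" as 30 days, and uses hypothetical names throughout (`TokenRecord`, `needs_renewal_alert`, `tokens_to_renew`, and the example server names). How the alert is actually raised (for example, through a Fire Alarm) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Assumption: "a month before" is approximated as a 30-day window.
ALERT_WINDOW = timedelta(days=30)


@dataclass
class TokenRecord:
    """Hypothetical record of an SSO token and the server that stores it."""
    server: str
    expires_at: datetime  # timezone-aware expiry timestamp


def needs_renewal_alert(token: TokenRecord, now: datetime,
                        window: timedelta = ALERT_WINDOW) -> bool:
    """Return True if the token has already expired or expires within `window`."""
    return token.expires_at - now <= window


def tokens_to_renew(tokens: Iterable[TokenRecord], now: Optional[datetime] = None,
                    window: timedelta = ALERT_WINDOW) -> List[str]:
    """Return the servers whose tokens should trigger an alert, soonest expiry first."""
    if now is None:
        now = datetime.now(timezone.utc)
    due = [t for t in tokens if needs_renewal_alert(t, now, window)]
    return [t.server for t in sorted(due, key=lambda t: t.expires_at)]


if __name__ == "__main__":
    # Hypothetical inventory: one token expires in 10 days, one in about six months.
    check_time = datetime(2024, 3, 1, tzinfo=timezone.utc)
    inventory = [
        TokenRecord("legacy-sso-01", datetime(2024, 3, 11, tzinfo=timezone.utc)),
        TokenRecord("sso-02", datetime(2024, 9, 1, tzinfo=timezone.utc)),
    ]
    print(tokens_to_renew(inventory, check_time))  # ['legacy-sso-01']
```

Treating already-expired tokens as due (a negative time remaining is still within the window) means a token that was missed, like the outdated one on the older server, keeps alerting until it is replaced.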