Teem incident
Degraded performance: app.teem.com and other services
Teem experienced a minor incident on December 9, 2019 affecting Web Interface and EventBoard, lasting 6d 18h. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Dec 09, 2019, 09:39 PM UTC
We’re currently experiencing a service disruption, with reports of loading issues with app.teem.com, EventBoard, and LobbyConnect. Our team is working to identify the root cause and implement a solution. Next update will be by 4:30pm MT/6:30pm ET.
- investigating Dec 09, 2019, 11:18 PM UTC
We are continuing to investigate the cause of the issue. The system is catching up and performance is gradually improving. Next update will be by 7pm MT/9pm ET.
- monitoring Dec 10, 2019, 01:00 AM UTC
Performance has returned to normal for app.teem.com, EventBoard, and LobbyConnect, and all other apps/services are also fully functional. We have not identified the root cause yet, but are continuing to investigate and monitor. Next update will be tomorrow by 9am MT/11am ET.
- monitoring Dec 10, 2019, 03:52 PM UTC
Performance remains normal. We have made adjustments to our back-end systems and are continuing to monitor the results through the day. Next update will be today by 4pm MT/6pm ET.
- investigating Dec 10, 2019, 05:19 PM UTC
We’re currently experiencing a recurrence of the service disruption, with reports of slow loading for app.teem.com, EventBoard, and LobbyConnect, as well as other services such as device registrations. Our team is actively investigating. Next update will be by 12:30pm MT/2:30pm ET.
- investigating Dec 10, 2019, 07:21 PM UTC
We are continuing to investigate the cause of the increased load on our systems. Performance continues to be impacted, with extended periods of slowness or applications failing to load. Next update will be by 3pm MT/5pm ET.
- investigating Dec 10, 2019, 09:55 PM UTC
We are continuing to investigate the increased load on the system and its cause. Performance continues to be impacted, with extended periods of slowness or applications failing to load. Next update will be by 6pm MT/8pm ET.
- investigating Dec 11, 2019, 12:41 AM UTC
Performance has returned to normal for app.teem.com, EventBoard, and LobbyConnect, and all other apps/services are also fully functional. The investigation is actively continuing and is focused on finding and addressing the reason for the increased load yesterday and today. Next update will be tomorrow by 9am MT/11am ET.
- identified Dec 11, 2019, 04:25 PM UTC
Performance has been normal for app.teem.com, EventBoard, LobbyConnect, and all other apps/services since the last update. Teem has identified an abnormal spike in change notifications for certain calendar providers and is in the process of updating handling and processing systems to account for the change in behavior. Next update will be as status changes or tomorrow by 9am MT/11am ET.
- identified Dec 12, 2019, 07:38 PM UTC
Performance has been normal for app.teem.com, EventBoard, LobbyConnect, and all other apps/services since the last update. Teem is preparing an update to handling and processing systems (currently in testing) and will deploy it when testing is complete. Next update will be as status changes or tomorrow by 9am MT/11am ET.
- monitoring Dec 13, 2019, 06:22 PM UTC
Performance has been normal for app.teem.com, EventBoard, LobbyConnect, and all other apps/services since the last update. Teem has released an update to help prevent degraded performance in this scenario in the future and will continue to monitor throughout the day, after which we will resolve the incident.
- resolved Dec 16, 2019, 03:55 PM UTC
After monitoring throughout the weekend, this incident has been resolved.
- postmortem Feb 07, 2020, 11:56 PM UTC
On December 16th the Teem platform received a large spike in traffic from the Google Calendar provider, which ultimately created a large backlog of asynchronous platform jobs. The systems over-balanced these jobs, causing heavy load on the database. Tuning the asynchronous capacity for the related workloads helped clear the backlog and prevent further issues. Additional database maintenance was performed in late January to further reduce the possibility of recurrence.
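The postmortem does not describe how the asynchronous capacity was tuned. As a rough illustration only, one common way to keep a job backlog from overloading a database is to cap how many jobs run at once. The sketch below uses Python's asyncio; the names `run_jobs`, `fake_db_job`, and `max_concurrent` are hypothetical and do not come from Teem's systems.

```python
import asyncio


async def run_jobs(jobs, max_concurrent):
    """Run every job, but never more than max_concurrent at the same time.

    Returns the highest number of jobs observed running concurrently.
    Hypothetical helper for illustration; not Teem's implementation.
    """
    limit = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def worker(job):
        nonlocal in_flight, peak
        # Wait here until one of the max_concurrent slots is free.
        async with limit:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await job()
            finally:
                in_flight -= 1

    await asyncio.gather(*(worker(job) for job in jobs))
    return peak


async def fake_db_job():
    # Stand-in for a job that would put load on the database.
    await asyncio.sleep(0.01)


def demo(job_count=50, max_concurrent=5):
    # Simulate a backlog of job_count jobs drained through a fixed cap.
    jobs = [fake_db_job for _ in range(job_count)]
    return asyncio.run(run_jobs(jobs, max_concurrent))
```

Calling `demo()` drains a simulated backlog of 50 jobs while holding at most 5 in flight. With a cap like this, the backlog takes longer to clear, but the load on the shared database stays bounded regardless of how large the spike is.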