Teem experienced a minor incident on April 11, 2024, affecting Google Apps Calendar and lasting 17d 18h. The incident has been resolved; the full update timeline is below.
Affected components
- Google Apps Calendar
Update timeline
- investigating Apr 11, 2024, 09:46 PM UTC
We are investigating an issue affecting customers who use the Google Calendar Service. Our Engineering team is working to determine the cause of the disruption. The next update will be posted at 7:45 PM MDT.
- investigating Apr 12, 2024, 01:35 AM UTC
We are continuing to investigate this issue with Google Calendar Sync. The next update will be posted at 11:45 PM MDT.
- identified Apr 12, 2024, 01:56 AM UTC
The issue with the Google Calendar Service has been identified, and a fix is being implemented. We will post another update at 1 AM CDT.
- monitoring Apr 12, 2024, 05:58 AM UTC
A fix has been implemented. We are moving into the Monitoring Phase for the next 4 hours.
- monitoring Apr 12, 2024, 10:27 AM UTC
We are continuing to monitor for any further issues for the next 4 hours.
- investigating Apr 12, 2024, 01:22 PM UTC
Because the implemented fix has not fully resolved the issue, we have moved back into the investigation phase. Our Engineering team is investigating the issue with the Google Calendar Service to determine the cause of the disruption. The next update will be posted at 12 PM CDT.
- investigating Apr 12, 2024, 04:27 PM UTC
We are continuing to investigate this issue as a priority. We will post another update at 4 PM CDT.
- identified Apr 12, 2024, 09:06 PM UTC
Because the previous fix did not fully resolve the issue, we continued investigating the Google Calendar Service and have now determined the cause of the disruption. We are working on a fix. The next update will be posted at 8 PM CDT.
- identified Apr 13, 2024, 04:13 AM UTC
We are continuing to investigate this issue as a priority. We apologize for the delay; the next update will be posted at 3 AM CDT.
- monitoring Apr 13, 2024, 08:12 AM UTC
A fix has been implemented. We are moving into the Monitoring Phase for the next 12 hours.
- monitoring Apr 14, 2024, 07:47 PM UTC
We are continuing to monitor for any further issues for the next 12 hours.
- monitoring Apr 15, 2024, 12:51 PM UTC
We are continuing to monitor for any further issues for the next 12 hours.
- investigating Apr 16, 2024, 08:32 AM UTC
Because the implemented fix has not fully resolved the issue, we have moved back into the investigation phase. Our Engineering team is investigating the issue with the Google Calendar Service to determine the cause of the disruption. The next update will be posted at 7 AM CDT.
- investigating Apr 16, 2024, 11:49 AM UTC
We are continuing to investigate this issue as a priority. We apologize for the delay; the next update will be shared at 11 AM CDT.
- monitoring Apr 16, 2024, 03:56 PM UTC
A fix has been identified and applied to optimize the performance of Google Calendar Sync. We are moving into the Monitoring Phase for the next 4 hours; the next update will be shared at 3 PM CDT.
- monitoring Apr 16, 2024, 09:03 PM UTC
We are continuing to monitor for any further issues for the next 12 hours.
- monitoring Apr 17, 2024, 12:42 PM UTC
We are continuing to monitor for any further issues for the next 12 hours.
- investigating Apr 18, 2024, 12:38 PM UTC
Because the implemented fix has not fully resolved the issue, we have moved back into the investigation phase. Our Engineering team is investigating the issue with the Google Calendar Service to determine the cause of the disruption. The next update will be posted at 11:30 AM CDT.
- investigating Apr 18, 2024, 04:34 PM UTC
We are continuing to investigate this issue as a priority. We apologize for the delay; the next update will be shared at 3:30 PM CDT.
- investigating Apr 18, 2024, 08:34 PM UTC
Our team is working to resolve an issue affecting sync times for customers using Google Calendar, and we are fully committed to resolving it as quickly as possible. We recognize the importance of timely event syncing and apologize for any delays you may be experiencing. Restoring normal calendar sync performance is our top priority, and we will continue to post status updates as we make progress toward a resolution. Thank you.
- identified Apr 19, 2024, 05:52 PM UTC
We appreciate your patience while our team works to resolve the issue with unpredictable calendar event synchronization timing. We have identified a potential cause of these symptoms and are actively working to address it. We have observed that the PgBouncer and PgBouncer_ro services cannot run simultaneously on the job managers. Because of how the startup script works, it is unclear which of the two services is running, and it appears that whichever service starts last "wins". After an instance restart, a different service could win and cause further inconsistency. To resolve this, we have given the two services separate Unix socket directories, which allows them to run simultaneously and eliminates the inconsistency. This change eliminated significant errors on the job managers. Our team is dedicated to restoring normal calendar sync performance, and we will continue to post status updates as we monitor and make progress toward a resolution.
- monitoring Apr 22, 2024, 05:08 PM UTC
We appreciate your patience while our team works to resolve the issue with unpredictable calendar event synchronization timing. We have identified the issue causing these symptoms and have developed and implemented a fix. Our team is dedicated to ensuring normal calendar sync performance; we will continue to monitor to confirm full resolution and will post additional status updates if needed. Thank you.
- resolved Apr 29, 2024, 04:23 PM UTC
We deeply appreciate your patience while our team worked to resolve the recent calendar event synchronization timing issue. We are pleased to report that we have identified the root cause of the problem and implemented a fix. Our team is committed to ensuring normal calendar performance and will continue to monitor closely to ensure the best possible customer experience. If you have any further questions or concerns, please reach out to our dedicated support team. Thank you.
- postmortem May 20, 2024, 05:25 PM UTC
**Teem by Eptura Detailed Root Cause Analysis | April 11, 2024**
**S2: Google Calendar Service Not Synchronizing**
We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.
**Description:** Customers using the Google Calendar Service found that calendar events were not being synced automatically. During the incident, a workaround was provided to force a manual sync and update the calendars.
**Type of Event:** Functionality Issue
**Services/Modules impacted:** Production / Google Calendar Service
**Timeline (reported in MDT):** In the late afternoon of April 11, 2024, at approximately 3:50 PM, multiple customers reported that their Google Calendar Service was not automatically syncing calendar events. Customers were provided a temporary workaround to force a manual sync of their calendars, and all customers were notified of the Severity 2 incident via the Teem Status Page. The investigation continued until April 19, 2024, when the CloudOps team identified the root cause of the issue. On April 22, 2024, at approximately 11:08 AM, all customers were notified via the Status Page that the fix had been implemented and that we had moved into a monitoring phase. After a period of continuous monitoring with no additional reports of Google Calendar event sync problems, and with customers confirming that their calendar events were syncing automatically, the Severity 2 incident was marked as resolved on April 29, 2024, at 10:23 AM.
**Total Duration of Event:** 17 days, 18 hours, 33 minutes
**Root Cause:** We observed that the PgBouncer and PgBouncer\_ro services cannot run simultaneously on the job managers. Because of how the startup script works, it is unclear which of the two services is running, and it appears that whichever service starts last "wins". After an instance restart, a different service could win and cause further inconsistency. We also discovered that three of our job managers were running outdated code.
**Remediation:** These services shared a Unix socket directory. Giving each service its own Unix socket directory allows both to run simultaneously and eliminates the inconsistency. This change eliminated significant errors on the job managers.
**Preventative Action:** Our team is dedicated to continuously improving the Google Calendar Service by enhancing our current processes and implementing robust monitoring systems. We appreciate your patience and cooperation during this disruption.
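The root cause and remediation describe a "last to start wins" conflict between two services that shared a Unix socket directory, but the report does not spell out the exact mechanism. The Python sketch below illustrates one plausible mechanism under stated assumptions: both services resolve to the same socket file name inside the shared directory (the name `.s.PGSQL.6432` is hypothetical, using PgBouncer's default port), and each one deletes any existing file at that path before binding. It does not run PgBouncer; two plain listeners named `pgbouncer` and `pgbouncer_ro` stand in for the services, and the helpers `start_service`, `who_answers`, and `demo` are hypothetical names for illustration. It needs a platform with `AF_UNIX` sockets (Linux or macOS).

```python
import os
import select
import socket
import tempfile

# Hypothetical socket file name; both stand-in services are assumed to use it.
SOCKET_NAME = ".s.PGSQL.6432"


def start_service(socket_dir):
    """Start a stand-in service listening on a Unix socket in socket_dir.

    Assumption: the service removes any existing socket file at its path
    before binding, so a later starter silently takes over the path while
    the earlier listener keeps running on an unreachable, unlinked socket.
    """
    path = os.path.join(socket_dir, SOCKET_NAME)
    if os.path.exists(path):
        os.unlink(path)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    return srv


def who_answers(socket_dir, services):
    """Connect to the socket path in socket_dir and return the name of the
    stand-in service whose listening socket received the connection."""
    path = os.path.join(socket_dir, SOCKET_NAME)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    try:
        # The listener holding the pending connection becomes readable.
        ready, _, _ = select.select(list(services.values()), [], [], 1.0)
        for name, srv in services.items():
            if srv in ready:
                conn, _ = srv.accept()
                conn.close()
                return name
        return None
    finally:
        client.close()


def demo(shared):
    """Start both stand-ins (pgbouncer_ro last) and report who answers on
    each service's socket path, with a shared or a separate directory."""
    with tempfile.TemporaryDirectory() as base:
        dir_rw = os.path.join(base, "shared" if shared else "pgbouncer")
        dir_ro = os.path.join(base, "shared" if shared else "pgbouncer_ro")
        os.makedirs(dir_rw, exist_ok=True)
        os.makedirs(dir_ro, exist_ok=True)
        services = {
            "pgbouncer": start_service(dir_rw),
            "pgbouncer_ro": start_service(dir_ro),  # started last
        }
        try:
            return who_answers(dir_rw, services), who_answers(dir_ro, services)
        finally:
            for srv in services.values():
                srv.close()


if __name__ == "__main__":
    print("shared directory:   ", demo(shared=True))
    print("separate directories:", demo(shared=False))
```

With a shared directory, both lookups reach whichever stand-in started last (`pgbouncer_ro` here), mirroring the "last to start wins" behavior; with separate directories, each path reaches its own service, which is the effect the remediation describes.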