Tonkean incident

Uploading files to Google Drive errors out

Minor · Resolved

Tonkean experienced a minor incident on September 29, 2024 affecting Data Sources (Webhooks, Data Collection & Actions), lasting 1d 1h. The incident has been resolved; the full update timeline is below.

Started
Sep 29, 2024, 05:09 PM UTC
Resolved
Sep 30, 2024, 06:23 PM UTC
Duration
1d 1h
Detected by Pingoru
Sep 29, 2024, 05:09 PM UTC

Affected components

Data Sources (Webhooks, Data Collection & Actions)

Update timeline

  1. investigating Sep 29, 2024, 05:09 PM UTC

    We have identified an outage with Google Drive that prevents our Google application from making "upload" API calls, apparently because of a protection mechanism on Google's side. We have opened a high-priority support case with Google and escalated it as far as we can, and we are awaiting their response.

  2. resolved Sep 30, 2024, 06:23 PM UTC

    The issue is currently resolved and upload API calls are working as expected. This appears to have been a global Google Drive outage that affected other products and customers, as seen in this issue: https://issuetracker.google.com/issues/369788650?pli=1 We have opened multiple support cases with Google and escalated the issue. We will re-open this incident if the problem recurs.

  3. postmortem Sep 30, 2024, 06:24 PM UTC

    **Impact**

    The incident affected our ability to upload files to Google Drive via the API. During the outage, Google Drive’s API calls for uploading files were blocked due to an issue on Google’s end. No other core functionalities of our application were impacted.

    **Timeline of Events**

    * **Sep 26th, 10:00 AM PST – Incident Identified** We detected an issue preventing file uploads to Google Drive using the API. The root cause appeared to be related to a protection or outage on Google’s side. Multiple high-priority support cases were raised with Google, and the issue was escalated for resolution. _Data Sources: Webhooks, Data Collection & Actions_
    * **Sep 27th, 5:00 AM PST – Incident Resolved** Google resolved the underlying issue temporarily, and upload API calls started functioning normally.
    * **Sep 28th, 11:30 PM PST – Issue Resurfaces** The same issue occurred again, impacting the ability to upload files via the Google Drive API. We reopened the case with Google, further escalating the issue.
    * **Sep 30th, 7:00 AM PST – Incident Resolved Again** Google resolved the issue once more, and file uploads via the API returned to normal functionality. _Data Sources: Webhooks, Data Collection & Actions_

    **Root Cause Analysis**

    The outage was caused by a global Google Drive issue, as referenced in this [Google Issue Tracker](https://issuetracker.google.com/issues/369788650?pli=1). The protection or failure in Google’s system affected multiple products globally, including our integration with Google Drive for file uploads.

    **Resolution and Mitigation**

    * Google resolved the underlying issue both times, restoring full functionality to the Google Drive Upload API.
    * No action was required on our side once Google fixed the problem.
    * We will continue to monitor the situation and re-open the incident if the problem reoccurs.

    **Action Items**

    * Continue communication with Google to understand potential long-term fixes.
    * Implement the capability to configure a server for tunneling traffic through different regions.
    * Add capability to route Google Drive API calls through a different region using a tunneling method if the issue occurs again, as a temporary solution to mitigate potential regional service disruptions.

    **Lessons Learned**

    * **Monitoring**: Relying on external service providers (like Google) requires robust monitoring and communication channels with those providers to promptly escalate and track issues.
    * **Communication**: Swift communication with Google and prompt escalation of support cases helped minimize confusion and ensured a timely resolution.

    **Next Steps**

    * Add capability to configure tunneling of traffic through different regions.
    * Ensure that API calls for Google Drive can be rerouted through a different region in the event of a similar outage.
    * Continue to monitor Google Drive API performance and follow up with Google for detailed RCA from their side if applicable.
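
The rerouting idea in the action items and next steps could look roughly like the sketch below. This is an illustration under stated assumptions, not Tonkean's implementation: `Route`, `UploadBlockedError`, `upload_with_fallback`, and the proxy URL are hypothetical names introduced here, and the real Google Drive upload call is abstracted behind an injected `uploader` callable so the fallback logic stands on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Route:
    """Hypothetical egress path for Google Drive API traffic."""
    name: str
    proxy_url: Optional[str]  # None means a direct connection, no tunnel


class UploadBlockedError(Exception):
    """Raised when an upload call fails or is rejected on a route."""


def upload_with_fallback(
    payload: bytes,
    routes: Sequence[Route],
    uploader: Callable[[bytes, Route], str],
) -> Tuple[str, str]:
    """Try each route in order and return (route name, file ID).

    `uploader` is assumed to perform the upload over the given route and
    return the new file's ID, raising UploadBlockedError on failure.
    """
    errors: List[str] = []
    for route in routes:
        try:
            return route.name, uploader(payload, route)
        except UploadBlockedError as exc:
            # Record the failure and move on to the next route.
            errors.append(f"{route.name}: {exc}")
    raise UploadBlockedError("all routes failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Stand-in uploader that simulates the outage: direct calls are blocked,
    # tunneled calls succeed. A real one would call the Drive API.
    def simulated_uploader(payload: bytes, route: Route) -> str:
        if route.proxy_url is None:
            raise UploadBlockedError("upload call rejected")
        return "simulated-file-id"

    routes = [
        Route("direct", None),
        Route("tunnel-other-region", "http://tunnel.example.internal:3128"),
    ]
    print(upload_with_fallback(b"report contents", routes, simulated_uploader))
```

Trying the direct route first keeps the tunnel a temporary fallback, as the action items describe, rather than the default path. Whether a tunnel through another region would actually have avoided this particular outage is not established by the postmortem.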