Vev incident

Authentication provider is down

Vev experienced a critical incident on June 12, 2025, affecting Platform, Staging, Standard Hosting, and CDN, lasting 12h 10m. The incident has been resolved; the full update timeline is below.

Started: Jun 12, 2025, 06:21 PM UTC
Resolved: Jun 13, 2025, 06:32 AM UTC
Duration: 12h 10m
Detected by Pingoru: Jun 12, 2025, 06:21 PM UTC

Affected components

Platform, Staging, Standard Hosting, CDN

Update timeline

investigating Jun 12, 2025, 06:21 PM UTC

We’re currently experiencing an outage due to issues with our authentication provider, Google Identity Platform. As a result, users are temporarily unable to access the platform. Only the platform is affected, not published projects. We are monitoring the situation closely and will provide updates as soon as the issue is resolved. Thank you for your patience!
identified Jun 12, 2025, 06:55 PM UTC

We have investigated and identified that the outage is affecting both platform access and published content.
• We have confirmed that the root cause of the access issues lies with our authentication provider, Google Cloud’s Identity Platform. This is preventing users from logging in to the platform.
• In addition, Cloudflare, our CDN provider, is experiencing issues. This is affecting published content for projects using standard hosting and embedded content, making them temporarily unavailable.
Our team is actively monitoring both incidents and will provide updates as soon as service stability is restored. We appreciate your patience and understanding.
identified Jun 12, 2025, 07:18 PM UTC

We are continuing to work on a fix for this issue.
monitoring Jun 12, 2025, 08:20 PM UTC

All systems seem to be operational again, but we are still monitoring to ensure everything is working as expected.
resolved Jun 13, 2025, 06:32 AM UTC

This incident has been resolved.
postmortem Sep 02, 2025, 06:49 PM UTC

### **Multi-Provider Service Outage**

### **1. Summary**

On June 12, 2025, our platform experienced a major outage lasting approximately two hours. The incident presented as a two-fold failure: users were unable to log in to the platform, and simultaneously, all content served via our Standard Hosting and CDN services became unavailable.

The root cause was determined to be a rare, concurrent outage of two separate, critical third-party providers: **Google Cloud Identity Platform** (for user authentication) and **Cloudflare** (for content delivery). Recovery was dependent on the restoration of these external services. This event highlights a critical risk in our dependency on single providers for core functionality.

### **2. Impact**

* **User Access:** 100% of users were unable to log in to the Hosting & Publishing platform.
* **Content Availability:** All published projects and embedded content on our Staging and Standard Hosting platforms, which are served via our CDN, were inaccessible to end-users.
* **Customer Impact:** This resulted in downtime for both our direct users (who could not manage their projects) and their end-users (who could not access published content).

### **3. Timeline of Events (All times in UTC)**

* **18:21:** First alerts are triggered. Monitoring systems detect a spike in failed login attempts. An incident is declared, and the team begins investigating. An initial status update is posted, correctly identifying an issue with our authentication provider but incorrectly stating that published projects were not affected.
* **18:55:** The investigation broadens as reports of published content being unavailable are confirmed. The team identifies a second, simultaneous issue with our CDN provider, Cloudflare. The root cause is now understood to be a compound failure of two external services. The status page is updated to reflect the full scope of the impact.
* **19:18:** With both root causes identified as external, our team's role shifts to active monitoring of Google Cloud's and Cloudflare's official status pages and APIs. We confirm that both providers have acknowledged major, ongoing incidents.
* **20:20:** Google Cloud and Cloudflare report that their services have been restored. Our internal health checks and automated monitors confirm that both authentication and content delivery are fully operational. The incident is declared resolved.

### **4. Root Cause Analysis**

The direct cause of this incident was the simultaneous failure of two independent and critical third-party services that our platform relies on:

1. **Authentication Service Failure:** An outage within **Google Cloud's Identity Platform** prevented the validation of user credentials. As our platform's authentication is fully managed by this service, no users could successfully log in or create new sessions.
2. **Content Delivery Network (CDN) Failure:** A widespread outage at **Cloudflare** disrupted their global network. Our Standard Hosting and embedded content are served exclusively through the Cloudflare CDN for performance and security. The CDN's failure made this content completely inaccessible.

The combination of these two unrelated events created a full-platform outage. There were no internal code changes or infrastructure failures that contributed to the incident.

### **5. Resolution and Recovery**

* The primary resolution path was to wait for Google and Cloudflare to resolve the incidents on their respective platforms.
* Our engineering team actively monitored the status of both providers to provide timely updates and to verify service restoration as soon as it was announced.
* Once both services were confirmed to be stable, our internal systems recovered automatically without needing manual intervention.

### **6. Lessons Learned & Action Items**

This incident exposed our platform's vulnerability to concurrent, single-provider failures. While such events are rare, their impact is severe. Our corrective actions are focused on improving resilience and communication.
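
As an illustration of the communication point, the sketch below shows one way a status update's scope could be derived from provider health rather than assumed. It is a minimal sketch and not our actual tooling: the `DEPENDENCIES` map is inferred from the Impact section above, and the `affected_components` function and its input format are hypothetical. A provider whose state is unknown is treated as unhealthy, so components are never reported as unaffected before they have been checked.

```python
from typing import Dict, List

# Hypothetical dependency map, inferred from the Impact section of this
# postmortem. The real component-to-provider wiring may differ.
DEPENDENCIES: Dict[str, List[str]] = {
    "Google Cloud Identity Platform": ["Platform"],
    "Cloudflare": ["Staging", "Standard Hosting", "CDN"],
}


def affected_components(provider_healthy: Dict[str, bool]) -> List[str]:
    """Return the components that depend on any provider not known to be healthy.

    A provider missing from `provider_healthy` counts as unhealthy, so an
    unverified dependency is never reported as "not affected".
    """
    affected: List[str] = []
    for provider, components in DEPENDENCIES.items():
        if not provider_healthy.get(provider, False):
            for component in components:
                if component not in affected:
                    affected.append(component)
    return affected


if __name__ == "__main__":
    # Situation at 18:21: only the authentication provider has been checked.
    print(affected_components({"Google Cloud Identity Platform": False}))
```

With only the authentication provider checked, as at 18:21, this sketch reports all four components as potentially affected until Cloudflare is explicitly confirmed healthy.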