Event Ingestion (EU) recovered
Timeline · 1 update
- investigating Jul 04, 2026, 10:47 AM UTC
Event Ingestion (EU) went down
Userpilot had 13 outages in the last 2 years, all since April 23, 2025, totaling 12h 15m of downtime (about 0.5 incidents per month). Each is summarised below with incident details, duration, and resolution information.
Event Ingestion (EU) went down
We are currently experiencing an issue affecting our EU hosting region. Our team is actively working on a fix and will share updates as they become available.
The issue has been resolved, and everything is working normally now. Thank you for your patience.
We are currently experiencing degraded performance affecting data ingestion. Some customers may notice delays in incoming data. Our engineering team is actively investigating and working to restore full performance as quickly as possible. Thank you for your patience.
A fix has been implemented and deployed, and we are currently monitoring system performance to ensure stability. We will share the postmortem for this incident as a follow-up message. Thank you for your patience.
Postmortem

The incident was caused by our database background optimization process not running aggressively enough. This allowed data fragments to accumulate across tables used for real-time cache loading. As the number of fragments increased, queries that normally completed in milliseconds were forced to scan across many more fragments than necessary. This significantly increased query latency and led to exhaustion of available database connections, which impacted data ingestion and content publishing performance.

To resolve this, we reconfigured our database to optimize these tables more frequently and in smaller batches. This keeps fragment counts low and ensures real-time queries remain fast and stable.

Additionally, we have added monitoring and alerting on fragment count and size per table so we can detect abnormal accumulation early and prevent similar incidents in the future.
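The fragment monitoring described in the postmortem could look roughly like the sketch below. This is not Userpilot's implementation: the `TableFragments` record, the `find_fragment_alerts` function, and every threshold value are hypothetical, and reading "size" as average fragment size is an assumption. The sketch only shows the idea of checking per-table fragment counts and sizes against limits.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TableFragments:
    """Hypothetical per-table snapshot; field names are illustrative only."""
    table: str
    fragment_count: int
    total_bytes: int


def find_fragment_alerts(
    snapshots: Iterable[TableFragments],
    max_fragments: int = 300,          # assumed hard limit on fragments per table
    size_check_from: int = 50,         # only judge fragment size above this count
    min_avg_bytes: int = 1_000_000,    # assumed floor for a healthy average size
) -> List[str]:
    """Return one alert message per table that looks abnormally fragmented."""
    alerts = []
    for snap in snapshots:
        if snap.fragment_count > max_fragments:
            # Too many fragments: queries must scan more pieces than necessary.
            alerts.append(
                f"{snap.table}: {snap.fragment_count} fragments exceeds {max_fragments}"
            )
        elif snap.fragment_count >= size_check_from:
            # Many small fragments suggest background optimization is falling behind.
            avg = snap.total_bytes / snap.fragment_count
            if avg < min_avg_bytes:
                alerts.append(
                    f"{snap.table}: average fragment size {avg:.0f} B is below {min_avg_bytes} B"
                )
    return alerts
```

In practice the snapshots would come from whatever metadata the database exposes about its tables, and the alerts would feed an existing alerting channel; both are outside the scope of this sketch.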
We are currently experiencing a delay in data insertion affecting our ingestion pipeline.
The issue causing delays in data ingestion has been fully resolved. Ingestion throughput has returned to normal operating levels, and all data is being processed as expected. There was no data loss during the incident. We will publish a post-mortem with additional details and preventive actions in a follow-up communication. Thank you for your patience and understanding.
We are currently experiencing a service degradation due to an ongoing outage at our CDN provider, Cloudflare. This may affect both the loading of our web application and the download of our JavaScript SDK.

Impact:
- Some customers may be unable to load the web app or may experience very slow load times.
- Some customers may experience failures or timeouts when downloading our SDK.
- Impact may vary by region and ISP depending on Cloudflare’s edge availability.

What’s happening: Our assets (including the SDK and static web resources) are served through Cloudflare. As Cloudflare is currently having issues, some requests to these assets are failing or timing out.
We have switched CDN providers temporarily. Our services are no longer affected by Cloudflare’s outage and are functioning normally. Thank you for your patience and understanding.
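The update does not say how the provider switch was made. As a related illustration only, the sketch below shows a different, client-side technique: trying an ordered list of CDN base URLs and using the first one that responds. The function name `fetch_with_fallback`, the injected `fetch` callable, and the `example.com` hostnames are all hypothetical.

```python
from typing import Callable, Iterable


def fetch_with_fallback(
    path: str,
    base_urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Try each CDN base URL in order and return the first successful body.

    `fetch` is any callable that returns the response bytes or raises OSError
    (network errors and timeouts in the standard library derive from OSError).
    """
    errors = []
    for base in base_urls:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all CDN endpoints failed: " + "; ".join(errors))


def demo_fetch(url: str) -> bytes:
    """Stand-in fetcher: pretends the primary CDN is down (hypothetical hosts)."""
    if "cdn-primary.example.com" in url:
        raise ConnectionError("edge unavailable")
    return b"asset from " + url.encode()


asset = fetch_with_fallback(
    "/sdk.js",
    ["https://cdn-primary.example.com/", "https://cdn-backup.example.com"],
    demo_fetch,
)
```

Passing the fetcher in as a parameter keeps the fallback logic independent of any particular HTTP client and makes it easy to exercise without network access.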
Javascript SDK recovered
Web Application & Chrome Extension APIs recovered
Web Application recovered
Userpilot Website recovered
Event Ingestion (US - Standard) recovered
Our SSO service is impacted by the AWS incident in US-EAST-1. We are currently investigating options for restoring the SSO service despite the AWS incident.
SSO Service has recovered. Our team continues to monitor the situation with our cloud provider.
We are experiencing service degradation. Our team is actively monitoring and working to restore full performance.
We’ve taken steps to address the service degradation and are closely monitoring performance.

Post Mortem
- What happened: After upgrading our columnar database, an incompatibility stopped automatic consolidation on a core table.
- Impact: Partial data loss and content triggering issues.
- Cause: The new version didn’t work correctly with tables that use pre-computed summaries, leading to too many small data fragments.
- Fix: Rebuilt the summaries to force consolidation and rolled back to the prior database version. Performance recovered.
- Prevention: Added an alert on fragment count, paused similar upgrades, and expanded compatibility/performance testing before future releases.

We’re sorry for the disruption and appreciate your patience.