SaaS AI Gateway recovered
- resolved May 08, 2026, 10:29 AM UTC
There were 12 Portkey outages since May 24, 2024, totaling 32h 20m of downtime, an average of 0.5 incidents per month. Each is summarized below with incident details, duration, and resolution information.
SaaS AI Gateway recovered
An upstream code change impacted the budget functionality for a few minutes.
An upstream code change caused a brief degradation in Portkey’s budget-related API functionality. During this window, requests depending on budget evaluation or enforcement may have experienced unexpected behavior. Core API routing remained operational, and Logs were not affected. The change has been identified and mitigated. We are continuing to monitor API behavior to ensure full stability.
The brief degradation affecting budget-related API functionality has been resolved. The issue was caused by an upstream code change and affected requests relying on budget evaluation or enforcement. Core API routing remained available throughout, and Logs were not affected. We have mitigated the change, verified recovery, and will continue monitoring to ensure stability.
We had a brief authentication issue on the control plane today. What happened: between 9:50 AM and 10:45 AM UTC, users who had signed out couldn't sign back in using email and password. New signups were also affected during this window. SSO and other auth methods continued to work normally.
Current status: Fully resolved. Email/password auth and signups are working as expected.
We experienced degraded performance affecting the Portkey Dashboard and Prompt Render API between 2:23 AM and 3:38 AM IST on January 22, 2026. Impact: some users saw Dashboard timeouts and slow loading, and the Prompt Render API experienced intermittent failures. What was not affected: the AI Gateway remained fully operational throughout, and all inference requests and model calls continued to work normally with no failures.
Dashboard and /prompt/render APIs are up now. Root cause: An unusually high volume of SCIM provisioning updates created unexpected load on our control plane database. Resolution: The issue has been resolved and all services are operating normally. We are implementing additional rate limiting and scaling measures to prevent similar occurrences.
A code update introduced a new metric that occasionally caused ClickHouse inserts to fail, so some logs did not appear in the UI.
Logs are working as expected now.
Portkey is impacted by the global Cloudflare outage. SaaS Gateway and the control plane are facing intermittent failures. We are actively investigating the issue and working to get the system back up.
Service is back up now.
While working to fix some caching issues yesterday, we inadvertently introduced a bug that caused erroneous cache collisions. This issue may have affected your requests that had cache enabled. If you were using different metadata or cache namespaces, your requests were properly isolated and unaffected by this bug.
Please note:
- The collisions did not happen between cache partitions.
- The collisions did not happen across accounts.
The change has been fully reverted as of Jul 23, 2025, 3:23 UTC. Our team is actively monitoring the system to ensure stability. We are sorry for this!
We wish it were April 1st, but the internet is just not feeling it today. We're experiencing a UI outage due to an ongoing GCE and Firebase outage. AI Gateway requests are also experiencing increased latencies due to Cloudflare outages. We're continuously monitoring our services and working to restore them as soon as possible.
Latencies have reduced significantly and services seem up. We're continuing to monitor services and requests. Please reach out on Discord or [email protected] if you're facing issues.
API went down.
API recovered.
API went down.
API recovered.
We deployed an authorization layer update to Portkey that caused our APIs to return 401 errors for the majority of requests. Service was restored after 2 minutes of downtime.
Writes to our metrics cluster are delayed and the logs screen is intermittently failing. We're investigating the issue.
We have fixed the logs for now and are looking at the dropped logs. We'll replay them within the next 4 hours.