PlayFab incident

PlayFab Economy Catalog service degradation

PlayFab experienced a notice-level incident on August 12, 2025. The incident has been resolved; the full update timeline is below.

Started: Aug 12, 2025, 05:52 PM UTC
Resolved: Aug 09, 2025, 04:00 AM UTC
Duration: —
Detected by Pingoru: Aug 12, 2025, 05:52 PM UTC

Update timeline

resolved Aug 12, 2025, 05:52 PM UTC

The PlayFab Catalog service experienced a drop in availability for about 5 hours. This was caused by a misconfigured service. The issue has been resolved, and the service is now fully operational.
postmortem Aug 19, 2025, 08:34 PM UTC

From 10:55 PM PDT on August 8, 2025, to 1:15 AM PDT on August 9, some customers experienced errors and delays when searching for or purchasing catalog items through PlayFab’s APIs. The incident was caused by a faulty service configuration intended for experimentation in a single region, which was mistakenly deployed to all regions due to human error. This misconfiguration led to increased internal server errors and service unavailability. The issue was resolved by rolling back the faulty configuration and confirming service recovery through monitoring and health checks.

### Impact

Approximately 12,000 players were affected and were unable to search for or purchase catalog items. During the outage, the latency of the Catalog APIs (particularly SearchItems and GetItems) increased significantly, and many requests were rejected with a “Service unavailable” status.

### Root Cause Analysis

The root cause was a human error in deploying a service configuration intended for limited traffic in a single region. The configuration update was instead applied to all regions, which increased resource pressure and caused a spike in internal server errors for the Catalog APIs that depend on storage. This led to widespread request failures and degraded service availability.

### Action Items

* Rolled back the faulty service configuration to restore normal operation.
* Improved deployment validation procedures to ensure experimental changes are restricted to their intended regions.
* Enhanced monitoring and alerting systems to detect and report anomalies in API error rates and service health more rapidly.
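A small pre-rollout guard can illustrate the deployment-validation action item. The following is a minimal, hypothetical sketch. It assumes each configuration change declares two sets of regions: the regions it was approved for, and the regions the deployment would actually reach. `ConfigChange`, `validate_rollout`, `RolloutScopeError`, the single-region rule for experiments, and the region names are all illustrative assumptions. They are not PlayFab's actual tooling or policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigChange:
    """Hypothetical description of a configuration rollout (not a PlayFab API)."""
    name: str
    experimental: bool
    intended_regions: frozenset[str]  # regions the change was approved for
    target_regions: frozenset[str]    # regions the deployment would actually reach


class RolloutScopeError(ValueError):
    """Raised when a rollout would reach regions outside its approved scope."""


def validate_rollout(change: ConfigChange) -> None:
    # Reject any region the deployment would touch but was not approved for.
    unexpected = change.target_regions - change.intended_regions
    if unexpected:
        raise RolloutScopeError(
            f"{change.name}: regions {sorted(unexpected)} are outside "
            f"the intended scope {sorted(change.intended_regions)}"
        )
    # Assumed policy: experimental changes may target exactly one region.
    if change.experimental and len(change.target_regions) != 1:
        raise RolloutScopeError(
            f"{change.name}: experimental changes must target exactly one region, "
            f"got {len(change.target_regions)}"
        )


if __name__ == "__main__":
    # Hypothetical region names, used only for this example.
    all_regions = frozenset({"region-a", "region-b", "region-c"})
    experiment = ConfigChange(
        "catalog-experiment", True, frozenset({"region-a"}), frozenset({"region-a"})
    )
    validate_rollout(experiment)  # passes: scoped to its single intended region

    mistaken = ConfigChange("catalog-experiment", True, frozenset({"region-a"}), all_regions)
    try:
        validate_rollout(mistaken)
    except RolloutScopeError as err:
        print(f"rollout blocked: {err}")
```

The check compares the deployment's actual reach against its declared scope before anything is applied. A change approved for one region but pointed at every region fails fast instead of reaching production.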
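One way to realize the improved error-rate alerting is a sliding-window check over recent API responses. This is a minimal sketch, and it assumes any HTTP 5xx response counts as a failure, such as 500 Internal Server Error or 503 Service Unavailable. `ErrorRateMonitor`, the window size, the 5% threshold, and the `min_samples` floor are hypothetical values chosen for illustration. They do not describe PlayFab's actual alerting configuration.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window monitor: alerts when the share of failed
    requests among the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int = 1000, threshold: float = 0.05, min_samples: int = 100):
        if window <= 0:
            raise ValueError("window must be positive")
        if not 0 < threshold < 1:
            raise ValueError("threshold must be between 0 and 1")
        self.window = window
        self.threshold = threshold
        self.min_samples = min_samples
        self._outcomes = deque(maxlen=window)  # True means the request failed
        self._failures = 0

    def record(self, status_code: int) -> bool:
        """Record one API response; return True if the alert condition holds."""
        failed = status_code >= 500  # assumption: only 5xx responses count as failures
        if len(self._outcomes) == self.window and self._outcomes[0]:
            self._failures -= 1  # the oldest outcome is a failure about to be evicted
        self._outcomes.append(failed)
        self._failures += int(failed)
        return self.error_rate() > self.threshold and len(self._outcomes) >= self.min_samples

    def error_rate(self) -> float:
        return self._failures / len(self._outcomes) if self._outcomes else 0.0


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window=200, threshold=0.05, min_samples=50)
    for _ in range(100):
        monitor.record(200)
    alerting = False
    for _ in range(10):
        alerting = monitor.record(503)  # simulated "Service unavailable" responses
    print(f"error rate {monitor.error_rate():.3f}, alerting: {alerting}")
```

The `min_samples` floor keeps a handful of early failures from raising an alert before the window holds enough traffic to be meaningful. A fixed-size window means old failures stop counting once recovery is under way.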