PlayFab incident

PlayFab Economy Catalog service degradation

Notice · Resolved

PlayFab reported a notice-level incident on August 12, 2025. The incident has been resolved; the full update timeline is below.

Started
Aug 12, 2025, 05:52 PM UTC
Resolved
Aug 09, 2025, 04:00 AM UTC
Duration
Not reported
Detected by Pingoru
Aug 12, 2025, 05:52 PM UTC

Update timeline

  1. resolved Aug 12, 2025, 05:52 PM UTC

    The PlayFab Catalog service experienced a drop in availability for about 5 hours due to a service misconfiguration. The issue has been resolved and the service is now fully operational.

  2. postmortem Aug 19, 2025, 08:34 PM UTC

    On August 8, 2025, between 10:55 PM PDT and 1:15 AM PDT the following morning, some customers experienced errors and delays when searching for or purchasing catalog items through PlayFab’s APIs. The incident was caused by a faulty service configuration intended for an experiment in a single region, which was mistakenly deployed to all regions due to human error. This misconfiguration increased internal server errors and caused service unavailability. The issue was resolved by rolling back the faulty configuration and confirming service recovery through monitoring and health checks.

    ### Impact

    Approximately 12,000 players were affected and were unable to search for or purchase catalog items. During the outage, the latency of the Catalog APIs (particularly SearchItems and GetItems) increased significantly, and many requests were rejected with a “Service unavailable” status.

    ### Root Cause Analysis

    The root cause was human error in deploying a service configuration intended for limited traffic in a single region. The configuration update was instead applied to all regions, which increased resource pressure and caused a spike in internal server errors for the Catalog APIs that depend on storage. This led to widespread request failures and degraded service availability.

    ### Action Items

    * Rolled back the faulty service configuration to restore normal operation.
    * Improved deployment validation procedures to ensure that experimental changes are restricted to their intended regions.
    * Enhanced monitoring and alerting to detect and report anomalies in API error rates and service health more quickly.
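The deployment-validation action item (keeping experimental changes inside their intended regions) can be illustrated with a pre-deployment guard. This is a minimal, hypothetical sketch: the region names, the `ConfigRollout` type, the `RolloutRejected` exception, and `validate_rollout` are invented for illustration and do not describe PlayFab's actual deployment tooling. The idea is that an experimental change must carry an explicit allowlist of approved regions, and any rollout that targets a region outside that allowlist (such as "all regions") is rejected before it ships.

```python
from dataclasses import dataclass, field

# Hypothetical region identifiers; the postmortem does not name PlayFab's regions.
ALL_REGIONS = frozenset({"region-a", "region-b", "region-c"})


class RolloutRejected(Exception):
    """Raised when a configuration rollout fails pre-deployment validation."""


@dataclass(frozen=True)
class ConfigRollout:
    """A hypothetical service-configuration change and where it will be applied."""
    name: str
    experimental: bool
    target_regions: frozenset = field(default_factory=frozenset)
    # For experiments only: the regions explicitly approved for this change.
    approved_regions: frozenset = field(default_factory=frozenset)


def validate_rollout(rollout: ConfigRollout) -> None:
    """Raise RolloutRejected if the rollout's regional scope is invalid."""
    if not rollout.target_regions:
        raise RolloutRejected(f"{rollout.name}: no target regions")

    unknown = rollout.target_regions - ALL_REGIONS
    if unknown:
        raise RolloutRejected(f"{rollout.name}: unknown regions {sorted(unknown)}")

    if rollout.experimental:
        # An experiment with no approved scope is rejected outright.
        if not rollout.approved_regions:
            raise RolloutRejected(f"{rollout.name}: experiment has no approved regions")
        # Any target outside the approved scope blocks the deployment.
        extra = rollout.target_regions - rollout.approved_regions
        if extra:
            raise RolloutRejected(
                f"{rollout.name}: experimental change targets unapproved regions {sorted(extra)}"
            )


if __name__ == "__main__":
    # An experiment scoped to its single approved region passes silently.
    validate_rollout(
        ConfigRollout("catalog-exp", True, frozenset({"region-a"}), frozenset({"region-a"}))
    )
    # The same experiment accidentally targeting every region is blocked.
    try:
        validate_rollout(
            ConfigRollout("catalog-exp", True, ALL_REGIONS, frozenset({"region-a"}))
        )
    except RolloutRejected as err:
        print(err)
```

Running the demo prints `catalog-exp: experimental change targets unapproved regions ['region-b', 'region-c']`. Non-experimental changes are still allowed to target every known region; the guard only tightens scope for changes flagged as experiments, which is the failure mode the postmortem describes.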