PlayFab incident
PlayFab Matchmaking APIs are experiencing degraded performance
PlayFab experienced a major incident on August 31, 2025 affecting Matchmaking, lasting 8h 28m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Aug 31, 2025, 07:56 PM UTC
PlayFab Matchmaking APIs are experiencing degraded performance. Approximately 50% of calls to the Matchmaking APIs are returning 500 internal server error. The team is currently investigating the issue.
- investigating Aug 31, 2025, 07:57 PM UTC
We are continuing to investigate this issue.
- identified Sep 01, 2025, 01:18 AM UTC
The matchmaking feature is experiencing a major degradation. Titles utilizing real-time notifications experienced very low success rates depending upon their scenarios. A mitigation is in place and we are monitoring recovery.
- monitoring Sep 01, 2025, 02:53 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Sep 01, 2025, 03:28 AM UTC
This incident has been resolved.
- postmortem Sep 09, 2025, 11:34 PM UTC
On August 31, 2025, between 11:50 AM and 7:53 PM PDT, some customers experienced a complete outage of PlayFab Matchmaking. The incident was caused by a regression in the authentication library, which led to unexpectedly high call rates to Entra ID for token acquisition, resulting in service throttling and failures. We resolved the issue by increasing authentication rate limits, updating the authentication code, recycling affected application instances, and isolating impacted titles into dedicated deployments.

### Impact

During the outage, matchmaking services were unavailable for all affected titles, preventing players from joining matches. Notification failures also occurred, leaving players stuck in matchmaking without proper updates.

### Root Cause Analysis

The root cause of the incident was a regression in the updated authentication library, which failed to cache tokens as expected. The updated library created a new instance for each authentication attempt, which broke the caching mechanism and unexpectedly increased network traffic. This led to a surge in token requests to Entra ID, causing throttling and authentication failures across the service.

### Action Items

To prevent similar incidents from happening again, we have taken the following actions:

* Implemented service-level caching for authentication tokens to reduce unnecessary external calls.
* Enhanced monitoring and diagnostics to detect abnormal network patterns and authentication failures more quickly.
* Improved notification handling and recycling processes to prevent players from being stuck in matchmaking.
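
The caching failure described in the root cause analysis can be illustrated with a minimal sketch. It is not the PlayFab implementation and does not use the Entra ID or any real authentication library API: `TokenCache`, `CountingProvider`, and the `"matchmaking"` scope are hypothetical names introduced only for this example. It shows why a cache created per authentication attempt never produces a hit, while a single shared, service-level cache reduces many requests to one token acquisition.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Token:
    value: str
    expires_at: float  # seconds, measured on the same clock the cache uses


class TokenCache:
    """Hypothetical service-level cache; one instance is shared by all callers."""

    def __init__(self, acquire: Callable[[str], Token],
                 refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._acquire = acquire                # the expensive external call
        self._refresh_margin = refresh_margin  # renew this long before expiry
        self._clock = clock
        self._tokens: Dict[str, Token] = {}
        self._lock = threading.Lock()

    def get(self, scope: str) -> Token:
        # The lock ensures concurrent callers do not all acquire at once.
        with self._lock:
            cached = self._tokens.get(scope)
            if cached is not None:
                remaining = cached.expires_at - self._clock()
                if remaining > self._refresh_margin:
                    return cached
            fresh = self._acquire(scope)
            self._tokens[scope] = fresh
            return fresh


class CountingProvider:
    """Hypothetical stand-in for the identity provider; counts acquisitions."""

    def __init__(self, lifetime: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.calls = 0
        self._lifetime = lifetime
        self._clock = clock

    def acquire(self, scope: str) -> Token:
        self.calls += 1
        return Token(value=f"token-{self.calls}-for-{scope}",
                     expires_at=self._clock() + self._lifetime)


def calls_with_shared_cache(requests: int) -> int:
    """One cache for the whole service: the token is acquired once."""
    provider = CountingProvider()
    cache = TokenCache(provider.acquire)
    for _ in range(requests):
        cache.get("matchmaking")
    return provider.calls


def calls_with_cache_per_request(requests: int) -> int:
    """A new cache per attempt (the regression pattern): every call misses."""
    provider = CountingProvider()
    for _ in range(requests):
        TokenCache(provider.acquire).get("matchmaking")
    return provider.calls
```

Under these assumptions, the second function issues one provider call per request, which is the kind of traffic growth that can trigger throttling. The refresh margin is a design choice: renewing shortly before expiry avoids handing out a token that expires mid-request.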