Chef experienced a critical incident on April 16, 2020, lasting 3h 31m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Apr 16, 2020, 11:06 PM UTC
After upgrading PostgreSQL, we are seeing spikes in database connections and CPU usage. We're resizing the database to get more system resources and will provide additional updates as we have them.
- investigating Apr 17, 2020, 12:00 AM UTC
We're investigating an unexpected increase in database fetches by the authz service.
- investigating Apr 17, 2020, 01:58 AM UTC
We're isolating an issue with the authz service's database queries, which are taking an abnormally long time to complete.
- investigating Apr 17, 2020, 02:13 AM UTC
We've implemented a short-term workaround to restore service. We're monitoring the service.
- investigating Apr 17, 2020, 02:24 AM UTC
Traffic patterns and service have returned to the levels observed prior to the maintenance window. We will conduct an incident analysis and publish a blog post about it next week. Thank you for your patience and I'm sorry that this impacted your workflows.
- resolved Apr 17, 2020, 02:37 AM UTC
This incident has been resolved.