Kontent.ai experienced a critical incident on February 19, 2024, affecting the Application and the Management REST API and lasting 7d 17h. The incident has been resolved; the full update timeline is below.
Affected components
- Application
- Management REST API
Update timeline
- investigating Feb 19, 2024, 04:55 PM UTC
Several users have reported being unable to log in to or access Kontent.ai. We currently believe the issue is isolated to projects located in the US data center, and our development team is actively investigating. Thank you for your patience while we work to resolve this issue.
- monitoring Feb 19, 2024, 05:19 PM UTC
Our service should now be back online after about 40 minutes of downtime. Delivery API functionality was not affected. We are still investigating the initial cause of the outage.
- identified Feb 19, 2024, 05:20 PM UTC
Our service should now be back online after about 40 minutes of downtime. Delivery API functionality was not affected. We are still investigating the initial cause of the outage.
- identified Feb 19, 2024, 05:41 PM UTC
Some users are still unable to access Kontent.ai. Our team is continuing to investigate and work toward a solution. Thank you for your patience.
- identified Feb 19, 2024, 05:56 PM UTC
Some users are still unable to access Kontent.ai, and some Management API requests are also failing as a result of this issue. Our team is continuing to investigate and work toward a solution. Thank you for your patience.
- identified Feb 19, 2024, 08:27 PM UTC
Content production performance (via both the user interface and the MAPI) for projects in the US data center is still severely degraded. Some requests fail, but most eventually succeed after a delay. We believe the issue is caused by abnormally high memory usage, and we are continuing to investigate its source. Thank you for your patience while we continue working toward a solution.
- identified Feb 20, 2024, 01:05 AM UTC
Our developers have identified recent infrastructure changes that increased resource consumption as the probable cause. We are working to mitigate this issue.
- identified Feb 20, 2024, 11:07 AM UTC
We are now rolling back the changes to the US data center that caused this issue. We expect this to resolve the problem soon and will keep you updated.
- identified Feb 20, 2024, 01:18 PM UTC
We have applied a fix for the US data center, and projects are gradually returning to normal. Full recovery should take a few hours at most. If you still encounter any issues with your US projects, please let us know.
- monitoring Feb 20, 2024, 01:55 PM UTC
The fix has been fully applied, and all projects in the US data center now appear to be fully operational.
- resolved Feb 27, 2024, 10:35 AM UTC
After a 7-day monitoring period, no issues were detected with the implemented solution. The performance problems in our US data center were caused by a bug in Microsoft's Azure Cosmos DB SDK, which Microsoft is currently treating as a top priority. Kontent.ai has applied a workaround that ensures full functionality until Microsoft fixes the underlying issue.