Reports of 504 errors and slow load times
Timeline · 8 updates
- investigating Apr 27, 2026, 02:22 PM UTC
We are currently investigating reports of 504 errors and slow load times.
- identified Apr 27, 2026, 02:39 PM UTC
We are temporarily putting KnowledgeOwl into maintenance mode to let the system recover. We will bring everything back online as quickly as possible.
- identified Apr 27, 2026, 02:48 PM UTC
All services are currently in maintenance mode. This includes knowledge bases, the application, files, and the API.
- identified Apr 27, 2026, 03:46 PM UTC
We are still in emergency maintenance mode as we sort out the underlying issues that were causing the 504 errors. We are so sorry and are trying to get you all back online as quickly as possible. As one of our customers said, it is the Mondayest Monday.
- identified Apr 27, 2026, 04:48 PM UTC
Our team is actively investigating and has attempted several mitigations, none of which have resolved the issue. We'll continue posting hourly updates, or sooner if the situation changes.
- monitoring Apr 27, 2026, 05:49 PM UTC
A fix has been implemented and we are monitoring the results. All systems should be back online. We will continue to monitor over the next hour before marking this as resolved, and we will share a postmortem once our review is complete.
- resolved Apr 27, 2026, 06:52 PM UTC
The fix is holding and this incident is now resolved. We'll share a postmortem later this week. Thanks for your patience today.
- postmortem Apr 27, 2026, 07:22 PM UTC
**What happened**
We successfully completed a critical transition away from a legacy dependency; however, the deployment introduced a dormant performance regression. This issue remained hidden during low-traffic periods, only impacting database health once subjected to the full weight of production loads.

**How we resolved it**
Our monitoring systems detected the issue and alerted our team. Our engineering team placed all services into maintenance mode during investigation and troubleshooting. Once a successful fix was in place, all services were brought back online.

**What we're doing to prevent this**
We are evaluating improvements to our testing process to better simulate real-world traffic conditions before deploying infrastructure changes.

We apologize for the downtime and appreciate everyone’s support, kind words, and patience during and after the incident.