Kuali incident

Platform/Build intermittent issues

Minor Resolved

Kuali experienced a minor incident on August 14, 2025 affecting United States (Oregon) and Canada (Montreal), lasting 1h 13m. The incident has been resolved; the full update timeline is below.

Started
Aug 14, 2025, 08:44 PM UTC
Resolved
Aug 14, 2025, 09:57 PM UTC
Duration
1h 13m
Detected by Pingoru
Aug 14, 2025, 08:44 PM UTC

Affected components

United States (Oregon), Canada (Montreal)

Update timeline

  1. investigating Aug 14, 2025, 08:44 PM UTC

    We are receiving reports of intermittent issues loading Kuali Build and Kuali Next-Gen products. Our engineers are currently investigating and we will keep you posted on our progress. We apologize for the inconvenience this may be causing.

  2. monitoring Aug 14, 2025, 09:02 PM UTC

    Kuali Build and Kuali Next-Gen products should now be loading and our engineers are continuing to monitor performance. We'll keep you posted as we have more information.

  3. resolved Aug 14, 2025, 09:57 PM UTC

    We wanted to provide an update on the recent errors in the Build/Platform system. These were caused by two separate issues that happened concurrently:

      1. An outage with our third-party provider, Fly.io (more info can be found at https://status.flyio.net)
      2. A database issue that reduced our system’s capacity

    We restarted our Fly.io instances to restore operations and cycled our Mongo database clusters to alleviate the database issue. The system is now fully operational, and the database is stable. We’re continuing to work with our providers to fully understand what led to the outages and are continuing to monitor performance closely to ensure everything runs smoothly. We apologize for the inconvenience and we appreciate your patience while we resolved these issues.

  4. postmortem Aug 14, 2025, 09:57 PM UTC

    We discovered long-running queries in our codebase that were causing our database to get into a bad state. We’ve improved query performance by adding indexes so results are returned much more efficiently. We’ve also streamlined our process for adding new queries to maintain performance in the future. Lastly, we’re looking into long-term infrastructure improvements that can better distribute performance spikes. We apologize for the inconvenience and we appreciate your patience while we resolved these issues.
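
The effect of adding an index, as described in the postmortem, can be illustrated with a minimal pure-Python sketch. This is not Kuali's code or schema: the `form_id` field, the document shape, and the helper names are all hypothetical, and a dictionary stands in for a database index (MongoDB indexes are B-tree based, not hash maps). It only shows why an indexed lookup examines far fewer documents than a full scan.

```python
from collections import defaultdict

# Hypothetical documents; field names are illustrative assumptions only.
documents = [
    {"_id": i, "form_id": i % 1000, "status": "submitted"}
    for i in range(100_000)
]


def find_by_scan(docs, form_id):
    """Unindexed lookup: every document is examined."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc["form_id"] == form_id:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Build a simple in-memory index: field value -> matching documents."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[field]].append(doc)
    return index


def find_by_index(index, value):
    """Indexed lookup: only the matching documents are touched."""
    matches = index.get(value, [])
    return matches, len(matches)


scan_matches, scan_examined = find_by_scan(documents, 42)
form_id_index = build_index(documents, "form_id")
index_matches, index_examined = find_by_index(form_id_index, 42)

print(f"scan examined {scan_examined} documents")
print(f"index examined {index_examined} documents")
```

Both lookups return the same documents, but the scan's cost grows with the size of the collection while the indexed lookup's cost grows with the number of matches. Under load, many concurrent scans of this kind are one plausible way a database loses capacity.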