Kalix EMR incident

Loading issue

Critical | Resolved

Kalix EMR experienced a critical incident on February 14, 2024, affecting Kalix Platform, Online Schedulers, and one other component, and lasting 13h 23m. The incident has been resolved; the full update timeline is below.

Started: Feb 14, 2024, 05:53 PM UTC
Resolved: Feb 15, 2024, 07:17 AM UTC
Duration: 13h 23m
Detected by Pingoru: Feb 14, 2024, 05:53 PM UTC

Affected components

Kalix Platform, Online Schedulers, Messaging Notifications

Update timeline

  1. investigating Feb 14, 2024, 05:53 PM UTC

    Unfortunately, Kalix is experiencing timeout errors. We are currently investigating the problem and will post updates as soon as possible. Telehealth is unaffected.

  2. investigating Feb 14, 2024, 05:55 PM UTC

    We are continuing to investigate this issue.

  3. monitoring Feb 14, 2024, 06:01 PM UTC

    We found heavy usage on some of our servers; resetting them resolved the problem. We are now investigating why these servers suddenly experienced such high usage.

  4. resolved Feb 15, 2024, 07:17 AM UTC

    After further investigation of this issue and of the incident two days earlier, we believe we may have identified the cause: continuous retries to load data, which eventually overwhelmed the storage. We have made changes that should prevent this from happening again. We are now using our newer database, along with caching, to handle many more requests.
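The report does not describe Kalix's implementation. As a rough illustration only, the Python sketch below shows why the two measures mentioned help against a retry storm: a cache in front of storage means repeated reads of the same data reach the backend once, and a bounded retry loop (an assumption here; the update only says "changes" were made) stops a failing load from turning into an unbounded stream of requests. Every name in it (`ReadThroughCache`, `load_with_retries`, `StorageUnavailable`, the TTL and backoff values) is hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class StorageUnavailable(Exception):
    """Raised by a hypothetical storage backend when it cannot serve a read."""


def load_with_retries(
    load: Callable[[str], bytes],
    key: str,
    max_attempts: int = 3,
    base_delay: float = 0.1,
) -> bytes:
    """Call `load`, retrying at most `max_attempts` times with exponential backoff.

    Unlike a "retry until it works" loop, this gives up after a fixed number
    of attempts, so a struggling backend is not hit with endless requests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return load(key)
        except StorageUnavailable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of looping
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise AssertionError("unreachable")


class ReadThroughCache:
    """Serve repeated reads from memory; only misses or expired entries hit storage."""

    def __init__(
        self,
        load: Callable[[str], bytes],
        ttl_seconds: float = 30.0,
        max_attempts: int = 3,
        base_delay: float = 0.1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._clock = clock
        # key -> (time the value was stored, value)
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: storage is not touched
        value = load_with_retries(
            self._load, key, self._max_attempts, self._base_delay
        )
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    storage_calls = {"n": 0}

    def fake_storage(key: str) -> bytes:
        storage_calls["n"] += 1
        return ("record:" + key).encode()

    # A fixed clock keeps the demo deterministic: nothing expires.
    cache = ReadThroughCache(fake_storage, clock=lambda: 0.0)
    for _ in range(100):
        cache.get("patient-42")
    print(storage_calls["n"])  # prints 1: 100 reads, one storage call
```

The cache absorbs repeated load, and the retry cap bounds how much extra traffic a failing backend receives. A production system would also need cache invalidation and shared (not per-process) caching, which this sketch leaves out.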