Cadmium experienced a minor incident on April 26, 2024 affecting Website, lasting 2d 20h. The incident has been resolved; the full update timeline is below.
Affected components
- Website
Update timeline
- investigating Apr 26, 2024, 03:46 PM UTC
Our team is aware of a service disruption affecting the eventScribe website for some users. We are working swiftly to identify the root cause.
- monitoring Apr 26, 2024, 04:26 PM UTC
A fix has been put in place, and service has been restored. Our team is continuing to monitor the system. If you are still experiencing issues, please notify your project manager.
- resolved Apr 29, 2024, 12:38 PM UTC
The issue has been resolved.
- postmortem Apr 30, 2024, 12:59 PM UTC
This incident occurred when a surge in usage on the [eventScribe.net](http://eventScribe.net) server pool caused a CPU spike, and the [eventScribe.net](http://eventScribe.net) servers scaled up more quickly than planned. Typically, a new server is put into service from the standby “warm pool,” which contains servers that are ready for use. In this case, CPU levels rose across the server cluster faster than the warm pool could supply ready servers. As a result, servers that were not yet ready were put into service, which resulted in errors. Once those servers became ready, the issue was resolved. We are currently working to implement changes to prevent this from happening again.
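
The failure mode described above can be illustrated with a small simulation. This is a minimal sketch and not Cadmium's actual scaling logic: the `Server` class, the `scale_up` function, the `require_ready` flag, and the server names are all hypothetical, chosen only to show how a warm pool that runs out can lead to unready servers entering service, and how a readiness check would report a shortfall instead.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    ready: bool  # True once the server has finished starting and can serve traffic


def scale_up(needed, warm_pool, cold_pool, require_ready=True):
    """Pick up to `needed` servers to put into service (hypothetical helper).

    Warm-pool servers are taken first. If the warm pool runs out, cold servers
    are used only when they are ready, unless `require_ready` is False.
    Returns (in_service, shortfall).
    """
    in_service = warm_pool[:needed]  # slice copies, so the warm pool is not mutated
    remaining = needed - len(in_service)
    for server in cold_pool:
        if remaining == 0:
            break
        if server.ready or not require_ready:
            in_service.append(server)
            remaining -= 1
    return in_service, remaining


warm = [Server("warm-1", True), Server("warm-2", True)]
cold = [Server("cold-1", False), Server("cold-2", False)]

# Without a readiness check, a demand of 4 pulls in two servers that are not ready.
ungated, _ = scale_up(4, warm, cold, require_ready=False)
not_ready = [s.name for s in ungated if not s.ready]

# With the check, only the two warm servers go into service and a shortfall is reported.
gated, shortfall = scale_up(4, warm, cold, require_ready=True)
```

In this sketch the gated path trades capacity for correctness: demand is temporarily under-served, but no request reaches a server that cannot handle it. A larger warm pool would shrink that shortfall; which change the team chose is not stated in the postmortem.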