SchemeServe experienced a minor incident on October 5, 2023 affecting 🎩 SchemeServe, lasting 12m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Oct 05, 2023, 09:35 AM UTC
We are investigating reports that some users are receiving error messages when logging in, and we are treating this as a matter of urgency.
- resolved Oct 05, 2023, 09:47 AM UTC
This incident has been resolved.
- postmortem Oct 05, 2023, 10:23 AM UTC
SchemeServe uses Kubernetes to auto-scale to meet demand at any given time. During this scaling process, some of the newly created services occasionally fail, and another attempt is made to deploy them. This morning, however, a significant increase in the number of failures meant that the allocatable space for new services hit the limit of available IP addresses. To mitigate this, the platform normally auto-scales additional server resources ahead of time, but in this instance that did not happen fast enough because of the wave of errors, and a backlog built up. Once additional server resources were available for the services to be deployed to, and we had cleared a number of failed services, full service resumed. We will be updating this Kubernetes auto-scaling setup to prevent this from happening again.
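For readers who want a concrete picture of the failure mode described above, the following is a minimal illustrative sketch in Python. It is not SchemeServe's actual configuration or tooling. It assumes that each service instance holds one IP address, that failed instances keep their address until they are cleared, and that each server offers a fixed number of addresses. All names (`Node`, `nodes_to_add`, `reclaimable_ips`) and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one server in the cluster."""
    ip_capacity: int  # assumed number of IP addresses the server can hand out
    running: int      # healthy service instances, each holding one address
    failed: int       # failed instances that still hold an address until cleared

    def free_ips(self) -> int:
        # Addresses still available for new service instances.
        return max(self.ip_capacity - self.running - self.failed, 0)


def nodes_to_add(nodes, pending, headroom, ips_per_node):
    """Servers to add so that pending instances plus a safety headroom fit."""
    free = sum(n.free_ips() for n in nodes)
    shortfall = pending + headroom - free
    if shortfall <= 0:
        return 0
    return -(-shortfall // ips_per_node)  # ceiling division


def reclaimable_ips(nodes):
    """Addresses that would be released by clearing failed instances."""
    return sum(n.failed for n in nodes)


# A wave of failures leaves almost no free addresses on either server.
during_incident = [Node(30, 20, 8), Node(30, 25, 5)]
# The same servers once the failed instances have been cleared.
after_cleanup = [Node(30, 20, 0), Node(30, 25, 0)]
```

In this sketch, `nodes_to_add(during_incident, pending=10, headroom=5, ips_per_node=30)` asks for one extra server, while the same call on `after_cleanup` asks for none. That mirrors the two steps taken to restore service: adding server resources and clearing failed services.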