SchemeServe incident

Some users reporting access issues

Minor · Resolved

SchemeServe experienced a minor incident on October 5, 2023 affecting 🎩 SchemeServe, lasting 12m. The incident has been resolved; the full update timeline is below.

Started
Oct 05, 2023, 09:35 AM UTC
Resolved
Oct 05, 2023, 09:47 AM UTC
Duration
12m
Detected by Pingoru
Oct 05, 2023, 09:35 AM UTC

Affected components

🎩 SchemeServe

Update timeline

  1. investigating Oct 05, 2023, 09:35 AM UTC

    We are investigating, as a matter of urgency, reports that some users are receiving error messages when logging in.

  2. resolved Oct 05, 2023, 09:47 AM UTC

    This incident has been resolved.

  3. postmortem Oct 05, 2023, 10:23 AM UTC

    SchemeServe uses the Kubernetes framework to auto-scale to meet demand at any given time. During this scaling process, some of the additional services created occasionally fail, and another attempt is made to deploy the affected service. This morning, however, a significant increase in the number of failures meant that the allocatable space for new services reached the limit of available IP addresses. To mitigate this, the service normally auto-scales additional server resources ahead of time, but in this instance that did not happen fast enough because of the wave of errors, and a backlog built up. Once the additional server resources for the services to be deployed to were available and we had cleared a number of failed services, full service resumed.

    We will be updating this Kubernetes auto-scaling configuration to prevent this from happening again.
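
The failure mode described in the postmortem can be illustrated with a small model. This is only a sketch, not SchemeServe's actual tooling: the names `PoolState`, `free_ips`, `should_scale_ahead`, and `clear_failed`, the 20% headroom threshold, and all of the figures are hypothetical. It also assumes that a failed service keeps holding its IP address until it is cleared, which is one plausible reading of the postmortem rather than something it states outright.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PoolState:
    total_ips: int  # addresses allocatable to services (hypothetical figure)
    running: int    # healthy services, each holding one address
    failed: int     # failed services assumed to hold an address until cleared


def free_ips(state: PoolState) -> int:
    """Addresses still available for newly created services."""
    return state.total_ips - state.running - state.failed


def should_scale_ahead(state: PoolState, pending: int, headroom: float = 0.2) -> bool:
    """True when scheduling `pending` more services would leave fewer free
    addresses than the headroom fraction of the pool (assumed threshold)."""
    remaining = free_ips(state) - pending
    return remaining < headroom * state.total_ips


def clear_failed(state: PoolState) -> PoolState:
    """Model clearing failed services so their addresses are released."""
    return replace(state, failed=0)


if __name__ == "__main__":
    normal = PoolState(total_ips=100, running=60, failed=5)
    wave = PoolState(total_ips=100, running=60, failed=30)  # wave of failures

    print(free_ips(normal), should_scale_ahead(normal, pending=5))
    print(free_ips(wave), should_scale_ahead(wave, pending=5))
    print(free_ips(clear_failed(wave)))
```

In this model a burst of failures shrinks the free address count much faster than ordinary growth does, so a scale-ahead check that only runs periodically can fall behind. Clearing the failed services releases addresses immediately, which matches the recovery steps described above.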