mastodon.social incident

Degraded performance

Minor Resolved

mastodon.social experienced a minor incident on October 26, 2023. The incident has been resolved; the full update timeline is below.

Started
Oct 26, 2023, 03:19 PM UTC
Resolved
Oct 26, 2023, 03:19 PM UTC
Duration
Detected by Pingoru
Oct 26, 2023, 03:19 PM UTC

Update timeline

  1. resolved Oct 26, 2023, 03:19 PM UTC

    Type: Incident
    Duration: 15 days, 23 hours and 33 minutes
    Affected Components: Background queues, Website & API

    Oct 26, 15:19:35 GMT+0 - Investigating - There appears to be degraded performance after a move away from deprecated Hetzner node types. Currently investigating.

    Oct 26, 18:54:43 GMT+0 - Investigating - A definite cause of the degradation has yet to be identified; however, moving to more powerful nodes for the time being seems to have mitigated the issue. Sidekiq queues are still behind, but are no longer growing, and additional workers have been deployed to help them catch up. A ticket has been submitted with Hetzner to see if this could be an issue on their side. Waiting to hear back.

    Oct 27, 10:56:45 GMT+0 - Identified - The situation is now stable and services are working, but performance is not back to previous levels. The problem seems to be tied to the kernel version we are running, which may not have full support for the new hardware we deployed. We are working on updating our systems to a newer OS version and tuning various system settings to ensure everything goes back to expected levels of performance.

    Dec 12, 14:52:41 GMT+0 - Resolved - Performance has improved after upgrading various OS packages and upgrading Kubernetes to more recent versions with performance improvements. Additional work is still ongoing to identify other areas where performance can be improved, but current performance is now better than it was.