Mural incident

Users are unable to sign in to MURAL

Mural experienced a major incident on September 28, 2022, affecting Authentication, Website, and Realtime collaboration, lasting 1h 40m. The incident has been resolved; the full update timeline is below.

Started: Sep 28, 2022, 06:49 AM UTC
Resolved: Sep 28, 2022, 08:30 AM UTC
Duration: 1h 40m
Detected by Pingoru: Sep 28, 2022, 06:49 AM UTC

Affected components

Authentication
Website
Realtime collaboration

Update timeline

investigating Sep 28, 2022, 06:49 AM UTC

Users are currently unable to sign in to MURAL. We know this is a major service disruption for everyone. We're investigating the issue and will restore regular service ASAP. Please check our status page for the most up-to-date info 👉 status.mural.co/

identified Sep 28, 2022, 07:21 AM UTC

The issue has been identified and a fix is being implemented.

monitoring Sep 28, 2022, 07:29 AM UTC

A fix has been implemented and we are monitoring the results.

resolved Sep 28, 2022, 08:30 AM UTC

This incident has been resolved.

postmortem Oct 05, 2022, 12:49 PM UTC

**What happened?**

At 06:25 UTC on September 28, 2022, we started to receive notifications of users seeing an error page when accessing their MURAL workspaces. Initial investigation indicated that the cause was an issue with our cloud provider. Shortly afterward, our cloud provider confirmed that routine server maintenance had unexpectedly affected the availability of data hosted in one of their facilities. At 06:55 UTC, MURAL implemented a workaround to re-route traffic and restore service via a secondary server. At 07:25 UTC, our cloud provider confirmed that full service had been restored on the primary server, and we were able to revert the earlier change.

**Summary**

This incident impacted service availability from 06:25 UTC to 06:55 UTC, for a total of 30 minutes of downtime. No data was lost; however, MURAL was not accessible during this time.

**What we've done to avoid this happening again**

We have improved our automated monitoring and notifications to cover this particular scenario so that we can restore service faster if it occurs again.