Molecule experienced a critical incident on April 20, 2021 affecting Molecule, lasting 4h 18m. The incident has been resolved; the full update timeline is below.
Update timeline
- identified Apr 20, 2021, 04:01 PM UTC
Molecule's authorization provider is currently down, so users are unable to log in to the Molecule application. We are monitoring the situation with the highest priority.
- identified Apr 20, 2021, 06:02 PM UTC
Due to the major Auth0 outage, many users are unable to log in to the Molecule web application and web API. Users who were logged in to the web application before the outage will remain logged in and can continue to access Molecule. We are testing our fallback login scheme now and will consider deploying it depending on the length of the outage. We apologize for the inconvenience and will continue to update as often as we have information. You can track Auth0's status updates here: https://status.auth0.com/incidents/zvjzyc7912g5
- identified Apr 20, 2021, 08:10 PM UTC
Molecule is continuing to monitor the issue with our authorization provider, Auth0. While the root cause has not yet been determined, Auth0 has identified the source of the issue. Users with an existing login session open with Molecule are still able to access the Molecule web application. As progress continues toward final resolution, some users are now able to log in to Molecule without issue. We apologize for the inconvenience and will continue to update as often as we have information. You can track Auth0's status updates here: https://status.auth0.com/incidents/zvjzyc7912g5
- resolved Apr 20, 2021, 08:57 PM UTC
Although Auth0 is still reporting degraded performance, the Molecule application is now accessible to users. We will continue to monitor Auth0's status updates and report any new developments. If you continue to experience any difficulty accessing Molecule, please reach out to our support team at [email protected] for further assistance.
- postmortem Apr 21, 2021, 06:41 PM UTC
Yesterday, we had an outage that affected 1) users who weren't already logged in for the day, and 2) users who were using Excel to access Molecule. It lasted about 4 hours, which is close to our total downtime in an average year. We're sorry. Here's what happened.

### In Summary

Molecule (and a fair bit of the rest of the Internet) uses a service called Auth0 ([https://www.auth0.com](https://www.auth0.com)) to make our users' logins more secure. Yesterday it went down for about 4 hours, the first such outage we've seen. Molecule users who weren't already logged in were not able to log in. Excel/API users on our old (v1) API were also not able to log in.

We had a fallback, but the threshold we set internally to deploy it wasn't reached before Auth0 came back up. (We have a threshold because logins are very much tied to security, which is our #1 priority.)

### What happened

1. In early 2020, after much testing, we switched our login system to a company called Auth0. By doing so, we were able to provide Okta/SAML single sign-on and two-factor authentication, security features we had long wanted to add to our system. We also chose Auth0 (as did many others) because we felt their specialization in security would give you even better protection than we already had.
2. Auth0 went down for about 4 hours yesterday; even their redundant (high-availability) systems failed.
3. Auth0's uptime characteristics (prior to yesterday) were quite good, but when we switched, we knew we were taking an uptime risk -- so we kept our previous authentication/login system as a fallback, just in case something like this ever happened.
4. Yesterday, during the outage, we tested our fallback plan, balancing what's most important (security) against uptime and the knowledge that flipping the switch would come with its own disruption: all users would need to change their passwords, and then change them again when we switched back.
5. At the time of the outage, much of the day's expected trading volume and time-sensitive usage had completed (and processing of trades, positions, etc., was continuing as usual in the background), so we decided to be extra careful and switch to our fallback only if the outage lasted more than a few hours, and definitely in time for the upcoming business day.
6. Auth0 came back online during the window we set.

### What's next

In the coming weeks, we will be re-evaluating our tie to Auth0 as a critical system (they are one of the 5 critical-for-availability external systems listed in our company's risk register). At first glance, we still believe in the thesis behind our decision to choose them. They are the best at what they do, and they recently announced an acquisition by an even larger security-focused company (Okta).

The most likely place for us to evaluate changes will be our fallback plan. We will look at how to have it (or another fallback) ready to go safely, faster -- knowing that comes with its own tradeoffs. We will also look at more aggressively retiring the old v1 API and migrating users over to v2 (which is protected from this type of risk).

Thank you for your patience, and we're very sorry this happened. Please let us know if you have questions.