Molecule incident

(EU) Certain Market Data not Updating

Major, Resolved

Molecule experienced a major incident on October 23, 2024 affecting Molecule Europe, lasting 5h 11m. The incident has been resolved; the full update timeline is below.

Started
Oct 23, 2024, 12:46 PM UTC
Resolved
Oct 23, 2024, 05:57 PM UTC
Duration
5h 11m
Detected by Pingoru
Oct 23, 2024, 12:46 PM UTC

Affected components

Molecule Europe

Update timeline

  1. investigating Oct 23, 2024, 12:46 PM UTC

    We are aware of an issue with user market data not updating correctly. We are investigating as a priority.

  2. resolved Oct 23, 2024, 05:57 PM UTC

    We have identified the root cause of this issue and have successfully updated and validated the impacted marks. The affected trades have been revalued. We will share details in a post-mortem soon. Thank you for your patience and understanding during this process.

  3. postmortem Oct 29, 2024, 01:26 AM UTC

    **Summary of Incident**

    On October 23, 2024, a system issue impacted the processing of specific background jobs in our European production environment. We apologize for any disruption this may have caused and thank you for your patience.

    **What happened?**

    After a hotfix deployment to address a prior incident, the workers responsible for processing job queues in our European production environment failed to resume operations, leading to delays and inaccuracies in market data updates.

    **Why?**

    The issue was due to a deployment anomaly. Background job processors did not automatically scale back up after a system update, a step typically handled by our automated scaling process. This error was isolated to the European environment.

    **Corrective/Preventive Actions**

    To mitigate this issue and prevent recurrence, we are implementing the following actions:

    * We are conducting further analysis to determine the precise cause of the workers' scaling error.
    * Additional monitoring and alerts will be implemented to verify worker scaling after each deployment.
    * We will review our deployment playbook to ensure confirmation of worker functionality post-deployment and ensure consistency across all environments.

    We sincerely appreciate your patience as we work to strengthen our systems' resilience and prevent similar disruptions in the future.
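
The post-deployment check on worker scaling described in the corrective actions could look roughly like the sketch below. It is an illustration only, not Molecule's actual tooling: `wait_for_workers`, `get_running_count`, the expected worker count, and the timeout values are all hypothetical. In practice the count would come from whatever orchestrator or queue system runs the background workers.

```python
import time
from typing import Callable


def wait_for_workers(
    get_running_count: Callable[[], int],  # hypothetical: returns live worker count
    expected: int,                         # assumed minimum healthy worker count
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until at least `expected` workers are running or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        if get_running_count() >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated counts: workers scaled to zero for the deploy, then recovering.
    counts = iter([0, 0, 2, 4])
    ok = wait_for_workers(lambda: next(counts), expected=4, timeout_s=5, interval_s=0)
    print("workers healthy" if ok else "ALERT: workers did not scale back up")
```

Run as the last step of a deployment, a `False` result would raise an alert within minutes rather than leaving the gap to surface through stale market data. The `sleep` and `clock` parameters are injectable so the check can be tested without waiting in real time.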