Squiz incident

DXP 6.61.0 Safe Edit Issue causing error 500 for Public users

Major · Resolved

Squiz experienced a major incident on April 2, 2025 affecting Squiz SaaS Hosted Instances, lasting 8h 15m. The incident has been resolved; the full update timeline is below.

Started
Apr 02, 2025, 05:25 AM UTC
Resolved
Apr 02, 2025, 01:40 PM UTC
Duration
8h 15m
Detected by Pingoru
Apr 02, 2025, 05:25 AM UTC

Affected components

Squiz SaaS Hosted Instances

Update timeline

  1. investigating Apr 02, 2025, 05:25 AM UTC

    Squiz has been made aware of an error affecting customers on the latest version of DXP: when an asset is placed into Safe Edit, attempts to access that asset return an error 500. Squiz is working on rolling back affected clients now.

  2. investigating Apr 02, 2025, 05:25 AM UTC

    We are continuing to investigate this issue.

  3. identified Apr 02, 2025, 05:38 AM UTC

    Squiz is continuing to work on a fix for this issue and will implement it as soon as we have positive test results.

  4. identified Apr 02, 2025, 05:53 AM UTC

    We are now implementing a rollback for the affected version of DXP. We will have another update ready once the operation is complete.

  5. monitoring Apr 02, 2025, 06:18 AM UTC

    We have begun rolling back clients on the affected version of DXP and are monitoring them for any further issues.

  6. monitoring Apr 02, 2025, 06:46 AM UTC

    We are continuing to work on rolling back the changes and are continuing to monitor the situation.

  7. monitoring Apr 02, 2025, 07:18 AM UTC

    We are continuing to work on the rollback to resolve the issues.

  8. monitoring Apr 02, 2025, 07:36 AM UTC

    Deployment of the rollback is now underway. We will continue to monitor for any issues during the deployment.

  9. monitoring Apr 02, 2025, 08:07 AM UTC

    Rollback of the changes continues, and we are now seeing recovery for some affected customers. We are actively checking reported outages for recovery and are continuing to monitor for developments.

  10. monitoring Apr 02, 2025, 09:10 AM UTC

    We are continuing to progress with the rollback of the changes. We are continuing to see recovery for some affected customers. We are actively checking reported outages for recovery and are continuing to monitor for developments.

  11. monitoring Apr 02, 2025, 10:58 AM UTC

    We are continuing to see recovery for more affected customers and believe we are nearing resolution. We are continuing to be vigilant for any further issues.

  12. monitoring Apr 02, 2025, 12:26 PM UTC

    We are continuing to monitor for any outstanding issues. A further update will be provided once the rollback has been completed.

  13. resolved Apr 02, 2025, 01:40 PM UTC

    Dear Customers, Following an extended period of monitoring, we are pleased to confirm that this issue has now been resolved. We appreciate your patience and understanding during this time and apologise for any inconvenience caused. A post mortem will be made available on https://status.squiz.cloud/ in the coming days.

  14. postmortem Apr 10, 2025, 05:34 AM UTC

    **Summary**

    On 2 April at approximately 14:49 (GMT+10), Squiz received indications of 500 errors from customer pages. Squiz’s Support team, alongside our Product team, quickly identified that the issue was introduced by the recent DXP upgrade to version 6.61.0. Hot patches were released while, in parallel, Matrix DXP was rolled back to 6.60.1, restoring services.

    **Customer Impact**

    Incident duration: 02 Apr 2025, 14:49 - 23:14 (GMT+10)

    Impact: some customers experienced 500 errors on site pages. Impact times and service restoration times varied over the course of the incident. The issue was limited to clients who changed asset statuses during a specific period, so impact was felt only by some users who were editing assets at the time of the incident.

    **Root Cause Analysis**

    An asset property was removed in Matrix version 6.61.0. This affected assets that were placed into Safe Edit, because it resulted in errors when Matrix attempted to serialise the objects.

    **Resolution Actions**

    1. Identification: The Squiz Support team identified a trend in logs while investigating reports of problems. Product teams were engaged and quickly isolated the cause.
    2. Hot Patch: Squiz developed, tested and deployed hot patches while, in parallel, assessing a version rollback versus a roll forward.
    3. Downgrade: To fully resolve the issue, a Matrix version downgrade took place.

    **Follow-up Actions**

    - Squiz has deployed monitoring enhancements to detect and monitor for similar events, including identification during testing. (completed)
    - Squiz has successfully rolled out Matrix version 6.61.1, which introduced a change to circumvent this issue. (completed)
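
The postmortem attributes the errors to a removed asset property that broke serialisation of assets in Safe Edit, but it does not describe the mechanism inside Matrix. The Python sketch below is only a generic illustration of that class of failure, not Squiz's code: the `Asset` class, the `legacy_flag` property and the `SNAPSHOT_FIELDS` list are all hypothetical. It shows a serialiser that still expects a property the upgraded object no longer has, and a tolerant variant that skips missing properties instead of failing.

```python
import json

# Hypothetical list of properties the serialiser was written to expect
# before the upgrade. "legacy_flag" stands in for the removed property.
SNAPSHOT_FIELDS = ("asset_id", "status", "legacy_flag")


class Asset:
    """Hypothetical asset after the upgrade: legacy_flag no longer exists."""

    def __init__(self, asset_id: int, status: str) -> None:
        self.asset_id = asset_id
        self.status = status


def serialise_strict(asset: Asset) -> str:
    """Serialise every expected property; raises AttributeError if one is missing."""
    return json.dumps({name: getattr(asset, name) for name in SNAPSHOT_FIELDS})


def serialise_tolerant(asset: Asset) -> str:
    """Serialise only the expected properties that the object still has."""
    present = {
        name: getattr(asset, name)
        for name in SNAPSHOT_FIELDS
        if hasattr(asset, name)
    }
    return json.dumps(present)


if __name__ == "__main__":
    asset = Asset(asset_id=42, status="safe_edit")
    try:
        serialise_strict(asset)
    except AttributeError as exc:
        # In a web application an unhandled error like this would typically
        # surface to visitors as an HTTP 500 response.
        print("strict serialisation failed:", exc)
    print("tolerant serialisation:", serialise_tolerant(asset))
```

Whether skipping a missing property is safe depends on what that property meant. The postmortem says only that version 6.61.1 "introduced a change to circumvent this issue", so the tolerant variant here should not be read as the fix Squiz shipped.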