Dstny incident

Latency issues observed

Major · Resolved

Dstny experienced a major incident on December 1, 2025 affecting the EU region, lasting 2d 5h. The incident has been resolved; the full update timeline is below.

Started: Dec 01, 2025, 09:48 AM UTC
Resolved: Dec 03, 2025, 03:16 PM UTC
Duration: 2d 5h
Detected by Pingoru: Dec 01, 2025, 09:48 AM UTC

Affected components

EU

Update timeline

  1. investigating Dec 01, 2025, 09:48 AM UTC

    We are currently investigating a potential major incident affecting ConnectMe in the EU West region, following internal alerts indicating possible latency within the system. Our teams are actively working to determine the scope of the issue, and we will provide updates every 60 minutes as more information becomes available. If your customers experience service issues this morning, please contact Support. Possible symptoms may include slow navigation within the ConnectMe client and login difficulties. Thank you for your patience while we work to resolve this matter. Dstny Support

  2. investigating Dec 01, 2025, 10:38 AM UTC

    We have confirmed that the impact relates to latency issues affecting Dstny Core services, including the ConnectMe application. Our teams are continuing to investigate and troubleshoot the underlying cause. At present, latency on the platform appears to have returned to normal levels. We will continue to monitor the situation closely and provide the next update in 1 hour. Thank you for your patience and cooperation. Dstny Support

  3. investigating Dec 01, 2025, 12:23 PM UTC

    Our teams are continuing to investigate and troubleshoot the underlying cause of the earlier latency issues affecting Dstny Core services, including the ConnectMe application. Latency continues to remain within normal levels at this time. We will provide the next update in 1 hour or sooner if there are any significant changes. Thank you for your patience and cooperation. Dstny Support

  4. monitoring Dec 01, 2025, 01:40 PM UTC

    Latency has remained within normal levels, and with this in mind we are moving the incident into a monitoring state for the next 24 hours while investigations continue. There is no ongoing risk to customers, and we believe the issue has been mitigated. We will provide further updates if anything changes during the monitoring period. Thank you for your patience and cooperation. Dstny Support

  5. investigating Dec 01, 2025, 02:09 PM UTC

    We have confirmed that the impact has returned, and our teams are actively investigating the issue. Further details will be shared as soon as they become available. Thank you for your patience and cooperation. Dstny Support

  6. monitoring Dec 01, 2025, 03:33 PM UTC

    Latency has now stabilised, and we continue to closely monitor the situation. As part of our mitigation plan, we will be moving the affected organisation to an alternate service node later this evening. This action is designed to prevent further impact and will not cause any disruption to service. Thank you for your patience and understanding. Dstny Support

  7. monitoring Dec 02, 2025, 04:54 PM UTC

    Latency remains stable, and we continue to monitor the situation closely. As part of our mitigation plan, the affected organisation was moved to an alternate service node yesterday evening. We are still investigating the incident in collaboration with our Platform team. While measures have been implemented to reduce the risk of recurrence, the root cause has not yet been confirmed. Therefore, the incident will remain in a monitoring state until the root cause is fully determined. We will provide the next update within 24 hours. Thank you for your continued patience and understanding.

  8. monitoring Dec 03, 2025, 10:01 AM UTC

    We have now identified the root cause of the incident, and mitigation actions have been implemented to prevent any immediate risk of recurrence. However, the major incident will remain in a monitoring state while our engineering teams diligently work on implementing a permanent solution. Further updates will be provided once a permanent fix has been confirmed. We apologise for any inconvenience caused and appreciate your continued patience during this time. Dstny Support

  9. resolved Dec 03, 2025, 03:16 PM UTC

    We are pleased to confirm that this major incident has now been fully resolved. Over the past 24 hours, we have closely monitored the situation and observed no recurrence or further impact. The root cause has been identified, and mitigation actions were implemented promptly to prevent any immediate risk of recurrence. Our engineering teams will deploy a permanent solution following further investigation and development, to ensure long-term stability and prevent future occurrences. To provide transparency, a detailed post-incident report will be made available within the next 5 business days. We sincerely apologise for any inconvenience caused and thank you for your patience and understanding throughout this incident. Should you have any further questions or concerns, please do not hesitate to contact our support team. Thank you, Dstny Support

  10. postmortem Jan 07, 2026, 11:24 AM UTC

    **Incident Summary**

    On 1st December 2025, between 08:55 and 21:00 UTC, users of ConnectMe and SMP services in the EU West region experienced significant latency, slow response times, degraded performance, and failed requests. Proactive monitoring detected high latency and responsiveness issues with widespread impact on all users, with authentication problems particularly affecting SMP. The incident was resolved at 21:00 UTC following account migration load-balancing activities carried out alongside the implementation of a permanent fix.

    **Root Cause**

    A latent software limitation within a Core system’s handling of large-scale bulk operations resulted in a database lock being held longer than expected. Under certain operational conditions, this lock reduced the system’s ability to process concurrent requests efficiently, leading to increased latency and degraded performance for ConnectMe and SMP services. Because the underlying constraint allowed one workload to temporarily consume a disproportionate share of database resources, downstream systems experienced elevated contention. This created wider-scale performance issues across multiple customers in the EU West region, beyond those directly associated with the originating Core environment.

    **Incident Resolution**

    Initial stabilisation was achieved through targeted failovers and controlled service restarts to reduce load and restore baseline performance. As part of the permanent remediation, the impacted workload was redistributed to an alternate node to eliminate immediate contention, and a hotfix was deployed to correct the underlying software limitation. These combined actions fully restored normal service operation and addressed the root cause to prevent recurrence.

    **Mitigative Actions**

    - Optimising the Core system to better handle large-scale bulk operations and prevent prolonged database locks (a simplified illustration of this approach is sketched after the timeline).
    - Reviewing and improving bulk update processes to strengthen system resilience under heavy load.
    - Reducing the risk of latency in a single Core system causing wider impact by implementing additional network hardening measures across affected services.
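
As an illustration of the batching approach named in the mitigative actions, the sketch below splits a large bulk update into small, independently committed transactions so that no single database lock is held for the duration of the whole operation. This is a minimal, hypothetical example: Dstny's Core system, its database engine, and its schema are not described in the report, so the table, column, and function names here are invented purely for illustration.

```python
import sqlite3

# Hypothetical sketch: committing a bulk update in small chunks keeps each
# transaction, and therefore each database lock, short. Concurrent requests are
# then far less likely to queue behind one long-running bulk operation, which is
# the failure mode described in the root-cause analysis above.
# The schema and names below are invented; they do not reflect Dstny's systems.

CHUNK_SIZE = 500  # rows updated per transaction; tuned to the real workload in practice

def bulk_update_in_chunks(conn: sqlite3.Connection, account_ids: list[int]) -> None:
    """Apply a bulk update without holding a single lock for the whole run."""
    for start in range(0, len(account_ids), CHUNK_SIZE):
        chunk = account_ids[start:start + CHUNK_SIZE]
        placeholders = ",".join("?" for _ in chunk)
        with conn:  # each chunk is its own short transaction (committed on exit)
            conn.execute(
                f"UPDATE accounts SET migrated = 1 WHERE id IN ({placeholders})",
                chunk,
            )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, migrated INTEGER DEFAULT 0)")
    conn.executemany("INSERT INTO accounts (id) VALUES (?)", [(i,) for i in range(1, 2001)])
    conn.commit()

    bulk_update_in_chunks(conn, list(range(1, 2001)))
    print(conn.execute("SELECT COUNT(*) FROM accounts WHERE migrated = 1").fetchone()[0])  # 2000
```

The same chunking idea applies to the "reviewing and improving bulk update processes" action: bounding the work done under any one lock limits how long any other customer's requests can be blocked by a single workload.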