AskCody incident

Operational: Solved - Outage on EU Platform - AskCody Services & Visitor module

Notice Resolved

AskCody experienced a notice-level incident on November 17, 2023, affecting the Outlook Add-in, Visitor Management Portal, Meeting Services Portal, and Check-in kiosk, and lasting 12d 22h. The incident has been resolved; the full update timeline is below.

Started
Nov 17, 2023, 03:00 PM UTC
Resolved
Nov 30, 2023, 01:57 PM UTC
Duration
12d 22h
Detected by Pingoru
Nov 17, 2023, 03:00 PM UTC
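
The reported duration can be checked directly from the Started and Resolved timestamps above. A minimal Python sketch (the variable names are illustrative and not part of the vendor's page):

```python
from datetime import datetime, timezone

# Timestamps as shown on the status page (UTC).
started = datetime(2023, 11, 17, 15, 0, tzinfo=timezone.utc)
resolved = datetime(2023, 11, 30, 13, 57, tzinfo=timezone.utc)

elapsed = resolved - started
days = elapsed.days
hours, remainder = divmod(elapsed.seconds, 3600)
minutes = remainder // 60

# The page truncates to whole hours, which gives "12d 22h".
print(f"{days}d {hours}h {minutes}m")
```

The exact difference is 12 days, 22 hours, and 57 minutes, which the page shows as 12d 22h.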

Affected components

Outlook Add-in, Visitor Management Portal, Meeting Services Portal, Check-in kiosk

Update timeline

  1. investigating Nov 17, 2023, 03:00 PM UTC

    Affected Users: All Services users and Service Providers
    Region: Outside of North America

    We are currently experiencing a major outage in the AskCody Services module and are investigating the impact. Users may be unable to place or edit Service requests in the Outlook add-in, or to create ad-hoc requests directly in the AskCody Management Portal.

    The next update will be provided once we know more, or at the latest on November 20th.

    Major Outage Definition: A component is unavailable for all users. Affects the up-time calculation 100%.

  2. monitoring Nov 17, 2023, 05:30 PM UTC

    All modules are fully operational again. We are monitoring the situation and will post any relevant information. The next update will be provided once we know more, or at the latest on November 20th.

  3. monitoring Nov 20, 2023, 10:16 AM UTC

    The major outage is ongoing, and the solution we implemented proved ineffective. We are working towards a new solution as quickly as we can. The next update will be provided once we know more, or at 1 pm CET at the latest.

  4. identified Nov 20, 2023, 11:44 AM UTC

    Affected Users: All Services users and Service Portal users, and all Visitor users and Visitor Portal users
    Region: Outside of North America

    The cause of the major outage in the Service and Visitor modules has been identified, and we are now working to find a solution. Users may experience timeouts and be unable to log in.

    The next update will be provided at 3 pm CET at the latest.

    Major Outage Definition: A component is unavailable for all users. Affects the up-time calculation 100%.

  5. identified Nov 20, 2023, 02:02 PM UTC

    Affected Users: All Services users and Service Portal users, and all Visitor users and Visitor Portal users
    Region: Outside of North America

    The cause of the major outage in the Service and Visitor modules has been identified, and we are in dialogue with Microsoft to resolve the matter. Users may experience timeouts and be unable to log in.

    The next update will be provided at 8 pm CET at the latest.

    Major Outage Definition: A component is unavailable for all users. Affects the up-time calculation 100%.

  6. identified Nov 20, 2023, 06:24 PM UTC

    Affected Users: All Services users and Service Portal users, and all Visitor users and Visitor Portal users
    Region: Outside of North America

    The cause of the major outage in the Service and Visitor modules has been identified, and we are still in dialogue with Microsoft to resolve the matter. Users may experience timeouts and be unable to log in.

    The next update will be provided at 10 pm CET at the latest.

    Major Outage Definition: A component is unavailable for all users. Affects the up-time calculation 100%.

  7. identified Nov 20, 2023, 08:59 PM UTC

    Affected Users: All Services users and Service Portal users, and all Visitor users and Visitor Portal users
    Region: Outside of North America

    The cause of the major outage in the Service and Visitor modules has been identified, and we are still in dialogue with Microsoft to resolve the matter. Users may experience timeouts and be unable to log in.

    The next update will be provided tomorrow at 8 am CET at the latest.

    Major Outage Definition: A component is unavailable for all users. Affects the up-time calculation 100%.

  8. monitoring Nov 21, 2023, 07:03 AM UTC

    Affected Users: All Services users and Service Portal users, and all Visitor users and Visitor Portal users
    Region: Outside of North America

    We are monitoring our implemented solution to see whether it can hold the full load of a morning's operations. We are following it closely and will update this notice as the situation develops. Users may experience timeouts and be unable to log in.

    The next update will be provided tomorrow at 8 am CET at the latest.

    Major Outage Definition: A component is unavailable for all users. Affects the up-time calculation 100%.

  9. monitoring Nov 21, 2023, 09:15 AM UTC

    Affected Users: All Services users and Service Portal users, and all Visitor users and Visitor Portal users
    Region: Outside of North America

    The platform has been running since the solution was implemented and continues to be available. We will keep this incident in monitoring until we are certain it is stable, and only then set it back to operational. For you as a user, this means you should be able to use the Portal, potentially with a slight delay.

    The next update will be provided tomorrow at 8 am CET at the latest.

    Major Outage Definition: A component is unavailable for all users. Affects the up-time calculation 100%.

  10. monitoring Nov 21, 2023, 12:42 PM UTC

    Affected Users: All Services users and Service Portal users, and all Visitor users and Visitor Portal users
    Region: Outside of North America

    The platform has been running since the solution was implemented, and we will continue to monitor it. For you as a user, this means you should be able to use the Portal, potentially with a slight delay.

    We will provide an update whenever there is relevant news or the status changes.

    Degraded Performance Definition: The affected component is working but is slow or otherwise impacted in a minor way. Does not affect downtime.

  11. monitoring Nov 23, 2023, 08:06 AM UTC

    Affected Users: All Services users and Service Portal users, and all Visitor users and Visitor Portal users
    Region: Outside of North America

    The platform has been running since the solution was implemented, without any measured outage or degraded performance. We will keep this incident in a monitoring state until we can be sure it is fully solved.

    We will provide an update whenever there is relevant news or the status changes.

  12. resolved Nov 30, 2023, 01:57 PM UTC

    After monitoring the implemented solution, we consider the major outage resolved. The post-mortem will follow shortly.

  13. postmortem Nov 30, 2023, 01:57 PM UTC

    ### **Post-Mortem Report: AskCody Database Incident (Nov 17, 2023 - Nov 20, 2023)**

    At AskCody, we believe in maintaining a transparent and responsible relationship with our customers. This report is intended to provide a comprehensive overview of the recent database incident and our response to it.

    ### Incident Overview

    * **Start:** November 17, 2023, 16:00 CET
    * **Resolution Applied:** November 20, 2023, Afternoon
    * **Status Update:** November 21, 2023, Morning

    ### Incident Description

    The incident involved a 'lock wait timeout exceeded' error in our database system, caused by complex interactions within our database infrastructure that led to transaction delays and timeouts.

    ### Timeline of Events

    * **Initial Detection:** The issue was first identified on November 17, 2023.
    * **Investigation Period:** Between November 17 and 20, 2023, we worked closely with Microsoft support to analyze and address the issue.
    * **Resolution Implementation:** A fix was successfully applied in the afternoon of November 20, 2023.
    * **Status Update:** The status page was updated the following morning, on November 21, 2023.

    ### Root Cause Analysis

    The precise root cause remains undetermined, but it appears to be related to previously unencountered behaviors within the MS Azure database infrastructure.

    ### Remedial Actions and Prevention

    * **Collaboration with Microsoft:** We have intensified our collaboration with Microsoft to enhance our database management practices.
    * **System Adjustments:** Various system improvements have been implemented to mitigate the risk of similar issues.

    ### Current Status

    Following the application of the fix, we have been diligently monitoring our systems. As of now, we have seen no further issues.

    ### Our Commitment

    We apologize for any inconvenience this incident may have caused. Ensuring the reliability and efficiency of our services is our top priority, and we are committed to continuous improvement.

    Thank you for your understanding and continued support.

    Sincerely,
    The AskCody Team
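
The 'lock wait timeout exceeded' error named in the post-mortem matches the wording of MySQL/InnoDB error 1205 ("Lock wait timeout exceeded; try restarting transaction"). Whether AskCody's database is MySQL is an assumption, and the post-mortem does not describe the fix that was applied. As a general illustration only, client code often handles this class of error by retrying the transaction with backoff. The sketch below is in Python; `LockWaitTimeout`, `run_with_retry`, and `flaky_transaction` are hypothetical names, and no real database driver is involved.

```python
import random
import time


class LockWaitTimeout(Exception):
    """Hypothetical stand-in for a driver error such as MySQL error 1205."""


def run_with_retry(transaction, max_attempts=4, base_delay=0.05, sleep=time.sleep):
    """Run `transaction`, retrying with exponential backoff on lock wait timeouts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except LockWaitTimeout:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Back off exponentially, with jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


# Simulated transaction (assumption for the demo): times out twice, then commits.
calls = {"count": 0}


def flaky_transaction():
    calls["count"] += 1
    if calls["count"] < 3:
        raise LockWaitTimeout("Lock wait timeout exceeded; try restarting transaction")
    return "committed"


# Sleeping is disabled here so the demo runs instantly.
result = run_with_retry(flaky_transaction, sleep=lambda _: None)
print(result, calls["count"])
```

Retries only mask short-lived lock contention. A sustained outage like the one described here would exhaust the attempts and still surface the error, which is why the cap on `max_attempts` matters.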