Eptura Workplace incident

S1 - Users having the inability to access Eptura Workplace

Critical · Resolved

Eptura Workplace experienced a critical incident on June 10, 2024, affecting Space Module, System Status, and eight other components, lasting 2h 31m. The incident has been resolved; the full update timeline is below.

Started
Jun 10, 2024, 11:28 AM UTC
Resolved
Jun 10, 2024, 02:00 PM UTC
Duration
2h 31m
Detected by Pingoru
Jun 10, 2024, 11:28 AM UTC

Affected components

Space Module, System Status, Hummingbird App, Move Module, Service Request App, Reservation Module, Asset Manager App, Service Request Module, Mail App, Asset Module

Update timeline

  1. identified Jun 10, 2024, 11:28 AM UTC

    The issue preventing users from accessing Eptura Workplace has been identified and a fix is being implemented. We will post another update at 8:30 am CST or sooner.

  2. investigating Jun 10, 2024, 11:51 AM UTC

    We are currently investigating this issue.

  3. investigating Jun 10, 2024, 12:00 PM UTC

    We were seeing some errors related to the Auth Service. We have restarted the service, and the Workplace application should now be loading normally.

  4. monitoring Jun 10, 2024, 12:01 PM UTC

    A fix has been implemented and we are monitoring the results.

  5. resolved Jun 10, 2024, 02:00 PM UTC

    As we have not seen further service disruptions after the fix was implemented, we have moved to the Resolved phase. An RCA will be posted in this incident in 10 business days. Please stay subscribed to the page to receive the post automatically.

  6. postmortem Jul 09, 2024, 05:19 PM UTC

    # **Eptura Workplace detailed Root Cause Analysis | 6/10/2024**

    ### **S1 – Users unable to access Eptura Workplace**

    We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

    **Description:** On the morning of June 10, 2024, Eptura Workplace users were unable to access the platform.

    **Type of Event:** Eptura Workplace was experiencing an outage.

    **Services/Modules impacted:** All of Production

    **Timeline (reported in CST):**

    * 5:20am: Our support team received an initial report of the inability to access Eptura Workplace.
    * 5:40am: The support team received 4 additional reports of the issue.
    * 5:44am: After investigation, the support team confirmed the issue and raised a ticket with our Cloud Operations team.
    * 5:44am: All customers were alerted on our status page that we were investigating the inability to access Eptura Workplace.
    * 5:45am: The Cloud Operations team acknowledged the issue internally and began the investigation.
    * 6:34am: The Cloud Operations team recognized some errors related to authorization services and began working to resolve these errors.
    * 6:53am: The Cloud Operations team corrected the errors, and Support began to confirm the resolution with customers.
    * 7:00am: After confirmation from multiple customers, our support team updated the status page to a monitoring phase for the next 2 hours.
    * 9:00am: No additional customers had reported the issue, and the status page was updated.

    **Total Duration of Event:** 1 hour 41 minutes

    **Root Cause:** A change to the database URL in a recent pull request caused a connection timeout issue with our in-app service.

    **Remediation:** We have successfully addressed the service disruption. The pull request has been reverted, and we have made the necessary changes to resolve the connection problem.

    **Preventative Action:** To prevent this disruption in the future, the Cloud Operations team will review all pull requests before they are merged. This process will help streamline our activities and reduce the risk of mistakes.
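
The RCA attributes the outage to a changed database URL that led to connection timeouts. As an illustration only, and not a description of Eptura's actual tooling, a pull request review could be backed by an automated reachability check on any changed database URL. The sketch below is a minimal Python example under stated assumptions: the function names `endpoint_from_url` and `can_connect`, the `DEFAULT_PORTS` table, and the example URL are all hypothetical, and the check only verifies that a TCP connection can be opened within a timeout, not that credentials or the database itself are valid.

```python
import socket
from urllib.parse import urlsplit

# Hypothetical fallback ports for URLs that omit an explicit port.
DEFAULT_PORTS = {"postgresql": 5432, "postgres": 5432, "mysql": 3306}


def endpoint_from_url(db_url):
    """Extract (host, port) from a database URL such as scheme://user:pw@host:port/db."""
    parts = urlsplit(db_url)
    if not parts.hostname:
        raise ValueError(f"no host found in database URL: {db_url!r}")
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no port given and no default known for scheme {parts.scheme!r}")
    return parts.hostname, port


def can_connect(db_url, timeout_s=3.0):
    """Return True if a TCP connection to the URL's host and port opens within timeout_s."""
    host, port = endpoint_from_url(db_url)
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder URL for illustration; a real check would read the value changed in the pull request.
    url = "postgresql://app_user:secret@db.example.internal:5432/workplace"
    print("reachable" if can_connect(url) else "unreachable")
```

A check like this would typically run from the same network environment as the service that uses the URL, since reachability from a developer machine or CI runner may differ from production.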