Eptura Workplace incident

S2 - We are investigating an issue with Eptura Workplace SSO Timeouts

Eptura Workplace experienced a major incident on May 8, 2024 affecting System Status, lasting 1h 59m. The incident has been resolved; the full update timeline is below.

Started: May 08, 2024, 06:00 PM UTC
Resolved: May 08, 2024, 07:59 PM UTC
Duration: 1h 59m
Detected by Pingoru: May 08, 2024, 06:00 PM UTC

Affected components

System Status

Update timeline

investigating May 08, 2024, 06:00 PM UTC

We are currently investigating an issue with iOffice. We will update you when we have more information.
investigating May 08, 2024, 06:03 PM UTC

We are continuing to investigate this issue.
monitoring May 08, 2024, 07:00 PM UTC

A fix has been implemented. We are moving into the Monitoring Phase for the next 60 minutes.
resolved May 08, 2024, 07:59 PM UTC

As we have not seen further service disruptions since the fix was implemented, we have moved to the Resolved Phase. A Preliminary RCA will be posted to this incident within 2 business days. Please stay subscribed to this page to receive the post automatically.
postmortem May 17, 2024, 03:38 PM UTC

**Eptura Workplace detailed Root Cause Analysis | 05/08/2024**

**S2 - Eptura Workplace SSO Timeouts**

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

**Description:** When accessing Eptura Workplace, both internal and external users received the following error: "502 A timeout occurred on [federation.api.iofficeconnect.com](http://federation.api.iofficeconnect.com)."

**Type of Event:** Outage for customers who use Single Sign-On (SSO)

**Services/Modules impacted:** Production / SSO

**Timeline** _(Reported in Mountain Time)_**:**

- 11:30am – Multiple customers reported being unable to access Eptura Workplace.
- 11:59am – After initial investigation, the support team escalated a ticket to CloudOps for further troubleshooting. All customers were notified of the S2 incident via the Status Page.
- 12:16pm – The CloudOps team identified the issue and began working on a resolution.
- 12:59pm – The fix was released to production. The Status Page was updated from Investigating to Monitoring.
- 1:59pm – While monitoring, no additional reports were received, and customers began to confirm the fix.

**Total Duration of Event:** 2 hrs 29 mins

**Root Cause:** Mesos Marathon experienced temporary downtime during the recent restart of the NGINX server.

**Remediation:** Mesos Marathon was restored, and once recovery was complete, the NGINX server was promptly restarted.

**Preventative Action:** The proxy settings that previously pointed to the Mesos Marathon service, which has since been decommissioned, have been successfully updated. This change has already been implemented to ensure continued service efficiency.
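The status page (in UTC) reports a duration of 1h 59m, while the postmortem (in Mountain Time) reports 2 hrs 29 mins. The short Python sketch below reconciles the two windows using only timestamps taken from this report. It assumes the postmortem's local times are US Mountain Time observing daylight saving on May 8, 2024 (MDT, UTC-6); that is the offset at which the postmortem's 12:59pm Monitoring update lines up with the status page's 07:00 PM UTC post. The helper names (`mountain`, `utc`, `fmt`) are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the postmortem's local timestamps are US Mountain Time, which on
# May 8, 2024 was daylight time (MDT, UTC-6).
MOUNTAIN = timezone(timedelta(hours=-6), "MDT")


def mountain(hour: int, minute: int) -> datetime:
    """Build a May 8, 2024 timestamp in (assumed) Mountain Daylight Time."""
    return datetime(2024, 5, 8, hour, minute, tzinfo=MOUNTAIN)


def utc(hour: int, minute: int) -> datetime:
    """Build a May 8, 2024 timestamp in UTC, as used on the status page."""
    return datetime(2024, 5, 8, hour, minute, tzinfo=timezone.utc)


def fmt(delta: timedelta) -> str:
    """Render a timedelta as 'Xh Ym'."""
    hours, rem = divmod(int(delta.total_seconds()), 3600)
    return f"{hours}h {rem // 60}m"


# Status page window: first "investigating" post to "resolved" post (UTC).
status_page = utc(19, 59) - utc(18, 0)

# Postmortem window: first customer reports to customer confirmation (MDT).
postmortem = mountain(13, 59) - mountain(11, 30)

# Time between the first customer reports and the first status-page post.
first_report_gap = utc(18, 0) - mountain(11, 30)

print("status page:", fmt(status_page))
print("postmortem:", fmt(postmortem))
print("gap before first post:", fmt(first_report_gap))
print("monitoring post in MDT:", utc(19, 0).astimezone(MOUNTAIN).strftime("%H:%M"))
```

Under that assumption, the 30-minute difference between the two durations is exactly the gap between the first customer reports (11:30am) and the first status-page post (06:00 PM UTC, i.e. 12:00pm MDT).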