Universign incident

August 2023: Universign major outage on Transaction Service

Universign experienced a major incident on August 29, 2023, affecting the Transaction Service, the Timestamp Service, and three other components, lasting 17h 34m. The incident has been resolved; the full update timeline is below.

Started: Aug 29, 2023, 04:00 PM UTC
Resolved: Aug 30, 2023, 09:35 AM UTC
Duration: 17h 34m
Detected by Pingoru: Aug 29, 2023, 04:00 PM UTC

Affected components

Transaction Service, Timestamp Service, Seal Service, Web Application, Registration Service

Update timeline

investigating Aug 30, 2023, 07:52 AM UTC

We are currently experiencing difficulties on our platform. More information will be made available as it is conveyed to us.

investigating Aug 30, 2023, 08:17 AM UTC

We are continuing to investigate this issue.

identified Aug 30, 2023, 08:33 AM UTC

We are facing a major outage. The problem is identified and we are working on a fix right now.

monitoring Aug 30, 2023, 08:35 AM UTC

A fix has been implemented and we are monitoring the results.

monitoring Aug 30, 2023, 08:38 AM UTC

We are continuing to monitor for any further issues.

resolved Aug 30, 2023, 09:35 AM UTC

This incident has been resolved. We are preparing a full report on this incident. We deeply apologize for the inconvenience.

postmortem Aug 31, 2023, 03:12 PM UTC

We apologize for the inconvenience caused by the incident between 29/08/2023 5:56 PM and 30/08/2023 10:15 AM.

**Timeline:**

* 29/08/2023 5:56-6:03 PM: First alarm on intermittent unavailability of services
* 29/08/2023 6:06-6:22 PM: Alarm on intermittent latency of services
* 29/08/2023 9 PM - 30/08/2023 10:15 AM: Alarms on application memory for part of the services

**Resolution of the incident:**

* 29/08/2023 6:02 PM: Reduction of the limit on load balancers
* 29/08/2023 6:11 PM: Exclusion of a server from the load balancer
* 29/08/2023 6:38 PM - 30/08/2023 10:14 AM: Log analysis and successive restarts of application services

**End of incident:** 30/08/2023 10:15 AM

**Identified root cause:** Several instances experienced slowness which caused an overload on those instances, thus causing partial instability on all services of the platform.

**Preventive actions:** To prevent the platform from overloading, we will be working on improving the settings of load balancers. In addition, we will be auditing the application configuration for each server to prevent slowness.