WebID Solutions incident

Production environment is down

Critical · Resolved

WebID Solutions experienced a critical incident on December 2, 2024, lasting 2h 42m. All services were affected, including AccountID and AutoID. The incident has been resolved; the full update timeline is below.

Started: Dec 02, 2024, 03:05 PM UTC
Resolved: Dec 02, 2024, 05:47 PM UTC
Duration: 2h 42m
Detected by Pingoru: Dec 02, 2024, 03:05 PM UTC

Affected components

AccountID, AutoID, avsID, eID, SignID, SignID Direct, TrueID, VideoID, VisualID

Update timeline

  1. identified Dec 02, 2024, 04:04 PM UTC

    We have identified an issue with the production environment. All services are affected. We are working hard to solve the problem.

  2. identified Dec 02, 2024, 04:42 PM UTC

    We have identified the issue and started to fix it.

  3. monitoring Dec 02, 2024, 04:43 PM UTC

    A fix has been deployed. We are currently monitoring it to ensure correct operation.

  4. resolved Dec 02, 2024, 05:47 PM UTC

    Status report on the outage of the WebID service

    On 2 December 2024, at 4:05 p.m., the WebID service experienced an outage. It was caused by the complete exhaustion of the log capacity in the database cluster, which resulted from an unfortunate combination of two factors:

    1. Postponement of the maintenance window: At a customer's request, the maintenance window planned for last Thursday (28–29 November) was moved to today, Monday (2–3 December).
    2. Increased data traffic: The heightened activity during Black Week and Black Friday resulted in significantly higher system traffic and log generation.

    Together, these factors meant that the log volume grew larger than calculated, which exhausted the available capacity and ultimately led to the unavailability of the WebID service.

    After a thorough analysis of the problem, the logs were manually cleaned up and all nodes of the database cluster were restarted. At 5:43 p.m., we were able to restore regular operation.

    Measures taken to prevent future incidents:

    - Expansion of log capacity: The capacity was adjusted to ensure sufficient reserves even in the event of high data volumes.
    - Improved checks for rescheduled maintenance: Whenever a maintenance window is postponed in future, a detailed review of the remaining log capacity will be a standard part of our catalogue of measures.
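The capacity review for a rescheduled maintenance window, as described in the measures above, can be illustrated with a small headroom calculation. This is only a sketch under stated assumptions: the report does not say how WebID performs the check, so the function names, the `traffic_factor` and `safety_margin` parameters, and all figures below are hypothetical and not taken from the incident.

```python
from datetime import datetime


def hours_until_log_full(capacity_gb: float, used_gb: float,
                         growth_gb_per_hour: float) -> float:
    """Hours until the log volume is full, assuming a constant growth rate."""
    if growth_gb_per_hour <= 0:
        return float("inf")  # logs are not growing, so they never fill up
    return max(capacity_gb - used_gb, 0.0) / growth_gb_per_hour


def survives_until(now: datetime, next_cleanup: datetime,
                   capacity_gb: float, used_gb: float,
                   growth_gb_per_hour: float,
                   traffic_factor: float = 1.0,
                   safety_margin: float = 0.2) -> bool:
    """Return True if the logs are projected to fit until the next cleanup.

    traffic_factor scales the growth rate for expected peaks (hypothetical
    parameter); safety_margin reserves a fraction of the capacity as buffer.
    """
    usable_gb = capacity_gb * (1 - safety_margin)
    hours_left = hours_until_log_full(
        usable_gb, used_gb, growth_gb_per_hour * traffic_factor
    )
    hours_needed = (next_cleanup - now).total_seconds() / 3600
    return hours_left >= hours_needed


# Illustrative figures only (hypothetical, not from the incident report).
now = datetime(2024, 11, 26, 0, 0)
original_window = datetime(2024, 11, 28, 22, 0)
postponed_window = datetime(2024, 12, 2, 22, 0)

print(survives_until(now, original_window, 100, 40, 0.5))
print(survives_until(now, postponed_window, 100, 40, 0.5))
print(survives_until(now, original_window, 100, 40, 0.5, traffic_factor=2.0))
```

With these made-up numbers, the projection holds for the original window but fails once the window is postponed or the growth rate doubles. That mirrors the two factors named in the report, without claiming to reproduce the actual figures.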