Entitle incident

Entitle US Region downtime

Major Resolved

Entitle experienced a major incident on June 16, 2025 affecting Entitle Portal, lasting 11m. The incident has been resolved; the full update timeline is below.

Started
Jun 16, 2025, 06:48 PM UTC
Resolved
Jun 16, 2025, 07:00 PM UTC
Duration
11m
Detected by Pingoru
Jun 16, 2025, 06:48 PM UTC

Affected components

Entitle Portal

Update timeline

  1. investigating Jun 16, 2025, 06:48 PM UTC

    We are currently investigating this issue.

  2. resolved Jun 16, 2025, 07:00 PM UTC

    This incident has been resolved.

  3. postmortem Jun 16, 2025, 07:56 PM UTC

    ## Postmortem: Redis Outage, Entitle US Region

    **Date:** 2025-06-16
    **Service Affected:** Entitle – US Region
    **Root Cause:** Redis pods `CrashLoopBackOff`

    ### Summary

    On **June 16, 2025**, the Entitle US region experienced a **partial service outage** due to a failure in the **Redis** backend service. The Redis pod entered a `CrashLoopBackOff` state and failed to restart successfully. This caused cascading issues in dependent services, including elevated Redis client errors, Pub/Sub disconnections, and connection retry exhaustion. The issue was resolved by performing a **manual hard reset** of the Redis pod.

    The Redis container **crashed unexpectedly**, and Kubernetes was **unable to restart it** because of repeated startup failures. This put the pod into a `BackOff` state, in which Kubernetes delays further restart attempts. The downstream Node.js services attempted to reconnect continuously, leading to:

    * **Redis connection timeouts**
    * **Pub/Sub disconnections** (`EOF`)
    * **Memory pressure due to too many event listeners**

    This condition persisted until the pod was **forcefully deleted**, which reset the backoff and allowed Redis to start cleanly.

    ### Action Items

    A **new monitor was created** to track `CrashLoopBackOff` and `BackOff` events in critical infrastructure pods. This will allow us to detect and respond to container restart failures earlier, potentially preventing downtime through intervention.
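
The postmortem does not describe how the new monitor is implemented. As an illustration only, the check it performs could look roughly like the sketch below. It assumes the monitor receives pod objects in the shape returned by the Kubernetes core/v1 Pod API, where a crash-looping container reports `state.waiting.reason` equal to `CrashLoopBackOff`. The function name `findRestartFailures`, the `minRestarts` threshold, and the sample pod names are hypothetical and not taken from Entitle's setup. Tracking `BackOff` events would additionally require reading the Events API, which is not shown.

```typescript
// Minimal subset of the Kubernetes core/v1 Pod object needed for this check.
interface ContainerStatus {
  name: string;
  restartCount: number;
  state?: { waiting?: { reason?: string; message?: string } };
}

interface Pod {
  metadata: { name: string; namespace: string };
  status?: { containerStatuses?: ContainerStatus[] };
}

// Hypothetical alert shape for this sketch.
interface RestartAlert {
  pod: string;
  container: string;
  reason: string;
  restartCount: number;
}

// Waiting reason the kubelet reports for a container stuck in restart backoff.
const CRASH_LOOP_REASON = "CrashLoopBackOff";

// Returns one alert per container that is waiting in CrashLoopBackOff and has
// restarted at least `minRestarts` times (hypothetical threshold).
function findRestartFailures(pods: Pod[], minRestarts: number = 1): RestartAlert[] {
  const alerts: RestartAlert[] = [];
  for (const pod of pods) {
    for (const cs of pod.status?.containerStatuses ?? []) {
      const reason = cs.state?.waiting?.reason;
      if (reason === CRASH_LOOP_REASON && cs.restartCount >= minRestarts) {
        alerts.push({
          pod: pod.metadata.namespace + "/" + pod.metadata.name,
          container: cs.name,
          reason: reason,
          restartCount: cs.restartCount,
        });
      }
    }
  }
  return alerts;
}

// Usage with made-up sample data: one crash-looping pod and one healthy pod.
const samplePods: Pod[] = [
  {
    metadata: { name: "redis-0", namespace: "infra" },
    status: {
      containerStatuses: [
        { name: "redis", restartCount: 5, state: { waiting: { reason: "CrashLoopBackOff" } } },
      ],
    },
  },
  {
    metadata: { name: "api-0", namespace: "app" },
    status: { containerStatuses: [{ name: "api", restartCount: 0, state: {} }] },
  },
];

console.log(findRestartFailures(samplePods));
```

A check like this only reports the condition. Paging on it, or deleting the pod automatically as was done manually during this incident, would be a separate decision.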