Uscreen incident

Pages Experiencing Slow Loading Times

Minor Resolved

Uscreen experienced a minor incident on January 6, 2025, affecting Admin Portal and Storefront and lasting 2h 16m. The incident has been resolved; the full update timeline is below.

Started
Jan 06, 2025, 05:29 PM UTC
Resolved
Jan 06, 2025, 07:45 PM UTC
Duration
2h 16m
Detected by Pingoru
Jan 06, 2025, 05:29 PM UTC

Affected components

Admin Portal, Storefront

Update timeline

  1. investigating Jan 06, 2025, 05:29 PM UTC

    We are currently looking into an issue causing slow loading times for both the Admin and Storefront pages.

  2. identified Jan 06, 2025, 05:58 PM UTC

    Our team has identified the root cause of the slowdown affecting the Admin and Storefront pages. We are currently working on a fix.

  3. identified Jan 06, 2025, 06:35 PM UTC

    Our team continues to work towards a fix for the issue causing slow loading times for the catalog and admin pages.

  4. identified Jan 06, 2025, 07:09 PM UTC

    Our team continues to work towards a solution for this issue. Thank you for your continued patience.

  5. monitoring Jan 06, 2025, 07:29 PM UTC

    A fix has been implemented, and we are seeing improvements in the Admin Area and Storefront pages. Our team is actively monitoring the situation and tracking the improvements.

  6. resolved Jan 06, 2025, 07:45 PM UTC

    The issue has been successfully resolved. Thank you for your patience during this process.

  7. postmortem Jan 06, 2025, 08:30 PM UTC

    # Postmortem

    **Date of Incident: 1/6/25**

    ## Summary:

    We recently experienced degraded performance and limited service interruptions caused by unprecedented traffic growth and an issue within our database infrastructure. A CPU resource leak in our database provider’s system contributed to high resource utilization, compounding the challenge of meeting demand. This incident tested our system’s capacity, and while it caused temporary disruptions, it also highlighted opportunities for immediate and long-term improvements.

    ## Root Cause

    * **Increased Demand:**
        * Platform traffic and user activity tripled compared to typical levels, leading to an unexpected surge in database load.
        * This spike caused memory saturation and an overabundance of connections.
    * **CPU Resource Leak:**
        * A resource leak in the database provider’s infrastructure led to persistent CPU spikes, limiting system efficiency.
        * This issue prevented the system from scaling effectively to handle workloads.
    * **Inefficient Resource Allocation:**
        * Idle database connections and unoptimized queries further stressed the infrastructure, reducing its ability to respond to peak demand.

    ## Resolution

    ### **Immediate Actions:**

    * **Increased Database Capacity:**
        * Doubled CPU and memory resources in our database instance to handle the increased load.
    * **Instance Restarts:**
        * Refreshed instances to eliminate stale connections and stabilize performance.
    * **Cleared Stale Connections:**
        * Optimized active connection counts, reducing strain on the system.

    ### Short-Term Adjustments:

    * Adjusted query handling to improve load distribution.
    * Enhanced monitoring tools to identify potential resource leaks sooner.

    This incident underscores the challenges of balancing rapid growth with infrastructure resilience. By immediately increasing database capacity and addressing inefficiencies, we stabilized the platform for now. Moving forward, we are committed to strengthening our systems through proactive scaling, deeper collaboration with our providers, and better resource management to support your continued success on our platform.
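
The postmortem names idle database connections as a contributing factor and clearing stale connections as one of the immediate actions. As a rough illustration of what that kind of cleanup involves, the sketch below picks out connections that have been idle longer than a threshold. It is not Uscreen's or their provider's implementation: the `Connection` record, the `"active"`/`"idle"` state labels, the `find_stale` helper, the five-minute default, and the sample data are all hypothetical, and the postmortem does not say which database or tooling was used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Connection:
    """Hypothetical record describing one database connection."""
    conn_id: int
    state: str               # assumed labels: "active" or "idle"
    last_activity: datetime  # time of the last query on this connection


def find_stale(connections, now, max_idle=timedelta(minutes=5)):
    """Return idle connections whose last activity is older than max_idle.

    The five-minute default is an illustrative assumption, not a value
    taken from the postmortem.
    """
    return [
        c for c in connections
        if c.state == "idle" and now - c.last_activity > max_idle
    ]


# Made-up sample data for demonstration only.
now = datetime(2025, 1, 6, 19, 0, tzinfo=timezone.utc)
pool = [
    Connection(1, "active", now - timedelta(seconds=2)),
    Connection(2, "idle", now - timedelta(minutes=30)),
    Connection(3, "idle", now - timedelta(seconds=40)),
]

stale = find_stale(pool, now)
print([c.conn_id for c in stale])  # prints [2]
```

In a real system the stale set would then be closed or returned to the pool. Most databases and connection poolers expose their own idle-timeout settings for this, which would normally be preferable to hand-rolled logic.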