Flowhub incident

Severe Degradation Causing Issues with Login and Various Operations

Critical Resolved

Flowhub experienced a critical incident on August 1, 2025, affecting Maui POS, Greet, Stash, View, and API, lasting 1h 2m. The incident has been resolved; the full update timeline is below.

Started
Aug 01, 2025, 04:00 PM UTC
Resolved
Aug 01, 2025, 05:02 PM UTC
Duration
1h 2m
Detected by Pingoru
Aug 01, 2025, 04:00 PM UTC

Affected components

Maui POS, Greet, Stash, View, API

Update timeline

  1. investigating Aug 01, 2025, 04:11 PM UTC

    We are currently investigating this issue with all engineers on hand.

  2. investigating Aug 01, 2025, 04:15 PM UTC

    We are continuing to investigate this issue.

  3. investigating Aug 01, 2025, 04:27 PM UTC

    The team is working through various potential resolution methods to restore services as quickly as possible.

  4. monitoring Aug 01, 2025, 04:35 PM UTC

    A fix has been deployed and all services should be restored.

  5. resolved Aug 01, 2025, 05:02 PM UTC

    All services have been confirmed to have returned to normal operating performance and availability.

  6. postmortem Aug 08, 2025, 06:34 PM UTC

    **Status -** Resolved

    **Summary -** An outage occurred on Aug 1 from 9:55 to 10:34 AM MT due to expired TLS certificates associated with deprecated services. Although these services were no longer active, our NGINX ingress controller continued checking them while routing traffic for all Maui services. All services currently in use had valid certificates, but the expired certificates in the controller’s cache caused routing failures. We resolved the issue by restarting the ingress controller to clear the stale certificates; the restart and recovery took approximately 25 minutes. All services were then fully restored.

    **Impact -** The ingress controller refused all traffic from 9:55 to 10:34 AM MT, disrupting access to Flowhub Maui and several supported applications for most users.

    **Resolution -** Restarting the ingress controller cleared the cached expired certificates.

    **Future Preventions -**

    * We’ve created a check that validates certificates in advance of their expiry. Flowhub has heartbeat monitoring of its services, but those checks don’t flow through the traffic controller.
    * We’ve added more FE latency check alerts to customer support channels.
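
The postmortem does not describe how the new certificate check is implemented. As a rough illustration only, an expiry check of this kind could look like the Python sketch below. The function names, the 14-day warning threshold, and the idea of probing a public hostname are all assumptions made for this example, not details from Flowhub. The sketch also only inspects the certificate actually served at a hostname; finding expired certificates that are merely cached for deprecated services, as in this incident, would need an inventory of the certificates the ingress controller loads.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days from `now` until the certificate's notAfter time.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Aug  1 15:55:00 2025 GMT". A negative result means
    the certificate has already expired.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_epoch, tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 14) -> bool:
    """True when the certificate expires within `warn_days` (hypothetical threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch the notAfter field of the certificate served at host:port.

    The default context verifies the certificate, so an already expired
    certificate raises ssl.SSLCertVerificationError instead of returning.
    A caller should treat that exception as an alert in its own right.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `fetch_not_after` against the hostname that customers use, so the probe passes through the ingress controller rather than around it, and raise an alert whenever `needs_renewal` returns true.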