gaiia software experienced a major incident on October 20, 2025, affecting Web App (app.gaiia.com), Public GraphQL API, and 1 more component, lasting 13h 55m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- monitoring Oct 20, 2025, 11:38 AM UTC
At 12:11 AM PDT, AWS experienced a global outage impacting multiple services (AWS Health Dashboard). Although the majority of services have now recovered, we are continuing to observe elevated error rates and performance slowdowns in some regions. Our engineering team is actively monitoring the situation and working with AWS to ensure full resolution. We will provide further updates as more information becomes available.
- monitoring Oct 20, 2025, 01:25 PM UTC
All core services have now recovered and are fully operational following the recent AWS global outage. However, our analytics and reporting features remain unavailable due to a continued outage with Snowflake, which was also impacted by the AWS issue. Our team is closely monitoring Snowflake’s status and will restore analytics access as soon as Snowflake services are back online. We will provide further updates when more information is available.
- monitoring Oct 20, 2025, 02:13 PM UTC
We are currently experiencing a new wave of errors affecting multiple services. You may encounter degraded performance or intermittent errors across several features. We are actively working with AWS to identify and resolve the underlying issues as quickly as possible. We will continue to provide updates as soon as more information is available.
- monitoring Oct 20, 2025, 05:29 PM UTC
AWS has identified the root cause of the incident and is actively working on remediation. While no specific resolution timeline has been provided, AWS has indicated that the next update will be issued at 1:45 p.m. ET. In parallel, we are preparing for a potential disaster recovery activation. A readiness drill is scheduled for 5:00 p.m. ET, and should AWS services not be restored by then, we plan to initiate our disaster recovery procedure at 11:00 p.m. ET.
- monitoring Oct 20, 2025, 05:51 PM UTC
We are seeing early signs of recovery, with users able to log back into gaiia and connectivity errors significantly decreasing.
- monitoring Oct 20, 2025, 07:36 PM UTC
AWS has reported continued recovery across all services and Availability Zones. While some residual network connectivity issues may persist, the majority of gaiia users are now able to resume normal operations within the platform. We are moving the affected components to "Degraded" as we continue to monitor the latency and error rate.
- monitoring Oct 20, 2025, 08:54 PM UTC
We continue to see network connectivity issues, especially during login. This outage impacts other subprocessors, notably LaunchDarkly, which serves some tenant configuration, and Explo, which provides the Snowflake connectors in our Analytics module.
- monitoring Oct 20, 2025, 10:38 PM UTC
AWS has indicated that full recovery is expected within the next two hours. While we remain cautiously optimistic, we recognize that several of our downstream providers have been significantly affected by this outage and may require additional time to fully restore their services.
- monitoring Oct 20, 2025, 11:26 PM UTC
Although AWS has marked the incident as resolved, we continue to observe errors originating from the LaunchDarkly backend SDK, which was affected by the AWS outage. Because LaunchDarkly is used to deliver feature flags for tenant configurations, this issue is causing intermittent network failures during user authentication within gaiia tenants. That said, we are seeing early signs of recovery from LaunchDarkly. To accelerate service stabilization, we are deploying a configuration cache designed to bypass LaunchDarkly when it becomes unresponsive.
- monitoring Oct 21, 2025, 12:31 AM UTC
We are deploying a fix to circumvent LaunchDarkly's current server-side SDK outage. The fix has proved effective in staging and will be rolled out to production module by module.
- resolved Oct 21, 2025, 01:34 AM UTC
Connectivity issues have subsided, and all modules are now operational. Over the next few hours, we will re-drive pending events such as orders, notifications, payments, and workflows, and verify data integrity across modules.
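
For readers curious about the configuration cache mentioned in the 11:26 PM UTC update, the general idea can be sketched as follows. This is a minimal illustration, not gaiia's actual implementation and not the LaunchDarkly SDK API: the class `CachedFlagReader`, the `fetch` callable, the `max_stale_seconds` parameter, and the flag name `sso_enabled` are all hypothetical. It assumes the flag provider call raises an exception (for example, on a timeout) when the provider is unresponsive.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachedFlagReader:
    """Wrap a flag provider and fall back to the last known-good value.

    `fetch` is a hypothetical callable (tenant_id, flag_key) -> value that
    is assumed to raise an exception when the provider is unresponsive.
    """

    def __init__(
        self,
        fetch: Callable[[str, str], Any],
        defaults: Dict[str, Any],
        max_stale_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._defaults = dict(defaults)
        self._max_stale = max_stale_seconds
        self._clock = clock
        # (tenant_id, flag_key) -> (time stored, value)
        self._cache: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get(self, tenant_id: str, flag_key: str) -> Any:
        key = (tenant_id, flag_key)
        try:
            value = self._fetch(tenant_id, flag_key)
        except Exception:
            # Provider failed: serve the cached value if it is fresh enough,
            # otherwise fall back to the static default for this flag.
            cached = self._cache.get(key)
            if cached is not None and self._clock() - cached[0] <= self._max_stale:
                return cached[1]
            return self._defaults.get(flag_key)
        # Provider answered: remember the value for future outages.
        self._cache[key] = (self._clock(), value)
        return value
```

With a wrapper along these lines, a tenant whose flags were read at least once before the outage keeps its last known configuration during login, while a tenant with no cached value falls back to the static default.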