Lucidworks incident

platform.lucidworks.com unavailable

Notice · Resolved

Lucidworks reported a notice-level incident on September 29, 2023 affecting the Lucidworks Platform. It lasted 1h 10m and has been resolved; the full update timeline is below.

Started: Sep 29, 2023, 05:14 PM UTC
Resolved: Sep 29, 2023, 06:24 PM UTC
Duration: 1h 10m
Detected by Pingoru: Sep 29, 2023, 05:14 PM UTC

Affected components

Lucidworks Platform

Update timeline

  1. investigating Sep 29, 2023, 05:14 PM UTC

    The configuration UI at platform.lucidworks.com is currently unavailable and returning errors when users attempt to access it. Lucidworks engineers are actively investigating the problem. This incident does NOT affect any indexing or search functionality; it is only preventing configuration changes to existing Connected Search applications. We will post updates as soon as we have more details to share.

  2. investigating Sep 29, 2023, 05:29 PM UTC

    We have identified the source of the issue and have reverted the recent change that caused it. The fix is actively rolling out. Some users can already log in to platform.lucidworks.com; others may not see full recovery until the rollout has completed.

  3. investigating Sep 29, 2023, 05:55 PM UTC

    The platform.lucidworks.com interface is once again available and functional. No indexing or search query outage was experienced during this incident. We're continuing to monitor to ensure there are no lingering issues.

  4. resolved Sep 29, 2023, 06:24 PM UTC

    We have confirmed that reverting our earlier change has corrected the root cause of this incident, and platform.lucidworks.com is once again fully functional.

  5. postmortem Oct 05, 2023, 08:00 PM UTC

    A configuration change in our [ArgoCD](https://argo-cd.readthedocs.io/en/stable/#what-is-argo-cd) continuous deployment platform caused nondeterministic behavior when applied to our Kubernetes [ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/#what-is-ingress) controllers. Though this change was successfully applied in our development environment, when the same configuration was deployed to the production [Springboard UI](https://platform.lucidworks.com/), it put the cluster into an inaccessible state. We manually intervened to roll back the change as quickly as possible, but found that ArgoCD was still intermittently mis-applying the configuration in this environment. Further troubleshooting efforts identified a stale cache as the culprit; once we forced a hard reset of ArgoCD’s cache, the correct configuration was applied, and the configuration UI returned to a fully functional state. No Connected Search indexing or query functionality was affected during this incident; the extent of the impact was limited to the web-based configuration interface, and no data loss occurred.
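
    The postmortem does not say how the hard reset of ArgoCD's cache was performed. As a minimal sketch of one common approach, the Python helper below builds the `argocd app get <app> --hard-refresh` CLI call, which asks Argo CD to refresh an application along with its cached target manifests. The application name `springboard-ui` and both function names are hypothetical, chosen only for illustration; running the command assumes the `argocd` CLI is installed and already logged in.

```python
import subprocess
from typing import List


def hard_refresh_command(app_name: str) -> List[str]:
    """Build the argocd CLI call that refreshes an app and its manifest cache.

    `app_name` is whatever the Argo CD Application is called; the name used
    in the example below is an assumption, not taken from the postmortem.
    """
    if not app_name:
        raise ValueError("app_name must be non-empty")
    return ["argocd", "app", "get", app_name, "--hard-refresh"]


def hard_refresh(app_name: str) -> int:
    """Run the hard refresh and return the CLI's exit code.

    Requires the argocd CLI on PATH and an authenticated session.
    """
    result = subprocess.run(hard_refresh_command(app_name), check=False)
    return result.returncode


# Building the command has no side effects, so it is safe to inspect first.
command = hard_refresh_command("springboard-ui")  # hypothetical app name
print(" ".join(command))
```

    Keeping command construction separate from execution lets the exact invocation be reviewed or logged before anything touches the cluster, which matters when a deployment tool is already behaving nondeterministically.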