Sardine incident

Error evaluating rules in /v1/customers API endpoint

Major Resolved

Sardine experienced a major incident on July 17, 2025 affecting Customer APIs, lasting 18m. The incident has been resolved; the full update timeline is below.

Started
Jul 17, 2025, 08:59 PM UTC
Resolved
Jul 17, 2025, 09:17 PM UTC
Duration
18m
Detected by Pingoru
Jul 17, 2025, 08:59 PM UTC

Affected components

Customer APIs

Update timeline

  1. identified Jul 17, 2025, 08:59 PM UTC

    The issue has been identified and a fix is being implemented.

  2. monitoring Jul 17, 2025, 09:09 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Jul 17, 2025, 09:17 PM UTC

    This incident has been resolved.

  4. postmortem Jul 18, 2025, 06:39 PM UTC

    ### **To:** Affected Partners

    ### **From:** [Sardine.ai](http://Sardine.ai)

    ### **Introduction**

    * **Purpose:** This report provides an overview of the recent service disruption impacting users of the Customers API.
    * **Apology:** We sincerely apologize for any inconvenience this disruption may have caused. We remain dedicated to maintaining high service availability and reliability.

    ### **Incident Overview**

    * **Duration:** Approximately 30 minutes, from _2025-07-17, 08:35 PM UTC to 2025-07-17, 09:05 PM UTC_
    * **Region Affected:** All regions
    * **Services Affected:** Customers API

    ### **Root Cause Analysis**

    * **Primary Issue:** A change in the type of 3 features used by our rules engine when processing Customers API calls inadvertently caused several incoming requests to fail.
    * **Detailed Explanation:** The update was deployed using a canary strategy. As a result, for a period of time, different instances of some internal services were processing the same feature differently. This caused unmarshalling errors in cross-service communication, which in turn failed the overall requests associated with them.

    ### **Impact**

    * **Service Accessibility:** The majority of requests to the Customers API failed during the incident window.

    ### **Detection and Recovery Time**

    * A few clients reached out about a spike in errors from the Customers API. At about the same time, our monitors detected an abnormal error rate in the API and paged the on-call engineer. Once aware of the issue, our engineering team immediately found the root cause and rolled back the faulty commit.

    ### **Corrective Actions and Improvements**

    * **Immediate Response:** The faulty commit was removed from production as soon as the problem was discovered, promptly restoring the Customers API to normal operation for all partners.
    * **Preventive Measures:** Monitors and alerts will be put in place in our Sandbox environment to prevent this kind of issue from happening again in production.

    ### **Conclusion**

    * **Commitment:** Sardine remains firmly committed to delivering reliable and resilient services to our partners. We deeply regret the inconvenience caused by this incident and appreciate your patience and understanding.
    * **Appreciation:** Thank you for your continued trust and partnership. We value your support as we strengthen our systems and processes to ensure greater reliability and stability.