Entitle incident

Sync Reporting Degradation

Notice Resolved

Entitle experienced a notice incident on April 6, 2025. The incident has been resolved; the full update timeline is below.

Started
Apr 06, 2025, 07:01 AM UTC
Resolved
Mar 20, 2025, 06:00 PM UTC
Duration
Detected by Pingoru
Apr 06, 2025, 07:01 AM UTC

Update timeline

  1. resolved Apr 06, 2025, 07:01 AM UTC

    Between March 20–21, certain customer environments experienced a temporary degradation in sync reporting, where data generated by agents was not fully ingested by our backend systems. The issue was rooted in a recent credential rotation and was fully resolved within hours of identification. No impact was observed on access provisioning or enforcement, and no data was lost. Functionality has been fully restored.

  2. postmortem Apr 06, 2025, 07:02 AM UTC

# Sync Reporting Degradation

**Date:** March 20–21, 2025
**Service Impacted:** Sync Data Reporting
**Impact Duration:** Intermittent impact from March 20 to March 21, 2025.

## Summary

Between March 20–21, certain customer environments experienced a temporary degradation in sync reporting, where data generated by agents was not fully ingested by our backend systems. The issue was rooted in a recent credential rotation and was fully resolved within hours of identification. No impact was observed on access provisioning or enforcement, and no data was lost. Functionality has been fully restored.

## Background

On March 15, a critical vulnerability in GitHub Actions was disclosed (CVE-2025-30066), highlighting a potential risk of secret leakage via CI logs. As part of our proactive security posture, Entitle initiated a comprehensive rotation of all production secrets stored in GitHub.

Among the rotated credentials was a token used by Entitle’s agent software to transmit sync metadata to a backend storage system. While most GitHub secrets are associated with build and deployment pipelines, this specific token was also used in the runtime path of deployed agents, a distinction that wasn’t fully accounted for during the rotation process.

## Timeline

### March 20

The storage service token was rotated as part of a broader security effort.

### March 21

A subset of sync tasks across some customer tenants was not fully transmitted to Entitle's backend. Initial signals were not escalated immediately because alert routing was misconfigured. Engineering was notified of anomalies and began an active investigation. An hour later, a fix was deployed, including token override distribution and a new agent image update.

## Root Cause

The rotated credential was being used operationally by agents, not just by CI/CD processes. This use case was not fully scoped during the rotation, so agents kept presenting the old token after it was rotated and their communication with backend services failed.

Alerting for failed data transmissions was present but not configured to reach the relevant on-call personnel, delaying detection.

## Resolution

* A new valid token was issued and rolled out using an agent override mechanism.
* A fresh agent version containing the new token was published.
* Affected tenants were monitored closely post-resolution to ensure full restoration of data flows.

## Mitigations & Next Steps

### Credential Management Enhancements

* Segregate CI/CD and runtime credentials across dedicated secret stores.
* Introduce tagging and classification for all credentials to indicate their operational usage.

### Monitoring and Alerting Improvements

* Add high-sensitivity monitors for sync degradation and agent communication failures.
* Ensure alert routing logic is codified and version-controlled (via Terraform).

### Operational Resilience

* Enhance agent-side error handling for token-related failures.
* Introduce rollback fallback logic in critical agent communication flows.
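The agent-side fallback described under Operational Resilience could look roughly like the sketch below. It is only an illustration under stated assumptions, not Entitle's implementation: `TokenSource`, `TokenRejected`, `send_with_fallback`, and the source labels `image` and `override` are hypothetical names, and the transport is represented by a caller-supplied `send` function. The idea is that an agent tries each available token source in order and raises a distinct error when none is accepted, so the payload can be buffered and an alert raised rather than the data being dropped silently.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


class TokenRejected(Exception):
    """Hypothetical error the transport raises when the backend rejects a token."""


@dataclass(frozen=True)
class TokenSource:
    name: str                          # label used in logs (hypothetical)
    load: Callable[[], Optional[str]]  # returns a token, or None if not present


def send_with_fallback(
    payload: bytes,
    sources: Sequence[TokenSource],
    send: Callable[[bytes, str], None],
) -> str:
    """Try each token source in order and return the name of the one that worked.

    Raises TokenRejected if no source yields an accepted token, so the caller
    can buffer the payload and alert instead of silently dropping it.
    """
    tried: List[str] = []
    for source in sources:
        token = source.load()
        if token is None:
            continue  # this source is not configured on this agent
        try:
            send(payload, token)
            return source.name
        except TokenRejected:
            tried.append(source.name)  # remember the failure, try the next source
    raise TokenRejected(f"no accepted token; rejected sources: {tried}")


# Example wiring with an in-memory stand-in for the backend (assumption).
def fake_send(payload: bytes, token: str) -> None:
    if token != "rotated-token":
        raise TokenRejected("token rejected")


sources = [
    TokenSource("image", lambda: "stale-token"),       # token shipped with the agent
    TokenSource("override", lambda: "rotated-token"),  # token delivered out of band
]
used = send_with_fallback(b"sync-metadata", sources, fake_send)
```

In this example the stale token is rejected and the call succeeds with the `override` source. Returning the source name lets the agent report which credential path is in use, which is the kind of signal a sync-degradation monitor could track.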