Cornerstone incident
Minor Issue - Edge Import Feeds Processing Delays - All Swimlanes
Cornerstone experienced a minor incident on October 27, 2025 affecting Response Time and 1 more component, lasting one day. The incident has been resolved; the full update timeline is below.
Update timeline
- identified Oct 27, 2025, 12:52 PM UTC
On October 26th we experienced an issue with Edge Import access, which has since been resolved. As a result, a backlog of pending feeds accumulated, causing delays in feed processing across all swimlanes. Feeds are now actively processing, and our team is proactively monitoring progress to ensure timely completion. The status page will be updated regularly to reflect progress.
- identified Oct 27, 2025, 04:00 PM UTC
Our teams continue to closely monitor the ongoing loads. Please note that new files and scheduled feed runs should process as expected.
- identified Oct 27, 2025, 06:08 PM UTC
We are continuing to closely monitor activity to ensure the backlog fully clears. We will share regular updates until the queues have fully cleared.
- identified Oct 27, 2025, 09:05 PM UTC
The backlog has cleared for most swimlanes. We are continuing to monitor the remaining swimlanes closely to ensure they catch up. We will share regular updates until the queues have fully cleared.
- identified Oct 27, 2025, 11:08 PM UTC
Our engineering team continues to monitor as the backlog clears. We'll share regular updates as things process.
- identified Oct 28, 2025, 12:59 AM UTC
Our engineering team continues to monitor as the backlog clears. We'll share regular updates as things process.
- identified Oct 28, 2025, 03:02 AM UTC
We are continuing to closely monitor activity to ensure the backlog fully clears. We will share regular updates until the queues have fully cleared.
- identified Oct 28, 2025, 05:32 AM UTC
We are continuing to closely monitor activity to ensure the backlog fully clears. We will share regular updates until the queues have fully cleared.
- identified Oct 28, 2025, 09:26 AM UTC
Following the initial issue, subsequent feed runs failed with the error “The feed was already running.” This is due to the Edge Import system’s built-in fail-safe and recovery mechanism, which prevents a new feed run from starting while a previous instance is still in progress. Once the stalled instance is released, the feed becomes available again for the next scheduled or manually triggered run, which is expected to process successfully.

Summary of current status:
- First feed run: stalled as a result of a known infrastructure issue, which has now been resolved.
- Second feed run: failed with the error “The feed was already running,” in accordance with the system’s fail-safe logic.
- Third feed run: anticipated to be successful on the next attempt.

We are actively monitoring the situation and will continue to provide updates as needed.
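The fail-safe described above behaves like a per-feed run lock. The sketch below is a minimal Python illustration of that pattern, using invented names (`FeedRunGuard`, `FeedAlreadyRunningError`); it is not Cornerstone's actual implementation, only a model of the behavior clients observed.

```python
import threading

class FeedAlreadyRunningError(Exception):
    """Raised when a run is requested while a prior run still holds the feed."""

class FeedRunGuard:
    # Illustrative per-feed guard: overlapping runs are rejected, not queued.
    def __init__(self) -> None:
        self._active: set[str] = set()
        self._mutex = threading.Lock()

    def start(self, feed_id: str) -> None:
        with self._mutex:
            if feed_id in self._active:
                # Mirrors the observed "The feed was already running" error.
                raise FeedAlreadyRunningError(f"The feed was already running: {feed_id}")
            self._active.add(feed_id)

    def finish(self, feed_id: str) -> None:
        # A stalled run that never reaches finish() leaves the feed locked,
        # which is why the second run failed while the first was stuck.
        with self._mutex:
            self._active.discard(feed_id)
```

Under this model, the first (stalled) run holds the lock, the second run fails fast with the overlap error, and once the stalled instance is released the third run proceeds, matching the sequence summarized above.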
- monitoring Oct 28, 2025, 12:02 PM UTC
Queues on all swimlanes have cleared. Following the initial issue, subsequent feed runs failed with the error “The feed was already running.” This is due to the Edge Import system’s built-in fail-safe and recovery mechanism, which prevents a new feed run from starting while a previous instance is still in progress. Once the stalled instance is released, the feed becomes available again for the next scheduled or manually triggered run, which is expected to process successfully.

Summary of current status:
- First feed run: stalled as a result of a known infrastructure issue, which has now been resolved.
- Second feed run: failed with the error “The feed was already running,” in accordance with the system’s fail-safe logic.
- Third feed run: anticipated to be successful on the next attempt.

We are actively monitoring the situation and will continue to provide updates as needed.
- resolved Oct 28, 2025, 01:41 PM UTC
Our observations indicate feeds are processing as expected (see below), so we are marking this issue as resolved. Following the initial issue, subsequent feed runs failed with the error “The feed was already running.” This is due to the Edge Import system’s built-in fail-safe and recovery mechanism, which prevents a new feed run from starting while a previous instance is still in progress. Once the stalled instance is released, the feed becomes available again for the next scheduled or manually triggered run, which is expected to process successfully.

Summary of current status:
- First feed run: stalled as a result of a known infrastructure issue, which has now been resolved.
- Second feed run: failed with the error “The feed was already running,” in accordance with the system’s fail-safe logic.
- Third feed run: anticipated to be successful on the next attempt.
- postmortem Nov 06, 2025, 12:39 AM UTC
**What Happened:** On October 26th, 2025, clients experienced intermittent delays and failures while using Edge Import. Multiple import jobs (feeds/loads) remained in a "Processing/Running" state without completing or updating their status, leading to user confusion and delayed data visibility. Subsequent import attempts by clients failed due to overlapping runs, as earlier sessions had not been properly released by the system. This resulted in queue congestion, increased processing time, and temporary unavailability of the Edge Import service.

**Why It Happened:** The issue was triggered by temporary system instability during ongoing infrastructure updates, which affected the job orchestration and state management layer of the Edge Import service. This instability caused:

* Job execution threads to hang in an active state, preventing completion signals from being written to the database.
* Locking of concurrent feed executions, which led to subsequent jobs failing validation checks for overlapping runs.

As a result, the system displayed incorrect processing statuses and failed to clear stale sessions until manual intervention occurred.

**Immediate Actions Taken:** Engineering teams promptly investigated the stuck job instances and identified orphaned processes. A manual cleanup and service restart were performed to release stuck jobs and reset job states. Post-restart, all pending Edge Import feeds were reprocessed successfully, and client queues began clearing progressively. User communication was shared to confirm restoration of normal import functionality.

**How We Fixed It:**

* Reviewed and optimized the job orchestration logic to ensure proper state synchronization even during transient infrastructure changes.
* Implemented automated detection and termination of long-running or orphaned jobs that exceed defined thresholds.
* Updated status reporting logic to ensure the UI and logs more accurately reflect processing state.
* Conducted system-level validation tests post-update to confirm consistent job completion and cleanup behavior.

**How We Will Prevent Recurrence:**

* Enhanced monitoring: introduce deeper telemetry for job lifecycle events (queued, running, completed, failed) to detect stuck or overlapping imports in real time.
* Operational readiness improvements: implement pre-update load validation and post-update functional verification to prevent partial service instability during maintenance.
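The fix list above mentions automated detection and termination of long-running or orphaned jobs that exceed defined thresholds. The following Python sketch shows one way such a sweep could work, assuming an illustrative `JobRecord` model and a hypothetical two-hour threshold; Cornerstone has not published the actual data model or threshold values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; the real value is not published.
STUCK_THRESHOLD = timedelta(hours=2)

@dataclass
class JobRecord:
    job_id: str
    state: str                     # "queued" | "running" | "completed" | "failed"
    started_at: datetime | None = None

def sweep_orphaned_jobs(jobs: list[JobRecord], now: datetime | None = None) -> list[str]:
    """Mark jobs stuck in 'running' past the threshold as failed; return their ids.

    Releasing the stale state lets the next scheduled run pass the
    overlapping-run check instead of waiting for manual cleanup and a
    service restart, as was required during this incident.
    """
    now = now or datetime.now(timezone.utc)
    released: list[str] = []
    for job in jobs:
        if job.state == "running" and job.started_at and now - job.started_at > STUCK_THRESHOLD:
            job.state = "failed"   # completion signal never arrived; treat as orphaned
            released.append(job.job_id)
    return released
```

Run periodically (for example from a scheduler), a sweep like this would surface hung jobs such as those from October 26th automatically, rather than relying on manual cleanup.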