SolarWinds experienced a minor incident on November 7, 2023, affecting the Metrics Ingest Pipeline and lasting 15d 23h. The incident has been resolved; the full update timeline is below.
Affected components
- Metrics Ingest Pipeline
Update timeline
- investigating Nov 07, 2023, 04:06 PM UTC
The issue has been identified. Possible fixes are under investigation.
- identified Nov 09, 2023, 07:22 PM UTC
The issue has been identified and a fix is being implemented. Currently, a few customer environments may see data gaps in some charts.
- monitoring Nov 20, 2023, 09:45 PM UTC
Over the last few days, we have continued to see a large amount of data coming into our ingestion pipeline. We have tried to address this by increasing our infrastructure capacity and tuning the configurations to the maximum allowed limits. In parallel, we have temporarily paused one of the problematic environments to avoid overwhelming the ingestion pipeline. These actions are helping the pipeline process data faster. However, it may still take 2-3 days to clear the backlog. Until then, some customers may see data gaps in monitoring.
- monitoring Nov 21, 2023, 09:04 PM UTC
Although we have made progress in reducing the ingestion pipeline backlog for certain environments, processing is not yet as fast as we want. We are examining the source and volume of incoming data to find ways to reduce the amount being ingested and improve processing efficiency. This includes relocating high-volume workloads to balance load manually. In addition, the architecture team is continuing its assessment of scaling options to further address the issue.
- monitoring Nov 22, 2023, 05:25 PM UTC
Relocating demanding workloads has significantly improved the performance of our ingestion pipeline. Based on the current rate and trend, we expect the system to clear the entire backlog within the next 24 hours. We are actively monitoring the situation and are also working to further optimize the processing of incoming data.
- resolved Nov 23, 2023, 03:17 PM UTC
The delayed ingestion issue was resolved at approximately 11:00 PM EST on November 22, 2023 (04:00 AM UTC on November 23). All customer environments are functioning normally. We are continuing to monitor the environment. If you experience any difficulties, please contact us at https://support.solarwinds.com/