Sekoia FRA1 incident

FRA1 event storage cluster performance issues impacting search jobs

Sekoia FRA1 experienced a minor incident on June 12, 2025 affecting Event storage, lasting 1d 5h. The incident has been resolved; the full update timeline is below.

Started: Jun 12, 2025, 04:03 PM UTC
Resolved: Jun 13, 2025, 09:06 PM UTC
Duration: 1d 5h
Detected by Pingoru: Jun 12, 2025, 04:03 PM UTC

Affected components

Event storage

Update timeline

identified Jun 12, 2025, 04:03 PM UTC

We are currently experiencing performance issues with our event storage cluster in the FRA1 region. This is impacting search jobs, resulting in slower event searches. Our engineers are investigating and working to restore normal functionality as soon as possible.
identified Jun 12, 2025, 04:03 PM UTC

We have identified that the performance issues with our event storage cluster in the FRA1 region are likely due to high load on newly added nodes. These nodes are currently handling the majority of new traffic and the most recent queries, resulting in slower search jobs. Our team is discussing options to reduce the traffic to these nodes. We appreciate your patience as we work to resolve this issue.
monitoring Jun 12, 2025, 04:08 PM UTC

Our team has applied measures to reduce the load on the newly added nodes in our event storage cluster in the FRA1 region. However, we are still observing slower-than-normal search jobs. We continue to monitor the situation closely and will make further adjustments as necessary. We apologize for any inconvenience caused and appreciate your patience as we work towards a resolution.
monitoring Jun 13, 2025, 12:05 PM UTC

We are still experiencing performance issues with our event storage cluster in the FRA1 region today. We have identified a high volume of specific search operations that are contributing to the load on the system. We are currently exploring options to manage these operations more efficiently to alleviate the load. We appreciate your continued patience and will provide further updates as we continue to work towards a resolution.
monitoring Jun 13, 2025, 02:59 PM UTC

We have identified that the performance issues with our event storage cluster in the FRA1 region were primarily due to a high volume of specific, resource-intensive search operations. We have now stopped these operations and cancelled the remaining running tasks. We are already observing a significant improvement in system performance and search job duration. We are continuing to monitor the situation closely to ensure that this positive trend continues. We appreciate your patience during this time and will continue to provide updates.
monitoring Jun 13, 2025, 03:09 PM UTC

Event search performance is back to normal. We are continuing to monitor the situation closely to ensure that it remains stable. We appreciate your patience during this time and will continue to provide updates.
resolved Jun 13, 2025, 09:06 PM UTC

This incident has been resolved.