Aqua incident

CyberCenter Service Disruption

Major Resolved

Aqua reported a major incident on August 27, 2024. The incident has been resolved; the full update timeline is below.

Started
Aug 27, 2024, 06:28 PM UTC
Resolved
Aug 21, 2024, 08:30 PM UTC
Detected by Pingoru
Aug 27, 2024, 06:28 PM UTC

Update timeline

  1. resolved Aug 27, 2024, 06:28 PM UTC

    On August 21st, 2024, at approximately 8:30 PM UTC, our Container Image Scanning service experienced a major disruption because a Lambda function exceeded its ephemeral storage limit. The function, which downloads and extracts a critical database used for image scanning, was configured with 3GB of ephemeral storage. The extracted database (2.5GB) and its zip archive (500MB) together consumed the full 3GB, leaving no headroom and causing the function to panic. The result was a service outage that impacted container image scanning.

    Although monitoring was in place for various components, there was no alert based specifically on Lambda panics, which delayed identification and remediation. The Aqua Fields team promptly identified the issue and engaged the on-call channel. However, because the US team was unavailable and it was late at night in India, the response was delayed. The India team resolved the incident at 10:52 PM UTC on August 21st by increasing the function's ephemeral storage.

    We apologize for any inconvenience caused by this disruption. We are improving our monitoring and alerting, including automated remediation where possible, to prevent similar incidents in the future.
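
The failure mode described above (an archive plus its extracted contents filling ephemeral storage) can be guarded against with a pre-flight space check. The following is a minimal sketch, not Aqua's actual code: the function names `extracted_size`, `fits`, and `safe_extract`, the 10% safety margin, and the `/tmp` default are all illustrative assumptions. It reads the uncompressed sizes from the zip directory and refuses to extract when the destination lacks room, so the function fails with a clear error instead of running out of disk mid-extraction.

```python
import shutil
import zipfile


def extracted_size(archive_path: str) -> int:
    """Total uncompressed size, in bytes, of every member of a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def fits(needed_bytes: int, free_bytes: int, margin: float = 0.10) -> bool:
    """True if needed_bytes plus a safety margin fits in free_bytes.

    The 10% default margin is an assumption chosen for illustration.
    """
    return needed_bytes * (1 + margin) <= free_bytes


def safe_extract(archive_path: str, dest_dir: str = "/tmp") -> None:
    """Extract only when dest_dir has room; raise a clear error otherwise."""
    needed = extracted_size(archive_path)
    # Free space is measured after the archive is already on disk, so the
    # archive's own footprint is accounted for if it shares the volume.
    free = shutil.disk_usage(dest_dir).free
    if not fits(needed, free):
        raise RuntimeError(
            f"refusing to extract: need {needed} bytes plus margin, {free} free"
        )
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
```

With the figures from the report, a 3GB volume holding the 500MB archive has about 2.5GB free, so `fits` would reject a 2.5GB extraction and surface the shortfall before any data is written.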