Scout APM experienced a major incident on June 3, 2019 affecting Application Monitoring, lasting 48m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Jun 03, 2019, 03:47 PM UTC
We appear to be using more database connections than expected, which is causing failures in our Web UI. Ingestion is backed up, but incoming data is safe and is still being collected.
- investigating Jun 03, 2019, 04:03 PM UTC
We've identified and fixed the database connection issue. We are currently loading the backlog of data that was held during the incident. Data will be appearing in the UI shortly.
- resolved Jun 03, 2019, 04:36 PM UTC
All chart metrics are now completely caught up. The root cause of the incident was an attempted table partitioning during a database vacuum, which locked a critical table and cascaded to the rest of the application. We'll be adjusting our vacuum and partitioning schedules so that this lock does not recur.
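As an illustration of the scheduling fix described above, a pre-flight check can refuse to start one maintenance job while another's window is still open. The following is a minimal sketch, not Scout's actual tooling: the `Window` type, the `overlaps` helper, and the example times are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Window(NamedTuple):
    """A maintenance window (hypothetical type, for illustration only)."""
    name: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        # End of the window, treated as exclusive.
        return self.start + self.duration


def overlaps(a: Window, b: Window) -> bool:
    """Return True if the half-open windows [start, end) share any time."""
    return a.start < b.end and b.start < a.end


# Example times are made up and do not reflect the real schedules.
vacuum = Window("vacuum", datetime(2024, 1, 1, 2, 0), timedelta(hours=1))
partitioning = Window("partitioning", datetime(2024, 1, 1, 2, 30), timedelta(minutes=20))
rescheduled = Window("partitioning", datetime(2024, 1, 1, 4, 0), timedelta(minutes=20))

for candidate in (partitioning, rescheduled):
    if overlaps(vacuum, candidate):
        print(f"{candidate.name} at {candidate.start:%H:%M} conflicts with {vacuum.name}")
    else:
        print(f"{candidate.name} at {candidate.start:%H:%M} is clear of {vacuum.name}")
```

A static window check like this only guards against scheduled overlap. A vacuum that runs longer than planned would still need a runtime guard, such as checking for an active vacuum before partitioning begins.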