UpGuard incident

Invalid notifications are being generated and sent

Critical, Resolved

UpGuard experienced a critical incident on December 7, 2022 affecting Web App, lasting 3h 4m. The incident has been resolved; the full update timeline is below.

Started
Dec 07, 2022, 07:36 AM UTC
Resolved
Dec 07, 2022, 10:41 AM UTC
Duration
3h 4m
Detected by Pingoru
Dec 07, 2022, 07:36 AM UTC

Affected components

Web App

Update timeline

  1. investigating Dec 07, 2022, 07:36 AM UTC

    We are investigating an issue where many notifications are being generated. These notifications are invalid, and do not contain any information from other customer accounts, as the IPs that are generating the notifications are not linked to any customer or vendor within our platform. Notification generation has been halted at this point in time.

  2. investigating Dec 07, 2022, 07:37 AM UTC

    We are continuing to investigate this issue.

  3. investigating Dec 07, 2022, 07:47 AM UTC

    We are continuing to investigate this issue.

  4. identified Dec 07, 2022, 07:56 AM UTC

    The issue has been identified and a fix is being prepared. Notifications will remain halted until the fix is implemented and verified.

  5. identified Dec 07, 2022, 09:13 AM UTC

    A fix is being deployed for this issue.

  6. resolved Dec 07, 2022, 10:41 AM UTC

    A fix has been released to Production. We monitored for any further or additional issues, and the issue is now resolved. All invalid notifications have been purged from the platform, and the Notification process has been restarted.

  7. postmortem Dec 21, 2022, 02:58 AM UTC

    PIR Date: December 13th, 2022
    Incident Date: December 7th, 2022
    Incident Time: 03:10 UTC
    Incident Number: INCI-024
    Severity Level: 2 Critical (Single service affected, partial outage, multiple/all customers potentially affected)
    Affected Services: UpGuard CyberRisk Notification Service
    Outage Duration: 3 hours 25 minutes

    # Incident Summary

    On Wednesday, December 7th at 03:10 UTC, UpGuard were first alerted to invalid notifications being sent for risks and vulnerabilities. Investigation showed that notifications were being sent for risks and vulnerabilities associated with orphaned domains and IP addresses. Notifications were paused during the investigation, and pending invalid notifications were purged. A full analysis and diagnosis were completed, and a full fix was deployed as of December 7th at 09:40 UTC. Analysis shows that fewer than 5% of UpGuard customers received invalid notifications.

    # Fault

    Some UpGuard customers were receiving multiple notifications relating to alerts for risks and vulnerabilities. The contents of these notifications did not relate to their account or to any domains or IPs owned by them. Customers with email notifications set up were receiving many hundreds of emails.

    # Detection

    The UpGuard Support team worked through several customer support tickets, which were then escalated as an incident to the product team on December 7th at 05:42 UTC.

    # Impact

    1. Outage: The UpGuard CyberRisk Notification Service was halted between 06:15 UTC and 09:40 UTC on December 7th.
    2. Notifications: Invalid notifications were sent to some UpGuard customers for around 3 hours, between 03:10 UTC and 06:15 UTC on December 7th. Analysis shows that fewer than 5% of UpGuard customers received invalid notifications.

    # Recovery

    1. After an initial investigation, the UpGuard CyberRisk Notification Service was halted at 06:15 UTC.
    2. Further investigation found that the issue was a coding error, which was fixed, tested, and released into Production by 09:40 UTC.
    3. All pending notifications that were invalid were purged from the system by 09:40 UTC.
    4. The UpGuard CyberRisk Notification Service was restarted, and an analysis of the impacted customers was completed.
    5. Customer notifications were sent out to affected customers on December 7th.

    # Timeline

    **December 7th 2022**

    03:10 UTC - Initial customer ticket logged
    04:52 UTC - Second customer ticket logged
    05:18 UTC - Third customer ticket logged
    05:42 UTC - Issue escalated to UpGuard product team from second customer ticket
    06:15 UTC - UpGuard CyberRisk Notification Service shut down by Product team during initial investigation
    06:28 UTC - Product Incident Meeting underway
    07:04 UTC - Product Incident Meeting continues
    07:36 UTC - INITIAL Status Page Update - Notification of incident and advising that notifications have been halted
    07:37 UTC - Status Page Update - Notification that investigations are still underway
    07:47 UTC - Status Page Update - Notification that investigations are still underway
    07:56 UTC - Status Page Update - Notification that the issue has been identified and a fix is being prepared; confirmation that notifications are still paused
    07:58 UTC - Fix being prepared; Status Page added to the product intercom widget to display updates
    09:13 UTC - Status Page Update - Notification of a fix being deployed
    09:40 UTC - UpGuard CyberRisk Notification Service restarted
    10:41 UTC - FINAL Update on Status Page posted
    10:48 UTC - Analysis of impacted customers completed

    # Root Cause

    Due to a coding error, certain fields in the production database were being set incorrectly. These fields affected the UpGuard CyberRisk Notification Service.

    # Corrective Actions

    1. Purged any unread invalid notifications or pending notifications within the platform.
    2. Analyzed the invalid notifications that were sent by UpGuard CyberRisk on the morning of December 7th, defined a list of affected customers and individual users, and contacted the affected customers with a description of the incident and its impact.
    3. Implemented a Referential Integrity check on the Org to Vendor denormalized link, so that it cannot become invalid again.
    4. Raised a ticket to investigate ensuring that domains or IPs not currently linked to a vendor are never read.
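The postmortem does not describe how the referential integrity check or the orphaned-asset filtering was implemented. As a rough illustration of the general idea only, the sketch below uses SQLite foreign keys from Python. The schema and every name in it (`vendors`, `org_vendor_links`, `assets`, `link_org_to_vendor`, `notifiable_assets`) are hypothetical and are not taken from UpGuard's platform.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical org/vendor/asset schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE vendors (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE org_vendor_links (
            org_id    INTEGER NOT NULL,
            vendor_id INTEGER NOT NULL REFERENCES vendors(id),
            PRIMARY KEY (org_id, vendor_id)
        );
        CREATE TABLE assets (
            id        INTEGER PRIMARY KEY,
            hostname  TEXT NOT NULL,
            vendor_id INTEGER REFERENCES vendors(id)  -- NULL means orphaned
        );
        """
    )
    return conn


def link_org_to_vendor(conn: sqlite3.Connection, org_id: int, vendor_id: int) -> bool:
    """Insert an org-to-vendor link; return False if the vendor does not exist."""
    try:
        conn.execute(
            "INSERT INTO org_vendor_links (org_id, vendor_id) VALUES (?, ?)",
            (org_id, vendor_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The foreign key constraint rejected a link to a missing vendor.
        return False


def notifiable_assets(conn: sqlite3.Connection, org_id: int) -> list[str]:
    """Return hostnames reachable through a valid link; orphaned assets never match."""
    rows = conn.execute(
        """
        SELECT a.hostname
        FROM assets AS a
        JOIN org_vendor_links AS l ON l.vendor_id = a.vendor_id
        WHERE l.org_id = ?
        ORDER BY a.hostname
        """,
        (org_id,),
    ).fetchall()
    return [hostname for (hostname,) in rows]
```

In this sketch the database itself refuses a link that points at a non-existent vendor, and the notification query reaches assets only through a valid link, so an asset with no vendor cannot produce a notification for any organization. A production system with denormalized data would likely need additional checks beyond a single foreign key.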