Redox incident

Customer Filter Errors

Minor · Resolved

Redox experienced a minor incident on January 22, 2025, affecting Dashboard Tools and lasting 2h 25m. The incident has been resolved; the full update timeline is below.

Started: Jan 22, 2025, 10:30 PM UTC
Resolved: Jan 23, 2025, 12:56 AM UTC
Duration: 2h 25m
Detected by Pingoru: Jan 22, 2025, 10:30 PM UTC

Affected components

Dashboard Tools

Update timeline

  1. investigating Jan 22, 2025, 10:30 PM UTC

    We are aware of an issue affecting customer message filters. If you suspect your messages are being unexpectedly filtered, please reach out to Production Support here: https://redoxengine.atlassian.net/servicedesk/customer/portal/12

  2. monitoring Jan 22, 2025, 11:25 PM UTC

    A fix has been implemented, and we are working on replaying affected messages.

  3. resolved Jan 23, 2025, 12:56 AM UTC

    Tomorrow morning, emails will be sent to affected customers to determine replay eligibility. Please keep an eye on your alert inbox for updates.

  4. postmortem Jan 28, 2025, 03:17 PM UTC

    ## Summary

    * On January 22, 2025, from approximately 09:00 to 16:50 CT, some filters that had been previously deleted were reactivated, which may have caused some filter logic to execute in unexpected ways. Message processing was otherwise unaffected.
    * Secondarily, on January 23, 2025, we identified that for a subset of customers, messages that should have been filtered via Redox logic were not. No traffic was sent to you that should not have been sent, but your customer filter configuration may have affected which messages you received or filtered out during this time. This did not affect who receives your messages.
    * All impacted customers were notified directly and offered replay assistance as well as support for assessing further impact.

    ## What Happened

    * On January 22, an initial storage system script was executed. This script inadvertently reactivated some previously deleted filters, causing those filters to run against subscription traffic during the time period noted above. (A simplified sketch of this failure mode follows the postmortem.)
    * Later on January 22, we were notified of unexpected filters running against traffic. After detecting the inadvertent reactivation, we executed a second script to re-delete the applicable filters. At this point the unexpected filtering was mitigated.
    * On January 23, we were notified that the second script, while mitigating the original issue, had caused a secondary issue: a subset of filters that should have been filtering traffic were no longer doing so. A third script was executed to mitigate the secondary issue.
    * This sequence of events created the following possible impacts for a subset of customers:
      * On January 22, some traffic that should not have been filtered was filtered.
      * On January 23, some traffic that should have been filtered was not.
    * On both January 22 and January 23, impacted customers were contacted and offered assistance with remediation.
    * Replays were performed, where requested, for the January 22 filtering.

    ## What we are doing about this

    * We are implementing improved alerting to proactively notify us when the state of our underlying storage system diverges from what is expected.
    * We are considering improvements to our internal tooling for interacting with resources in our underlying storage system.
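To make the failure mode above concrete, here is a minimal Python sketch of how a bulk script that ignores a soft-delete marker can resurrect deleted records, and how a blanket re-delete can then knock out records that were legitimately live. Redox has not published its storage schema or the scripts involved; the `Filter` record, its `active`/`deleted` fields, and all three functions below are hypothetical illustrations, not Redox's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Filter:
    """Hypothetical stand-in for a customer filter record; Redox's real
    schema is not public. `deleted` models a soft-delete marker."""
    filter_id: str
    active: bool
    deleted: bool

def backfill_all_active(filters: list[Filter]) -> set[str]:
    """Sketch of the Jan 22 bug: a maintenance script marks every record
    active without checking the soft-delete marker, so previously deleted
    filters start running again. Returns the ids it touched."""
    touched = set()
    for f in filters:
        f.active = True          # bug: should skip records with f.deleted
        touched.add(f.filter_id)
    return touched

def redelete(filters: list[Filter], touched: set[str]) -> None:
    """Sketch of the over-broad correction: deactivate everything the first
    script touched -- including filters that were live before the backfill
    and should still be filtering traffic (the Jan 23 symptom)."""
    for f in filters:
        if f.filter_id in touched:
            f.active = False

def audit(filters: list[Filter]) -> list[str]:
    """The kind of invariant check the 'improved alerting' item suggests:
    flag any record that is both soft-deleted and active."""
    return [f.filter_id for f in filters if f.deleted and f.active]

if __name__ == "__main__":
    filters = [
        Filter("adt-dedupe", active=True, deleted=False),    # should keep running
        Filter("retired-rule", active=False, deleted=True),  # should stay off
    ]
    touched = backfill_all_active(filters)
    print(audit(filters))  # ['retired-rule']: a deleted filter is running again
    redelete(filters, touched)
    print([f.filter_id for f in filters if not f.active and not f.deleted])
    # ['adt-dedupe']: a live filter was knocked out by the broad re-delete
```

A narrower remediation, under the same assumptions, would key both scripts off the soft-delete marker (touch only records where `not f.deleted`) and run the `audit` check as a scheduled alert rather than an ad-hoc query.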