Scanii.com incident

Processing impaired across multiple regions


Scanii.com experienced a notice-level incident on October 20, 2025 affecting api-ap2.scanii.com, api-eu2.scanii.com, and 4 more components, lasting 13h 49m. The incident has been resolved; the full update timeline is below.

Started
Oct 20, 2025, 09:05 AM UTC
Resolved
Oct 20, 2025, 10:55 PM UTC
Duration
13h 49m
Detected by Pingoru
Oct 20, 2025, 09:05 AM UTC

Affected components

api-ap2.scanii.com, api-eu2.scanii.com, api-ap1.scanii.com, api-eu1.scanii.com, api-us1.scanii.com, api-ca1.scanii.com

Update timeline

  1. investigating Oct 20, 2025, 01:05 PM UTC

    Processing is behind across multiple regions; we are investigating and will provide an update as soon as possible. At this point we believe this is fallout from the major AWS outage earlier today.

  2. investigating Oct 20, 2025, 01:06 PM UTC

    We are continuing to investigate this issue.

  3. investigating Oct 20, 2025, 01:07 PM UTC

    We are continuing to investigate this issue.

  4. identified Oct 20, 2025, 01:29 PM UTC

    The issue is primarily caused by SQS failing in us-east-1, which is unfortunately also impacting the other regions, since they submit consumption data to us-east-1.

  5. identified Oct 20, 2025, 01:44 PM UTC

    We are also unable to provision compute capacity in us-east-1, so we are at AWS's mercy for a full recovery. It looks like they are actively working to restore EC2.

  6. identified Oct 20, 2025, 03:56 PM UTC

    We are continuing to work on a fix for this issue.

  7. monitoring Oct 20, 2025, 03:57 PM UTC

    Still ongoing

  8. monitoring Oct 20, 2025, 07:33 PM UTC

    We are now able to scale again and are processing through a large backlog of fetch/asynchronous work. Synchronous analysis calls should be mostly restored to normal for US1. All other regions should be working normally now.

  9. monitoring Oct 20, 2025, 07:53 PM UTC

    We are continuing to monitor for any further issues.

  10. resolved Oct 20, 2025, 10:55 PM UTC

    This incident has been resolved.

  11. postmortem Oct 27, 2025, 01:45 PM UTC

    # Postmortem: 2025-10-20 Outage

    ### Amazon Service Disruption in the Northern Virginia (US-EAST-1) Region

    ### What happened

    **Impact**

    * **Duration:** 10.5 hours (5:30 AM – 4:00 PM EDT)
    * **Affected regions:** us1, eu1, eu2, ap1, ap2, ca1

    Between 5:30 AM and 4:00 PM EDT on October 20, 2025, Scanii services were globally unavailable or severely degraded. The following timeline summarizes key events (all times EDT):

    1. **3:11 AM:** AWS Health reported increased error rates and latencies in us-east-1.
    2. **4:26 AM:** AWS expanded the incident to include connectivity issues with DynamoDB. Because DynamoDB is foundational, multiple AWS services were also affected.
    3. **5:01 AM:** AWS stated that the root cause had been identified and mitigated.
    4. **5:30 AM:** Our synthetic monitoring detected that [www.scanii.com](https://www.scanii.com/) was down. The on-call engineer found ECS service capacity at zero, noted AWS recovery in progress, started new Fargate tasks, observed recovery, and stood down.
    5. **6:35 AM:** AWS reported that EC2 launches were still experiencing elevated error rates, worsening the impact.
    6. **9:00 AM:** Multiple customer support tickets indicated that the service was unavailable across regions. Despite no active alerts, we opened an incident and backdated it to the first alert.
    7. **9:30 AM:** We traced the root cause to AWS SQS in us-east-1. The failure prevented job completion messages (used for consumption tracking) from other regions from reaching us-east-1, causing global instability (a minimal sketch of this shared dependency follows this report). Our monitoring, which had recently been migrated to AWS, was also affected, explaining the missing alerts.
    8. **11:04 AM:** AWS reported ongoing network connectivity issues across multiple services.
    9. **11:30 AM:** We considered declaring a disaster and invoking our DR plan to migrate workloads from us-east-1 to another U.S. region. Although the plan is tested twice a year, this would have been its first live execution and an irreversible migration. Given AWS's ongoing progress and the historically short duration of their incidents, we decided against activating it.
    10. **3:33 PM:** All regions except us1 had recovered, but us1 still lacked EC2 capacity to process the backlog.
    11. **4:00 PM:** EC2 capacity recovered; we scaled up and achieved full service restoration.

    AWS later issued a [post-incident summary](https://aws.amazon.com/message/101925/) attributing the event to DNS resolution issues affecting DynamoDB. Scanii's direct root cause was SQS unavailability in us-east-1, a shared dependency across all regions.

    ### Why it took 3.5 hours (5:30 AM – 9:00 AM) to detect multi-region impact

    1. Failures in regions outside us-east-1 were intermittent and below alert thresholds.
    2. Alerting in us-east-1 was impaired because synthetic monitors were hosted inside the affected region.

    ### Preventive actions

    Because this event represents the largest downtime in company history (including the 2017 S3 outage), we are implementing the following corrective measures:

    1. **Backup monitoring outside AWS:** We are building an external monitoring infrastructure independent of AWS. Our previous Azure-based system was retired earlier this year when the product was discontinued, leaving us temporarily dependent on AWS for monitoring.
    2. **Alerting audit:** We are reviewing and tightening CloudWatch monitors to better capture transient and partial failures.
    3. **Cross-region architecture:** We are re-architecting our billing and consumption system so a failure in us-east-1 cannot affect other regions. This project will begin in Q1 2026.
    4. **Regional relocation evaluation:** We are assessing a migration of selected services from us-east-1 to us-east-2 (Ohio). Historical evidence suggests us-east-1 experiences more frequent regional-scale incidents. Any migration will require planned downtime and validation of available capacity in the destination region.

    ### Customer communication

    We apologize for the disruption. Customers affected by this incident may request prorated credits under our SLA: [https://docs.scanii.com/article/141-sla](https://docs.scanii.com/article/141-sla). For questions, contact [[email protected]](mailto:[email protected]).
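
To make the cross-region dependency described in item 7 of the postmortem timeline concrete, here is a minimal sketch in Python with boto3 of the general pattern: every region publishes job-completion (consumption) events to a queue homed in us-east-1. The queue URL, message fields, and helper function are hypothetical illustrations, not Scanii's actual implementation; the point is only that a client hard-wired to us-east-1 gives an SQS failure there a global blast radius.

```python
import json
import time

import boto3
from botocore.exceptions import ClientError

# Hypothetical illustration of the dependency described in the postmortem:
# every region reports job-completion (consumption) events to a single SQS
# queue that lives in us-east-1. The queue URL and message fields are invented.
CONSUMPTION_QUEUE_URL = (
    "https://sqs.us-east-1.amazonaws.com/123456789012/consumption-events"
)

# Note the hard-coded region: regardless of where the worker runs
# (us1, eu1, eu2, ap1, ap2, ca1), this client always talks to us-east-1.
sqs = boto3.client("sqs", region_name="us-east-1")


def report_consumption(region: str, job_id: str, bytes_processed: int) -> None:
    """Send a job-completion event to the central consumption queue.

    If SQS in us-east-1 is unavailable, this call fails for every region,
    which is the cross-region blast radius the postmortem describes.
    """
    body = json.dumps(
        {
            "region": region,
            "job_id": job_id,
            "bytes_processed": bytes_processed,
            "completed_at": int(time.time()),
        }
    )
    try:
        sqs.send_message(QueueUrl=CONSUMPTION_QUEUE_URL, MessageBody=body)
    except ClientError:
        # During the October 20 incident there was no regional fallback,
        # so failures at this step stalled consumption tracking everywhere.
        raise
```

The planned re-architecture (preventive action 3) would remove this pattern, for example by keeping consumption data in the region where the work ran instead of funneling it through us-east-1.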
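
Preventive action 1 calls for monitoring that runs outside AWS. As a rough illustration only, the probe below targets the hostnames from the affected-component list plus www.scanii.com; probing the root path and treating only connection failures or 5xx responses as "down" is an assumption, not Scanii's actual health-check contract.

```python
#!/usr/bin/env python3
"""Minimal external reachability probe, intended to run from a non-AWS host.

Illustrative sketch: endpoints come from this incident's affected-component
list; the probe logic and thresholds are assumptions.
"""
import sys
import urllib.error
import urllib.request

ENDPOINTS = [
    "https://www.scanii.com/",
    "https://api-us1.scanii.com/",
    "https://api-ca1.scanii.com/",
    "https://api-eu1.scanii.com/",
    "https://api-eu2.scanii.com/",
    "https://api-ap1.scanii.com/",
    "https://api-ap2.scanii.com/",
]


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the endpoint responds with a non-5xx status."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError as exc:
        # 4xx on "/" still shows the endpoint is up; 5xx counts as down.
        return exc.code < 500
    except (urllib.error.URLError, OSError):
        # DNS failure, TLS failure, timeout, connection refused, etc.
        return False


if __name__ == "__main__":
    down = [url for url in ENDPOINTS if not probe(url)]
    for url in down:
        print(f"DOWN: {url}")
    sys.exit(1 if down else 0)
```

Run on a schedule from infrastructure outside AWS, a probe like this would have kept alerting during the October 20 window even while CloudWatch-based monitors hosted inside us-east-1 were impaired.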