Verisk incident
ClaimSearch Services - Login issues and issues accessing services
Verisk experienced a minor incident on June 13, 2023, affecting ClaimSearch Website, ClaimSearch Match Report, and 1 more component, lasting 4h 13m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Jun 13, 2023, 07:22 PM UTC
There is currently an outage with our service provider. Multiple ClaimSearch services are impacted.
- monitoring Jun 13, 2023, 09:09 PM UTC
Some services are returning to normal. For XML there will be delays as we process through the backlog. We apologize for any inconvenience.
- resolved Jun 13, 2023, 11:35 PM UTC
The entire backlog has been processed and all services have returned to normal. We apologize for any inconvenience.
- postmortem Aug 14, 2023, 01:25 PM UTC
**DESCRIPTION:** ClaimSearch Services - Login issues and issues accessing services

**IMPACT:** Customers were unable to access most of the ClaimSearch services.
Incident date: Jun 13, 2023, 2:49 PM ET
Resolution date: Jun 13, 2023, 5:00 PM ET

**ROOT CAUSE:** The AWS Eastern Region had an outage. On June 13 at 2:49 PM ET, all ClaimSearch applications experienced issues and became inaccessible. AWS experienced increased error rates and latency for Lambda function invocations within the US-EAST region. The AWS team's investigation attributed this to a latent software defect in the AWS Lambda subsystem responsible for managing compute capacity to process incoming invocations. The defect was triggered by the scaling of the Lambda front-end fleet and caused invocations to fail.

**CORRECTIVE ACTION:**
AWS Corrective Action - Once the traffic subsided, the Lambda front-end fleet was scaled down to resolve the issue.
Verisk Corrective Action - Began failover to US-West. The failover was also slow, most likely because other AWS customers were failing over from East to West at the same time. AWS East recovered on its own.

**PREVENTATIVE MEASURES:**
1. Implement an HA (High-Availability) pair and conduct regular failover testing. Develop synchronization between East and West in multi-region deployments.
2. Explore running the Login service as hot-hot, and Claims Inquiry and Visual ClaimSearch as hot/warm.
3. Create a runbook for bringing up US-West and flipping back to US-East.
4. Check the feasibility of using DynamoDB vs. PostgreSQL for DR failover capability.
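
The difference between the hot-hot and hot/warm modes named in the preventative measures can be illustrated with a small routing sketch. This is only an illustration under stated assumptions, not Verisk's implementation: the `Mode`, `Service`, and `regions_to_serve` names, the region labels `us-east` and `us-west`, and the idea that health is reported as a simple boolean per region are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HOT_HOT = "hot-hot"    # both regions serve traffic at all times
    HOT_WARM = "hot-warm"  # secondary runs idle and takes over on failover


@dataclass(frozen=True)
class Service:
    name: str
    mode: Mode


def regions_to_serve(service, primary_healthy, secondary_healthy,
                     primary="us-east", secondary="us-west"):
    """Return the regions that should receive traffic for a service.

    Region names and boolean health flags are illustrative assumptions.
    """
    healthy = []
    if primary_healthy:
        healthy.append(primary)
    if secondary_healthy:
        healthy.append(secondary)

    if service.mode is Mode.HOT_HOT:
        # Hot-hot: every healthy region takes traffic.
        return healthy
    # Hot/warm: prefer the primary; use the secondary only if the primary is down.
    return healthy[:1]


login = Service("Login", Mode.HOT_HOT)
inquiry = Service("Claims Inquiry", Mode.HOT_WARM)

print(regions_to_serve(login, True, True))     # ['us-east', 'us-west']
print(regions_to_serve(inquiry, True, True))   # ['us-east']
print(regions_to_serve(inquiry, False, True))  # ['us-west']
```

In this sketch a hot-hot service keeps serving from US-West the moment US-East becomes unhealthy, with no warm-up step, while a hot/warm service depends on an explicit switch to the secondary, which is the step a runbook and regular failover testing would exercise.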