Verisk incident

Resolved - ClaimSearch Anti-Fraud Service Disruption


Verisk experienced a major incident on February 15, 2024 affecting ClaimSearch Match Report and Claims Reporting - Integrated Submission - FTP and 1 more component, lasting 8d 4h. The incident has been resolved; the full update timeline is below.

Started
Feb 15, 2024, 01:05 PM UTC
Resolved
Feb 23, 2024, 05:46 PM UTC
Duration
8d 4h
Detected by Pingoru
Feb 15, 2024, 01:05 PM UTC

Affected components

ClaimSearch Match Report
Claims Reporting - Integrated Submission - FTP
Claims Reporting - Integrated Submission - XML
NICB
Claim Scoring

Update timeline

  1. investigating Feb 15, 2024, 01:05 PM UTC

    Our team is currently investigating the issue affecting ClaimSearch Anti-Fraud Services. Rest assured, we're working diligently to restore normal service. This is impacting ClaimDirector, XML Throughput, and Visual ClaimSearch Match Reports.

  2. investigating Feb 15, 2024, 01:32 PM UTC

    In addition to the above services, NICB Services and VIN Monitoring are also impacted.

  3. investigating Feb 15, 2024, 01:32 PM UTC

    We are continuing to investigate this issue.

  4. investigating Feb 15, 2024, 02:02 PM UTC

    While we investigate the Anti-Fraud Service Disruption, we are committed to keeping you informed about our progress toward a solution. We apologize for the inconvenience.

  5. investigating Feb 15, 2024, 03:16 PM UTC

    We again apologize for the inconvenience. The Anti-Fraud issue is being investigated, and we're working towards a resolution.

  6. investigating Feb 15, 2024, 04:40 PM UTC

    We continue to actively investigate the ClaimSearch Anti-Fraud service disruption and appreciate your patience as we work to identify the root cause. Again, we apologize for the inconvenience.

  7. investigating Feb 15, 2024, 06:32 PM UTC

    We continue to actively work on the ClaimSearch Anti-Fraud service disruption. We will need to process the backlog once a fix is implemented. We apologize for the inconvenience and thank you for your patience.

  8. identified Feb 15, 2024, 08:11 PM UTC

    We have identified the cause of the disruption impacting ClaimSearch Anti-Fraud Services and are actively working on implementing a solution.

  9. identified Feb 16, 2024, 01:19 AM UTC

    We have implemented a fix for the partial outage which impacted ClaimSearch Anti-Fraud Services. We have a backlog to process and expect to process the bulk of it overnight. Thank you for your patience while we work through this issue. This will be the last update this evening; we will update again in the morning.

  10. monitoring Feb 16, 2024, 02:03 PM UTC

    Thank you for your patience as we continue to work through several issues which have impacted system performance. Many of the processes have returned to normal; however, there could still be residual delays in receiving match reports due to system backlogs, especially for claims that run through the claims scoring processes. Although there may still be delays, no claims have been lost and no action is required by customers; all claims will be processed once the queues are cleared. Once all issues are resolved and the root cause is determined, we will share that information. We apologize for any impact to your claim processing and are working diligently to resolve all issues.

  11. monitoring Feb 16, 2024, 09:09 PM UTC

    We are pleased to report that we have implemented a workaround for the claims that run through the claims scoring process. We are currently processing through the backlog. We will be monitoring the progress through the weekend and will provide updates as warranted. Thank you again for your patience as we worked through this issue.

  12. monitoring Feb 20, 2024, 02:55 PM UTC

    The service disruption for Verisk Anti-Fraud ClaimSearch is now resolved. Systems were monitored through the weekend. We are scheduling a retrospective and will post the findings on this page once that process is completed. We apologize for the inconvenience.

  13. resolved Feb 23, 2024, 05:46 PM UTC

    This incident has been resolved.

  14. postmortem Mar 01, 2024, 04:10 PM UTC

    **TIMING:** February 14, 4:13 PM ET to February 16, 10:57 AM ET

    **DESCRIPTION:** ClaimSearch customers were unable to log in to ClaimSearch services.

    **IMPACT:** ClaimDirector was unavailable to customers on Thursday, Feb 15 and Friday, Feb 16. The outage spread to NICB Services and Visual Platform, and caused processing delays due to high queue depth in System-to-System interfaces (XML, FTP, Web).

    **ROOT CAUSE:** On Wednesday, February 14, ClaimDirector's scoring queues started alerting in the late afternoon. By February 15, high database load had caused the major outage. The DBA team identified that the issue was caused by insufficient statistics gathering on the involved party table, combined with the table's growth over time, which led to bad query plans and performance degradation in the database. As a result, ClaimDirector was unavailable, NICB Services and Visual Platform had issues, and processing was delayed by high queue depth in System-to-System interfaces (XML, FTP, Web).

    **CORRECTIVE ACTION:**

    · The DBAs ran vacuum on the impacted tables.
    · ClaimDirector tasks were brought down, since it was determined that these tasks were causing unusually high DB load.
    · The Engineering and DBA teams implemented tuned queries to improve database performance.
    · The Engineering teams implemented a temporary fix to disable tokenization in ClaimDirector and re-enabled the ClaimDirector tasks.
    · The DBAs increased the number of reader nodes in the Postgres database to process the backlog in the queues.

    **PREVENTATIVE MEASURES:**

    1. Increase the statistics target for the column from 100 to 1000 and re-analyze the table in production. This will proactively set the statistics target for any tables over a certain threshold.
    2. Create an SOP for query performance: start by vacuum analyzing the table(s); if that doesn't improve performance, adjust default_statistics_target to a higher value for the session and re-analyze the table(s).
    3. Look into baselining queries and alerting on performance degradation.
    4. Implement tooling that can help with diagnosing and troubleshooting Postgres-related issues.
    5. Create an SOP for vacuum and vacuum analyze.
    6. Move the large tables from OLTP to a Data Warehouse or Data Store.
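Preventative measures 1 and 2 map onto a handful of standard PostgreSQL statements. The sketch below is only an illustration of what such a runbook could look like, not Verisk's actual procedure: it builds the statements as strings so they can be reviewed before a DBA runs them. The table name `involved_party` and column name `claim_id` are hypothetical placeholders (the postmortem names neither), and the decision of whether step one "didn't improve" things is left to the operator.

```python
import re

# Hypothetical names for illustration only; the postmortem refers to
# "the involved party table" without giving the real table or column name.
TABLE = "involved_party"
COLUMN = "claim_id"

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _ident(name: str) -> str:
    """Accept only plain lowercase identifiers so the generated SQL stays safe."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def raise_column_target(table: str, column: str, target: int = 1000) -> list[str]:
    """Measure 1: raise the per-column statistics target, then re-analyze."""
    t, c = _ident(table), _ident(column)
    return [
        f"ALTER TABLE {t} ALTER COLUMN {c} SET STATISTICS {int(target)};",
        f"ANALYZE {t};",
    ]


def query_performance_sop(table: str, session_target: int = 1000) -> list[str]:
    """Measure 2: vacuum analyze first; if plans are still bad, raise the
    session-level default_statistics_target and re-analyze."""
    t = _ident(table)
    return [
        f"VACUUM (ANALYZE) {t};",
        f"SET default_statistics_target = {int(session_target)};",
        f"ANALYZE {t};",
    ]


if __name__ == "__main__":
    for stmt in raise_column_target(TABLE, COLUMN) + query_performance_sop(TABLE):
        print(stmt)
```

Note that a per-column target set with `ALTER TABLE ... SET STATISTICS` persists and takes precedence over `default_statistics_target` for that column, whereas the `SET` in the SOP only lasts for the current session.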