ShipHawk incident

Investigating slow proposed shipment generation.


ShipHawk reported a notice-level incident on August 8, 2022 that affected Shipping APIs and lasted 23h. The incident has been resolved; the full update timeline is below.

Started
Aug 08, 2022, 05:10 PM UTC
Resolved
Aug 09, 2022, 04:11 PM UTC
Duration
23h
Detected by Pingoru
Aug 08, 2022, 05:10 PM UTC

Affected components

Shipping APIs

Update timeline

  1. investigating Aug 08, 2022, 05:10 PM UTC

    Our monitoring system has identified some slowness when generating proposed shipments. Some customers may see a minor delay in the time it takes for a proposed shipment to generate when an order syncs to ShipHawk from their ERP. We are actively investigating this issue.

  2. identified Aug 08, 2022, 07:06 PM UTC

    The issue has been identified and we are working to resolve it. We estimate this issue will be solved within the next hour. Customer impact: Some customers have reported a short delay when syncing orders from their ERP.

  3. monitoring Aug 08, 2022, 07:48 PM UTC

    A fix is in place and being rolled out. Processing times will improve over the next 10-15 minutes. Customer impact: Some customers have reported a delay when syncing orders from their ERP.

  4. resolved Aug 09, 2022, 04:11 PM UTC

    This issue was resolved at 12:51 PM Pacific Time. Customer impact: Some customers have reported a delay when syncing orders from their ERP. Start time: 9:28 AM Pacific Time. End time: 12:51 PM Pacific Time.

  5. postmortem Aug 15, 2022, 09:58 PM UTC

    ## Incident summary

    Some ShipHawk NetSuite users experienced slow syncing of item fulfillments between NetSuite and ShipHawk. The monitoring system detected the slowness at 9:28 AM Pacific Time on Monday, 8/8, and it continued until 12:51 PM Pacific Time.

    ## Impact

    Because of internal configuration changes, proposed shipments for large orders with incomplete product information were generated incorrectly, which produced a very large number of packages. Processing those proposed shipments consumed too much memory on the background workers handling that queue. That, in turn, made the workers unstable and delayed all other item fulfillments processed in that queue. As a result, NetSuite Item Fulfillments synchronized to ShipHawk with delays of 3 to 52 minutes.

    ## Detection and Recovery

    The ShipHawk monitoring system detected the incident when the synchronization delay reached 3 minutes. The initial response was to scale up processing power. Adding resources did not help, because the new background job processors quickly became stuck for the same reason. The delay continued to grow and peaked at 52 minutes. At 12:30 PM we fixed the data of the products that were causing the issue and removed the incorrectly generated proposed shipments. That unblocked the system, and all the jobs waiting in the queue were processed within 21 minutes. The system returned to its normal state at 12:51 PM Pacific Time.

    ## Corrective actions

    To prevent this type of issue in the future, we plan to do the following:

    1. Develop a time-limiting system for background job processors, so a few slow jobs don’t block the entire queue.
    2. Improve the UX to eliminate the ability to create product configurations that could cause unexpected behavior.
    3. Add hard limits to specific actions of the system, to reduce the risk of resource-abusive processes.
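
The postmortem's first and third corrective actions (a per-job time limit and hard limits on specific actions) could look roughly like the sketch below. It is illustrative only and is not ShipHawk's implementation: the names `JobLimits`, `generate_packages`, and `run_queue`, the limit values, the `(sku, quantity)` order shape, and the one-package-per-unit rule are all assumptions made for the example. The time check is cooperative, so it only triggers between units of work; a job stuck inside a single call would need a process-level timeout instead.

```python
import time
from dataclasses import dataclass


class JobLimitExceeded(Exception):
    """Raised when a job exceeds its time budget or its package cap."""


@dataclass
class JobLimits:
    max_seconds: float = 30.0  # hypothetical per-job time budget
    max_packages: int = 500    # hypothetical hard cap on packages per job


def generate_packages(order_lines, limits, clock=time.monotonic):
    """Build packages for one order, aborting as soon as a limit is exceeded.

    order_lines is an iterable of (sku, quantity) pairs (assumed shape).
    One package per unit is a stand-in for the real packing logic.
    """
    deadline = clock() + limits.max_seconds
    packages = []
    for sku, quantity in order_lines:
        for _ in range(quantity):
            # Hard limit: refuse to build an unbounded number of packages.
            if len(packages) >= limits.max_packages:
                raise JobLimitExceeded(f"more than {limits.max_packages} packages")
            # Time limit: checked between units of work (cooperative).
            if clock() > deadline:
                raise JobLimitExceeded(f"exceeded {limits.max_seconds}s time budget")
            packages.append({"sku": sku})
    return packages


def run_queue(jobs, limits):
    """Process every job; park offenders so they cannot block the queue."""
    done, parked = [], []
    for job_id, order_lines in jobs:
        try:
            done.append((job_id, generate_packages(order_lines, limits)))
        except JobLimitExceeded as exc:
            parked.append((job_id, str(exc)))
    return done, parked
```

With limits like these, one oversized order ends up in `parked` with a reason attached, and the jobs queued behind it still complete.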