Omnivore incident

CloudPOS Scheduler Queue Length

Major, Resolved

Omnivore experienced a major incident on April 28, 2023, affecting API, Brink, Webhooks, and other components, and lasting 1h 49m. The incident has been resolved; the full update timeline is below.

Started: Apr 28, 2023, 06:20 PM UTC
Resolved: Apr 28, 2023, 08:09 PM UTC
Duration: 1h 49m
Detected by Pingoru: Apr 28, 2023, 06:20 PM UTC

Affected components

API, Brink, Webhooks, Aloha Cloud Connect, Toast, Lavu, Lightspeed

Update timeline

  1. identified Apr 28, 2023, 06:20 PM UTC

    Around 18:00 UTC, we noticed that our CloudPOS Scheduler queue had an elevated number of tasks waiting to be run. This would likely cause all CloudPOS data to be stale, including Tickets and Clock Entries. It would also lead to delayed webhooks. We are currently scaling up our Scheduler Workers to process the delayed tasks.

  2. monitoring Apr 28, 2023, 07:02 PM UTC

    After scaling up our Scheduler Workers, the queue size has shrunk by ~75%. We will continue to monitor until the queue size is back to baseline.

  3. monitoring Apr 28, 2023, 07:56 PM UTC

    As of 19:41 UTC, the Scheduler Queue has returned to baseline. We have confirmed that POS data has been refreshed for all affected POS types (Brink, Toast, Cloud Connect, Lavu, and Lightspeed), and current-day Tickets are visible again. Webhooks have resumed as well. With the acute phase of the incident over, we will check for any other impacts before closing the incident.

  4. resolved Apr 28, 2023, 08:09 PM UTC

    After further investigation, we see no other impacts to address. All systems appear to be fully operational.
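
The remediation described in the timeline (scaling up workers while the queue is above baseline) can be illustrated with a minimal sketch. This is not Omnivore's implementation: the function name `desired_workers`, its parameters, and the proportional scaling rule are all assumptions made for illustration only.

```python
import math


def desired_workers(queue_len: int, baseline_len: int,
                    current_workers: int, max_workers: int) -> int:
    """Hypothetical rule: scale workers in proportion to the queue backlog.

    All names and the scaling policy are illustrative assumptions,
    not taken from the incident report.
    """
    if baseline_len <= 0 or current_workers <= 0:
        raise ValueError("baseline_len and current_workers must be positive")
    # At or below baseline: no scale-up needed.
    if queue_len <= baseline_len:
        return current_workers
    # Above baseline: grow the pool by the backlog ratio, capped at max_workers.
    ratio = queue_len / baseline_len
    return min(max_workers, math.ceil(current_workers * ratio))
```

For example, under these assumptions a queue at four times its baseline with 4 workers would request 16 workers, and an extreme backlog would be capped at `max_workers`.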