BCycle incident

Rolling Unresponsive Kiosks

Minor, Resolved

BCycle experienced a minor incident that started on September 8, 2023, affected IoT Hub, Service Fabric, and the other components listed below, and lasted 12d 18h. The incident has been resolved; the full update timeline is below.

Started: Sep 08, 2023, 08:50 PM UTC
Resolved: Sep 21, 2023, 03:24 PM UTC
Duration: 12d 18h
Detected by Pingoru: Sep 08, 2023, 08:50 PM UTC

Affected components

IoT Hub, Service Fabric, Stream Analytics, 3.0 Dock Monitor

Update timeline

  1. monitoring Sep 08, 2023, 08:50 PM UTC

    We've been made aware of reports of kiosks going into an unresponsive state. We're currently looking into these reports and assessing the scope and cause. We appreciate your patience while we work through this issue.

  2. investigating Sep 08, 2023, 10:02 PM UTC

    We've been made aware of reports of kiosks going into an unresponsive state. We're currently looking into these reports and assessing the scope and cause. We appreciate your patience while we work through this issue.

  3. identified Sep 09, 2023, 01:12 AM UTC

    The issue has been identified and an update is being readied for deployment. No downtime is expected for this rolling update.

  4. investigating Sep 09, 2023, 04:37 AM UTC

    Our fix did not resolve the issue as intended; we are continuing to monitor the system while working on a new solution. We appreciate your patience as we work to resolve this outage.

  5. investigating Sep 09, 2023, 03:13 PM UTC

    In the past 18 hours, many kiosks in BCycle systems have been listed as Unresponsive in the BCycle Admin maps and apps. This status is a side effect of delays in processing certain kiosk messages: specifically, the kiosk sensor readings (“pings”) are being processed slowly. Checkouts and returns are still processed at these kiosks at regular speeds. However, app-based checkouts depend on kiosks being listed as Active, not Unresponsive. As a temporary measure, we are marking kiosks as Active if they are currently full-service kiosks that are visible to the public. This measure will support the riders in your systems while we continue to address the root cause of the kiosk message processing delays.

    As an operator, you can continue to monitor kiosk connectivity in the Kiosk & Station Maintenance grid in BCycle Admin. If the Sensor Reading Timestamp lags behind the current 15-minute window but keeps advancing (i.e., it appears consistently, say, 45 minutes behind the current time), then that kiosk is affected by the delays. Kiosks that stop reporting for longer may in fact be disconnected and require a visit.

  6. investigating Sep 11, 2023, 04:34 PM UTC

    Our software team continues to work toward a permanent solution while the temporary measure of setting full-service kiosks to Active remains in place. At this time, a subset of kiosks is still affected by the degraded system performance. These kiosks are handling checkouts and returns without issue, but the kiosk sensor readings are processed over a separate communication channel, which is seeing significant delays. As a result, the Kiosk & Station Maintenance grid is showing Sensor Reading Timestamps and Battery Voltages on a delay. In the short term, operators can verify that a kiosk is connected if its Sensor Reading Timestamp is moving forward, even though it is behind the current time. Thank you for your patience as we work toward a complete fix.

  7. investigating Sep 11, 2023, 08:05 PM UTC

    As part of our ongoing investigation, we will deploy an update to Service Fabric this evening to add logging to assist our analysis. This deployment will not cause any downtime. Thank you for your continued patience with this outage.

  8. monitoring Sep 14, 2023, 03:37 PM UTC

    The BCycle team has completed backend system maintenance this week and implemented changes last night to process kiosk heartbeat messages faster. With these changes, kiosk heartbeat messages are being processed at regular speeds and we are no longer seeing kiosk sensor readings arrive on a delay. At 11am, we are reactivating the function that automatically sets kiosks to Active or Unresponsive status based on the kiosk sensor readings. Operations teams will again see kiosk connection status update automatically in the maps and apps. Thank you for your collaboration and patience as we worked to diagnose and address this issue.

  9. resolved Sep 21, 2023, 03:24 PM UTC

    This incident has been resolved.
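
The operator guidance in updates 5 and 6 and the status logic mentioned in updates 5 and 8 can be sketched roughly as follows. This is an illustrative sketch only, not BCycle's implementation: the function names, the "unknown" result, and the choice of 15 minutes as the Unresponsive threshold are all assumptions made for the example. Only the 15-minute window, the "lagging but advancing" rule, and the temporary Active override for full-service, publicly visible kiosks come from the updates themselves.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# The "current 15-minute window" mentioned in update 5.
PING_WINDOW = timedelta(minutes=15)
# Assumption: the automatic status function uses the same window as its cutoff.
UNRESPONSIVE_AFTER = PING_WINDOW


def classify_ping_lag(observations: List[Tuple[datetime, datetime]]) -> str:
    """Classify a kiosk from (time_checked, sensor_reading_timestamp) pairs.

    Pairs are ordered oldest first, as an operator would note them down while
    refreshing the maintenance grid. Hypothetical helper for illustration.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    checked_at, last_reading = observations[-1]
    if checked_at - last_reading <= PING_WINDOW:
        return "current"
    if len(observations) < 2:
        # One lagging reading cannot distinguish a delay from a disconnect.
        return "unknown"
    previous_reading = observations[-2][1]
    # Lagging but still moving forward: affected by the processing delay.
    if last_reading > previous_reading:
        return "delayed"
    # Lagging and not advancing: may need a site visit.
    return "possibly disconnected"


def kiosk_status(now: datetime, last_reading: datetime, full_service: bool,
                 public: bool, override_enabled: bool) -> str:
    """Return "Active" or "Unresponsive" for display in maps and apps.

    With override_enabled, full-service public kiosks are forced to Active,
    mirroring the temporary measure; otherwise status follows reading age.
    """
    if override_enabled and full_service and public:
        return "Active"
    if now - last_reading <= UNRESPONSIVE_AFTER:
        return "Active"
    return "Unresponsive"
```

For example, a kiosk whose timestamp reads 45 minutes behind on two consecutive checks 15 minutes apart, but has moved forward between them, would be classified as "delayed" rather than "possibly disconnected". Turning `override_enabled` off corresponds to the reactivation of automatic status described in update 8.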