OpenTrainTimes incident

Maps not updating

Major · Resolved

OpenTrainTimes experienced a major incident on November 5, 2023 affecting Train movement data and Map data, lasting 12h 3m. The incident has been resolved; the full update timeline is below.

Started
Nov 05, 2023, 12:10 AM UTC
Resolved
Nov 05, 2023, 12:13 PM UTC
Duration
12h 3m
Detected by Pingoru
Nov 05, 2023, 12:10 AM UTC

Affected components

Train movement data, Map data

Update timeline

  1. identified Nov 05, 2023, 11:39 AM UTC

    Due to a problem with one of our internal servers overnight, maps and real-time information are not currently being updated. Engineers have identified the problem and are working on a fix.

  2. monitoring Nov 05, 2023, 11:45 AM UTC

    We have implemented a fix and are monitoring the availability of the site. Maps may show out-of-date information until another train passes through each signal berth.

  3. resolved Nov 05, 2023, 12:13 PM UTC

    This incident has been resolved.

  4. postmortem Nov 05, 2023, 12:14 PM UTC

    Last night, one of the servers that handles our incoming messages from Network Rail and other suppliers failed due to a lack of disk space. This brought down the services that distribute messages to other internal servers, meaning maps and train running information on the website were out of date. We resolved the problem by freeing up disk space on the server and restarting the necessary services.

    The reason this happened was straightforward. Each month, we archive the logs from our messaging servers to offline storage. We do this on both our live and backup servers, and clear the old data from the servers once the copy has been verified as complete. Last month, this process failed in a subtle way: messages on one of the servers were archived but not deleted, so the disk space they used was never freed. Last night, the disk on that server filled up. This archiving process has only failed once before, and the underlying cause was fixed at the time. This time it was a process issue, not a technical issue, and we will be taking steps to ensure it doesn’t happen again.
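
The archive, verify, then delete pattern described in the postmortem can be sketched roughly as follows. This is an illustration only, not OpenTrainTimes' actual tooling: the function names, the flat directory layout, the SHA-256 comparison, and the 90% disk threshold are all assumptions made for the example. The point it tries to show is that the job reports any originals still on disk afterwards, so an "archived but not deleted" outcome is visible rather than silent.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_and_clear(log_dir: Path, archive_dir: Path) -> list[Path]:
    """Copy each log to archive_dir, verify the copy, then delete the original.

    Returns the original files still present in log_dir afterwards, so the
    caller can alert if anything was archived but not removed.
    (Hypothetical helper; names and layout are assumptions.)
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(log_dir.iterdir()):
        if not source.is_file():
            continue
        copy = archive_dir / source.name
        shutil.copy2(source, copy)
        # Delete only once the copy is verified to match the original.
        if sha256_of(copy) == sha256_of(source):
            source.unlink()
    return [p for p in sorted(log_dir.iterdir()) if p.is_file()]


def disk_is_nearly_full(path: Path, threshold: float = 0.9) -> bool:
    """True when the filesystem holding `path` is more than `threshold` full.

    The 0.9 default is an arbitrary example value.
    """
    usage = shutil.disk_usage(path)
    return usage.used / usage.total > threshold
```

A monthly job built this way could treat a non-empty return value from `archive_and_clear` as a failure, and a separate periodic call to `disk_is_nearly_full` could give warning well before a disk fills completely.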