Derivation of train arrival timings through correlations from individual passenger farecard data

Document Type

Journal Article

Publication Date


Subject Area

mode - subway/metro, place - asia, place - urban, technology - passenger information, technology - ticketing systems, planning - methods, ridership - commuting, operations - crowding


Rapid transit system, Metro, Train logs, Smart card data, Overcrowding


In this paper, we propose a method for estimating the timings at which trains arrive at and depart from stations using passenger farecard data and knowledge of the network topology. This problem is essential for understanding commuter movement patterns across metro systems at fine granularity in settings where train logs (records of train arrival and departure timings) are unavailable or unreliable. Our technique requires as input the timings at which passengers arrive at and depart from stations, which are easily retrievable from farecard data, and provides as output an estimate of the number of trains running as well as the timings at which each train arrives at and departs from each station. Our method relies on two key observations: (1) passengers tend to exit metro stations as soon as they alight, and (2) we can reliably conclude that groups of passengers who board at the same stop but alight at different stops were on the same train if their boarding timings have similar distributions. In contrast with prior works, our methodology is stand-alone in that it does not rely on external sources of information such as train schedules, and it requires minimal parameter tuning. In addition, because our method infers, as a by-product, which train each passenger boards, our techniques can be employed as a pre-processing step for downstream tasks such as inferring passenger route choices. We apply our method to recover train logs using synthetically generated data as well as actual ticketing data of passengers in the Singapore metro network. Experiments on synthetic data show that our method reliably recovers train logs even with moderate levels of overcrowding on train platforms.
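
To make observation (1) concrete, the following is a minimal illustrative sketch and not the method of the paper. It assumes that exit (tap-out) times at a single station are available as seconds since midnight, and that passengers from one train exit in a burst. The function name `estimate_arrivals` and the gap threshold `max_gap` are hypothetical, introduced here only for illustration; the paper itself reports needing minimal parameter tuning, so a fixed threshold like this one is a simplification.

```python
from typing import List


def estimate_arrivals(tap_out_times: List[float], max_gap: float = 30.0) -> List[float]:
    """Estimate train arrival times at one station from passenger tap-out times.

    Assumption (hypothetical, for illustration): passengers leave the station
    shortly after alighting, so tap-outs from one train form a burst. A new
    burst starts whenever the gap to the previous tap-out exceeds `max_gap`
    seconds. The earliest tap-out in each burst is used as a proxy for the
    train's arrival time.
    """
    arrivals: List[float] = []
    previous = None
    for t in sorted(tap_out_times):
        # Start a new burst at the first record or after a long quiet period.
        if previous is None or t - previous > max_gap:
            arrivals.append(t)
        previous = t
    return arrivals


if __name__ == "__main__":
    # Made-up tap-out times (seconds) forming three bursts.
    tap_outs = [100, 104, 109, 115, 400, 403, 411, 705, 709]
    print(estimate_arrivals(tap_outs))
```

With the made-up input above, the sketch reports one estimated arrival per burst. It does not handle overcrowded platforms or overlapping bursts, which are cases the abstract says the actual method is designed to tolerate.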


Permission to publish the abstract has been given by SpringerLink; copyright remains with them.