Estimating metro passengers’ path choices by combining self-reported revealed preference and smart card data

Document Type

Journal Article

Publication Date


Subject Area

place - asia, mode - subway/metro, technology - passenger information, technology - ticketing systems, ridership - behaviour, planning - surveys, ridership - modelling


Data fusion, Expectation-Maximization algorithm, Metro network, Stochastic travel time budget, Risk-averse attitude


Automated fare collection systems in metro networks have accumulated large volumes of smart card (SC) data (i.e., Big Data). However, SC data records neither passengers’ transfer information nor the factors that affect their travel behavior (e.g., socio-demographics), which limits its further application. In contrast, self-reported Revealed Preference (RP) data collected via questionnaire surveys can include those factors, but its sample size is usually very small compared with SC data. This study proposes a new set of approaches for estimating metro passengers’ path choices by combining self-reported RP and SC data. The approaches have three main features. First, and most important, the two data sets are estimated jointly using a nested model structure with a balance parameter that accommodates their different scales. Second, the path choice model incorporates a stochastic travel time budget and a latent individual risk-averse attitude toward travel time variations; the former is derived from the latter, and the latter is represented by a latent variable model with observed individual socio-demographics. Third, an algorithm for combining the two types of data is developed by integrating an Expectation-Maximization algorithm with a nested logit model estimation method. The proposed approaches are examined using data from Guangzhou Metro, China. The results show that the combined data outperforms either single data source in both estimation and forecasting performance.
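As a rough illustration of two ideas named in the abstract, the sketch below computes path choice probabilities from a travel time budget and combines two log-likelihoods with a balance parameter. The abstract does not give the paper's formulas, so everything here is an assumption: the budget is taken in the common mean-plus-margin form (mean travel time plus a risk-aversion weight times the standard deviation), the choice model is a plain multinomial logit rather than the nested structure the authors use, and the joint objective is a simple weighted sum. The function names, the parameter `theta`, and the example travel times are hypothetical.

```python
import math

def travel_time_budget(mean, std, risk_aversion):
    """Assumed form: mean travel time plus a risk-weighted margin for variability."""
    return mean + risk_aversion * std

def path_probabilities(paths, risk_aversion, theta=1.0):
    """Multinomial logit over paths, with utility equal to -theta * budget."""
    utilities = [-theta * travel_time_budget(m, s, risk_aversion) for m, s in paths]
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def joint_log_likelihood(ll_rp, ll_sc, balance):
    """Assumed weighted sum: the balance parameter scales the SC contribution."""
    return ll_rp + balance * ll_sc

# Hypothetical (mean, std) travel times in minutes for two paths of one OD pair.
paths = [(30.0, 2.0), (28.0, 6.0)]
neutral = path_probabilities(paths, risk_aversion=0.0)
averse = path_probabilities(paths, risk_aversion=1.0)
```

In this toy setting, a risk-neutral passenger favors the second path because its mean is lower, while a risk-averse passenger favors the first because its budget (32 minutes) is smaller than that of the more variable path (34 minutes).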


Permission to publish the abstract has been given by Elsevier, copyright remains with them.


Transportation Research Part C Home Page: