Network–wide prediction of public transportation ridership using spatio–temporal link–level information

Document Type

Journal Article

Publication Date


Subject Area

place - europe, place - urban, mode - bus, mode - tram/light rail, technology - passenger information, ridership - behaviour, ridership - modelling


Keywords

Public transportation, Ridership, Prediction, Inference, Machine learning


Abstract

Public transportation is a key element of vibrant city life. Understanding the dynamics and driving forces of public transportation ridership is rewarding but highly complex. In this research, we focus on a spatial viewpoint that has received little attention: the link level, which represents the trip of a vehicle between two directly connected stations. We also emphasize the impact of exogenous events. To assess their spatio-temporal influence, a temporal resolution of 30 min complements the spatial link level. Ridership data for trams and buses is provided by Stadtwerke München (SWM), the operator of the public transportation network in Munich, Germany, which includes 82 bus and 17 tram lines. About 30% of trams and 50% of buses are equipped with automatic passenger counting sensors, which capture boardings and alightings at each individual station. SWM deploys the equipped vehicles strategically to obtain a meaningful view of the whole system. The raw sensor data is cleaned and sanitized. The data used spans a 4-year period (2014–2017). After pre-processing, ∼59.79% of the data is retained, which equates to ∼97 million observations. There are 693 tram links and 2944 bus links, for 3637 links in total. We divide the analysis into ridership prediction and inference. For prediction, we specify one functional form and fit a separate model for each link, using 5-fold cross-validation to guard against overfitting. The models are decision trees combined with bagging and boosting. We then perform inference, i.e., we attempt to understand the relationships between the variables that emerged in the predictive models. Ridership is assessed for each link separately and visualized jointly to construct network views and maps. Conclusions are drawn, and recommendations for future research are formulated.
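To illustrate the link level at a 30 min resolution, the following minimal sketch turns per-station boarding and alighting counts into per-link passenger loads and sums them into half-hour bins. It is an illustration only, not the authors' pipeline: the record layout, the function names (link_loads, half_hour_bin, aggregate_by_link), the use of passenger load as the ridership measure, and the example figures are all assumptions made here.

```python
from collections import defaultdict
from datetime import datetime


def link_loads(stop_events):
    """Return (from_station, to_station, departure_time, load) for one vehicle trip.

    stop_events is an ordered list of (station, departure_time, boardings,
    alightings) tuples. This layout is hypothetical, not the SWM data format.
    """
    loads = []
    on_board = 0
    for (station, ts, board, alight), nxt in zip(stop_events, stop_events[1:]):
        # Passengers on the link leaving `station`: running boardings minus alightings.
        on_board += board - alight
        loads.append((station, nxt[0], ts, on_board))
    return loads


def half_hour_bin(ts):
    """Floor a timestamp to the start of its 30-minute interval."""
    return ts.replace(minute=30 if ts.minute >= 30 else 0, second=0, microsecond=0)


def aggregate_by_link(trips):
    """Sum observed loads per (from_station, to_station, 30-minute bin)."""
    totals = defaultdict(int)
    for trip in trips:
        for a, b, ts, load in link_loads(trip):
            totals[(a, b, half_hour_bin(ts))] += load
    return dict(totals)


# Two made-up trips over the stations A -> B -> C (illustrative numbers only).
example_trips = [
    [("A", datetime(2017, 1, 2, 8, 10), 12, 0),
     ("B", datetime(2017, 1, 2, 8, 25), 5, 3),
     ("C", datetime(2017, 1, 2, 8, 40), 0, 14)],
    [("A", datetime(2017, 1, 2, 8, 35), 7, 0),
     ("B", datetime(2017, 1, 2, 8, 50), 2, 4),
     ("C", datetime(2017, 1, 2, 9, 5), 0, 5)],
]
totals = aggregate_by_link(example_trips)
```

Because only part of the fleet carries counting sensors, sums like these cover the observed vehicles only and would need scaling or modelling before being read as total link ridership.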


Permission to publish the abstract has been given by Elsevier, copyright remains with them.


Journal of Transport Geography home page: