Modeling dwell time in a data-rich railway environment: With operations and passenger flows data

Document Type

Journal Article

Publication Date


Subject Area

mode - rail, operations - scheduling

Keywords


Dwell time, Timetables, Modeling, Passenger flows, Machine-learning methods (linear regression, random forests, gradient boosting with trees)

Abstract


We model dwell times for trains running under a possibly dense timetable, using a rich data set that contains both railway operations variables and passenger flows variables, a combination that is rare in the literature. A second distinguishing feature of our approach is that we build a single statistical model for actual dwell times at all stations and in all contexts, rather than only in constrained situations such as late arrivals, or only for some minimum dwell time. The models are fully data-driven and stem from either linear regressions with multiplicative effects or machine-learning methods like random forests, both carefully tuned on training data sets. While railway operations variables remain key for modeling dwell time, we are able to characterize the added value of passenger flows variables. Overall, these variables reduce the global modeling error by about 0.5 s on average, with average improvements of up to 5 s to 10 s in challenging situations such as late arrivals or high passenger affluence. We also study which of the available operations and passenger flows variables are the most influential, both globally and by punctuality regime. For instance, passenger flows variables, and in particular the passenger affluence at the critical door, are the most influential for trains suffering a late arrival, while the scheduled dwell time and the deviation from the scheduled arrival time are the most important for early trains.
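The comparison described in the abstract, fitting a model with and without passenger flows variables and measuring the change in error, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the variable names (`sched`, `delay`, `pax`) and effect sizes are hypothetical, and an ordinary least-squares fit on the logarithm of the dwell time stands in for "linear regression with multiplicative effects". It is not the authors' model or data.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_log_linear(rows, targets):
    """Least squares on log(target): additive in logs, so multiplicative on dwell time."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * math.log(t) for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Map the linear predictor back to seconds."""
    return math.exp(sum(c * v for c, v in zip(coefs, row)))


def mae(coefs, rows, targets):
    """Mean absolute error in seconds."""
    return sum(abs(predict(coefs, r) - t) for r, t in zip(rows, targets)) / len(rows)


def make_data(n, seed):
    """Synthetic stops; all names and effect sizes are hypothetical."""
    rng = random.Random(seed)
    ops, full, dwell = [], [], []
    for _ in range(n):
        sched = rng.choice([30.0, 40.0, 60.0])   # scheduled dwell time (s)
        delay = rng.uniform(-60.0, 180.0)        # arrival delay (s), negative = early
        pax = rng.uniform(0.0, 40.0)             # passengers at the critical door
        noise = rng.gauss(0.0, 0.05)
        y = sched * math.exp(-0.002 * delay + 0.02 * pax + noise)
        ops_row = [1.0, math.log(sched), delay]  # operations variables only
        ops.append(ops_row)
        full.append(ops_row + [pax])             # operations + passenger flows
        dwell.append(y)
    return ops, full, dwell


train_ops, train_full, train_y = make_data(2000, seed=0)
test_ops, test_full, test_y = make_data(500, seed=1)

coefs_ops = fit_log_linear(train_ops, train_y)
coefs_full = fit_log_linear(train_full, train_y)

mae_ops = mae(coefs_ops, test_ops, test_y)
mae_full = mae(coefs_full, test_full, test_y)
print(f"MAE, operations only:          {mae_ops:.2f} s")
print(f"MAE, with passenger variable:  {mae_full:.2f} s")
```

The gap between the two errors is a property of the synthetic generator and says nothing about the magnitudes reported in the article. The same with-and-without comparison could be run with a random forest or gradient boosting model in place of `fit_log_linear`.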


Permission to publish the abstract has been given by Elsevier; copyright remains with Elsevier.


Transportation Research Part C Home Page: