Investigating the Use of Machine Learning Methods in Direct Ridership Models for Bus Transit

Document Type

Journal Article

Publication Date


Subject Area

mode - bus, place - north america, place - urban, ridership - modelling


data and data science, machine learning (artificial intelligence), planning and analysis, demand estimation, ridership estimation modeling, decision tools


This paper develops and tests 13 direct ridership models (DRMs) for transit sketch planning in the Dallas–Fort Worth region. We explore both machine learning modeling approaches (e.g., ridge regression and random forest) and traditional statistical models (e.g., linear regression and multiplicative regression). This effort provides a detailed description of the modeling workflows and of the preprocessing of input data, including General Transit Feed Specification (GTFS), employment, socio-demographic, and ridership data. We also describe metrics to compare model performance; in our experiments, the ridge regression framework using a Yeo-Johnson power transformation led to the most accurate predictions, with an R² of 0.88. The sensitivity of the DRM to errors in the service-related predictor variables is within acceptable limits, with the root mean squared error (RMSE) increasing by less than 20% for a 25% error in any one of the input predictors. Our findings suggest that DRMs can be a powerful complement to the four-step planning process, providing an alternative that is easier to maintain and run, and which may lead to more accurate ridership estimates given the limitations of transit modeling in traditional regional models. To illustrate the benefits of DRMs, this effort describes the deployment of trained models using a web-based framework that allows practitioners to obtain ridership estimates by drawing prospective routes on a map and providing a small number of service attributes as input.
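As a rough illustration of the ideas named in the abstract (a Yeo-Johnson power transformation, ridge regression, and an RMSE-based sensitivity check for a 25% input error), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the data are synthetic, there is a single hypothetical predictor (daily trips), the transformation parameter `LAMBDA` and the penalty `ALPHA` are fixed by hand (in practice the transformation parameter is usually estimated from the data), and the helper names `yeo_johnson`, `fit_ridge_1d`, and `rmse` are invented here.

```python
import math
import random


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of one value for a fixed lambda."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -(((-x + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one predictor with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    return slope, y_mean - slope * x_mean


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic, hypothetical data: daily trips per route and observed boardings.
random.seed(0)
LAMBDA = 0.5  # assumed fixed transform parameter (illustration only)
ALPHA = 1.0   # assumed ridge penalty (illustration only)
trips = [random.uniform(10, 120) for _ in range(200)]
boardings = [40.0 * yeo_johnson(t, LAMBDA) + random.gauss(0, 25) for t in trips]

# Fit ridge regression on the transformed predictor.
slope, intercept = fit_ridge_1d([yeo_johnson(t, LAMBDA) for t in trips], boardings, ALPHA)


def predict(trip_values):
    """Predict boardings from raw trip counts using the fitted model."""
    return [intercept + slope * yeo_johnson(t, LAMBDA) for t in trip_values]


# Sensitivity check: inflate the predictor by 25% and compare RMSE.
baseline_rmse = rmse(boardings, predict(trips))
perturbed_rmse = rmse(boardings, predict([1.25 * t for t in trips]))
rmse_increase_pct = 100.0 * (perturbed_rmse - baseline_rmse) / baseline_rmse
print(f"baseline RMSE: {baseline_rmse:.1f}")
print(f"RMSE with 25% input error: {perturbed_rmse:.1f} ({rmse_increase_pct:+.0f}%)")
```

The numbers this toy example prints say nothing about the paper's results; with real data, the same comparison would be repeated for each service-related predictor in turn.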


Permission to publish the abstract has been given by SAGE; copyright remains with them.