Big data in public transportation: a review of sources and methods

Document Type

Journal Article

Publication Date


Subject Area

literature review - literature review, ridership - behaviour, planning - methods, planning - service quality, technology - ticketing systems, technology - passenger information


Big data, public transportation, transport analysis, transit planning, planning methods, statistics


The collection of big data, as an alternative to traditional resource-intensive manual data collection approaches, has become significantly more feasible over the past decade. The availability of such data, coupled with more sophisticated predictive statistical techniques, has contributed to an increase in attention towards the application of these data, particularly for transportation analysis. Within the transportation literature, there is a growing emphasis on developing sources of commonly collected public transportation data into more powerful analytical tools. A commonly held belief is that application of big data to transportation problems will yield new insights previously unattainable through traditional transportation data sets. However, there exist many ambiguities related to what constitutes big data, the ethical implications of big data collection and application, and how to best utilize the emerging data sets. The existing literature exploring big data provides no clear and consistent definition. While the collection of big data has grown and its application in both research and practice continues to expand, there is a significant disparity between methods of analysis applied to such data. This paper summarizes the recent literature on sources of big data and commonly applied methods used in its application to public transportation problems. We assess predominant big data sources, most frequently studied topics, and methodologies employed. The literature suggests smart card and automated data are the two big data sources most frequently used by researchers to conduct public transit analyses. The studies reviewed indicate that big data has largely been used to understand transit users’ travel behavior and to assess public transit service quality. The techniques reported in the literature largely mirror those used with smaller data sets. The application of more advanced statistical methods, commonly associated with big data, has been limited to a small number of studies. In order to fully capture the value of big data, new approaches to analysis will be necessary.


Permission to publish the abstract has been given by Taylor&Francis, copyright remains with them.