A classification of public transit users with smart card data based on time series distance metrics and a hierarchical clustering method

Document Type

Journal Article

Publication Date


Subject Area

place - north america, ridership - behaviour, ridership - demand, technology - passenger information, technology - ticketing systems


Keywords

Public transportation data, smart card users’ behavior, time-series classification, cross-correlation, dynamic time warping


Abstract

Classifying the behavior of smart card users is important in the field of public transit demand analysis. It provides an understanding of people’s sequences of activities within a period of time. However, classical metrics such as Euclidean distance are not appropriate for time-series classification. To address this problem, this article presents a method for classifying public transit smart card users’ daily transactions, which are represented as time series. The approach uses cross-correlation distance (CCD), hierarchical clustering, and subgroups by metric parameter to understand users’ temporal patterns. The clustering results are compared with those obtained using dynamic time warping (DTW) distance, a common measure of time-series distance. After a brief pedagogical example explaining the DTW and CCD concepts, a program is developed in R to validate the method on a real dataset of smart card transactions. The dataset covers use of the public transit system in the city of Gatineau in September 2013. The results demonstrate that CCD performs better than DTW in classifying the time series, and that the classification method identifies different daily behaviors among public transit users. The results will help transit authorities offer better services to diverse groups of smart card users.
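
To make the two distances named in the abstract concrete, the following is a minimal Python sketch, not the authors' R program. It assumes each user's day is a 24-slot vector of hourly transaction counts, and it assumes one common form of cross-correlation distance: 1 minus the maximum normalized cross-correlation over lags up to `max_lag`. The article's exact CCD definition, its lag parameter, and its linkage rule may differ. The helper names (`profile`, `hierarchical_clusters`) and the four toy users are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping with an absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cross_correlation_distance(a, b, max_lag=1):
    """Assumed CCD: 1 - max normalized cross-correlation for |lag| <= max_lag."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        return 1.0  # an all-zero series is treated as maximally distant
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(a[t] * b[t + lag] for t in range(len(a)) if 0 <= t + lag < len(b))
        best = max(best, s / norm)
    return 1.0 - best


def hierarchical_clusters(series, dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    n = len(series)
    d = {(i, j): dist(series[i], series[j]) for i in range(n) for j in range(i + 1, n)}

    def linkage(c1, c2):
        total = sum(d[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        pairs = ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters)))
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


def profile(hours):
    """Hypothetical daily profile: 1.0 in each hour with a transaction, else 0.0."""
    return [1.0 if h in hours else 0.0 for h in range(24)]


# Toy data (invented for illustration, not taken from the Gatineau dataset).
users = [
    profile({7, 17}),   # commuter
    profile({8, 17}),   # commuter, morning trip one hour later
    profile({12}),      # single midday trip
    profile({13}),      # single midday trip, one hour later
]

clusters = hierarchical_clusters(users, cross_correlation_distance, k=2)
print(clusters)
```

On this toy input the two commuters end up in one cluster and the two midday travellers in the other. Passing `dtw_distance` instead of `cross_correlation_distance` as `dist` gives a way to compare the two groupings side by side, in the spirit of the comparison the abstract describes.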


Permission to publish the abstract has been given by Taylor & Francis; copyright remains with them.