Recovering the Association Between Unlinked Fare Machines and Stations Using Automated Fare Collection Data in Metro Systems

Document Type

Journal Article

Publication Date


Subject Area

mode - subway/metro, technology - ticketing systems


data and data science, artificial intelligence and advanced computing applications, machine learning (artificial intelligence), neural networks, supervised learning


Data quality is the foundation of data-driven applications in transportation. Data problems such as missing and invalid data can sharply reduce the performance of the methods used in these applications. Although many studies address data quality issues, they focus only on missing or invalid data caused by infrastructure failures (e.g., loop detector malfunction); data quality issues arising from insufficient data management have received little attention. This paper proposes a tensor decomposition-based framework to tackle a specific missing data problem that occurs when the machine-station dictionary of an automated fare collection system database is incomplete. In such cases, the affected fare machines are not linked to any station, so all transactions associated with them may lack origin/destination information. The proposed framework recovers the dictionary by capturing features of the passenger flow passing through each unlinked fare machine. Evaluation results show that the proposed approach can recover the missing data with high accuracy even when several fare machines are not linked to a station. The framework could also support other beneficial applications.
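
The abstract does not spell out the decomposition itself, so the following is only an illustrative sketch of the general idea of recovering a machine-station link from passenger-flow features. It swaps in a truncated SVD of a machine-by-hour count matrix as a simplified stand-in for the paper's tensor decomposition. The function name `recover_station`, the `rank` parameter, the hourly binning, and the synthetic data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def recover_station(linked_flows, linked_stations, unlinked_flows, rank=3):
    """Hypothetical helper: assign each unlinked machine to the station whose
    low-rank flow features are most similar (cosine similarity).

    NOTE: truncated SVD on a 2-D matrix is an assumption used here in place
    of the tensor decomposition named in the abstract.
    """
    linked_flows = np.asarray(linked_flows, dtype=float)
    unlinked_flows = np.asarray(unlinked_flows, dtype=float)
    linked_stations = np.asarray(linked_stations)

    def normalise(x):
        # Turn counts into shares per machine so volume does not dominate.
        totals = x.sum(axis=1, keepdims=True)
        return x / np.where(totals == 0, 1.0, totals)

    linked_n = normalise(linked_flows)
    unlinked_n = normalise(unlinked_flows)

    # Centre on the mean profile of the linked machines.
    mean_profile = linked_n.mean(axis=0, keepdims=True)
    stacked = np.vstack([linked_n, unlinked_n]) - mean_profile

    # Truncated SVD: rows of U * S are low-rank features, one row per machine.
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    k = min(rank, len(s))
    feats = u[:, :k] * s[:k]
    n_linked = linked_n.shape[0]
    linked_feats, unlinked_feats = feats[:n_linked], feats[n_linked:]

    # One centroid per known station, built from its linked machines.
    stations = np.unique(linked_stations)
    centroids = np.stack(
        [linked_feats[linked_stations == st].mean(axis=0) for st in stations]
    )

    def unit(x):
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.where(norms == 0, 1.0, norms)

    similarity = unit(unlinked_feats) @ unit(centroids).T
    return stations[similarity.argmax(axis=1)]


# Synthetic demonstration (all values are made up for illustration).
rng = np.random.default_rng(0)
hours = np.arange(24)


def peaked(peak_hour):
    # Hourly profile with a baseline plus a single peak.
    return 0.2 + np.exp(-((hours - peak_hour) ** 2) / (2 * 2.0**2))


profiles = {"A": peaked(8), "B": peaked(18), "C": np.ones(24)}
profiles = {name: p / p.sum() for name, p in profiles.items()}

linked_flows, linked_stations = [], []
for name, profile in profiles.items():
    for _ in range(4):  # four linked machines per station
        linked_flows.append(rng.poisson(profile * 2000))
        linked_stations.append(name)

true_stations = ["A", "B", "C"]  # ground truth hidden from the method
unlinked_flows = [rng.poisson(profiles[name] * 1500) for name in true_stations]

recovered = recover_station(linked_flows, linked_stations, unlinked_flows)
print(recovered.tolist(), true_stations)
```

In this toy setting each station has a clearly distinct daily profile, so a nearest-centroid match in the low-rank feature space is enough. Real fare data would likely need richer features (for example, separate entry and exit flows or day-of-week structure), which is where a higher-order tensor representation becomes natural.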


Permission to publish the abstract has been given by SAGE; copyright remains with them.