Entry-Only Automated Fare-Collection System Data Used to Infer Ridership, Rider Destinations, Unlinked Trips, and Passenger Miles

Document Type

Journal Article

Publication Date


Subject Area

operations - scheduling, land use - planning, mode - mass transit

Keywords

Unlinked passenger trips, Transit entry-only data, Transit, Statistical sampling, Shortest path algorithms, Service planning, Scheduling, Sampling (Statistics), Ridership, Public transit, Patronage (Transit ridership), Passenger miles, Origin and destination, O&D, New York City Transit Authority, New York City Transit, Mass transit, Local transit, Consumption rates, Automation, Automatic fare collection, Automated control systems, Algorithms

Abstract

All U.S. transit agencies receiving FTA Urbanized Area Formula Program funding under Section 5307 (Section 15) report service consumption statistics (revenue passenger miles and unlinked trips) to the National Transit Database. Passenger miles is an incentive-based funding element that generates millions of dollars annually for New York City Transit (NYCT). Originally, Section 15 random sample data were collected by surveyors who gathered passenger destination information; distances were then calculated manually, based on judgment of likely travel paths. This method was costly, inefficient, inconsistent, and not always reproducible despite rigorous auditing and certification. NYCT modernized this process by retrieving passenger-origination information directly from the automated fare-collection (AFC) system, inferring destinations from a second swipe, and automating the passenger-mile calculation with schedule-driven shortest-path algorithms. While adopting state-of-the-art data collection and computation methods, NYCT retained the FTA-approved sampling methodology to maintain comparability of data. Automated data reporting is most likely to succeed when algorithms are first developed on small data sets and then put through clearly documented parallel testing, with full involvement of data consumers, including relevant regulatory authorities. Software development is iterative, and computation time should be monitored to ensure scalability. Building on this work, NYCT is developing AFC-based methodologies to infer bus passenger origins and destinations and train loads, and it is adapting signal-system data for routine monitoring of operating performance. Automating reporting to the point at which live data can be used for scheduling and service planning without modeling or special analyses will allow service to be monitored much more frequently and extensively.
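The two computational steps named in the abstract, inferring a destination from a later swipe and computing passenger miles over a shortest path, can be illustrated with a minimal Python sketch. It is not NYCT's implementation. The station names, the mileage network, the swipe records, and the function names are all hypothetical, and the sketch makes two simplifying assumptions: a rider's destination is taken to be the station of that rider's next swipe, and the path is chosen by plain Dijkstra over track mileage rather than the schedule-driven algorithm the article describes. Trips with no later swipe are left uninferred, and no sampling is applied.

```python
import heapq
from collections import defaultdict

# Hypothetical toy network: station -> {neighbor: track miles}.
NETWORK = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"B": 2.0, "D": 1.5},
    "D": {"C": 1.5},
}


def shortest_miles(network, origin, destination):
    """Dijkstra over edge mileage; returns None if destination is unreachable."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        miles, station = heapq.heappop(queue)
        if station == destination:
            return miles
        if miles > best.get(station, float("inf")):
            continue  # stale queue entry
        for neighbor, edge_miles in network[station].items():
            candidate = miles + edge_miles
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


def infer_trips(swipes):
    """Pair each entry swipe with the same card's next swipe.

    swipes: iterable of (card_id, minute_of_day, station).
    Assumption (not from the source): the next swipe's station is the
    destination of the current trip. The last swipe of a card has no
    later swipe, so it yields no inferred trip here.
    """
    by_card = defaultdict(list)
    for card_id, minute, station in swipes:
        by_card[card_id].append((minute, station))
    trips = []
    for card_id, entries in by_card.items():
        entries.sort()
        for (_, origin), (_, next_station) in zip(entries, entries[1:]):
            if next_station != origin:
                trips.append((card_id, origin, next_station))
    return trips


def passenger_miles(network, trips):
    """Sum shortest-path mileage over all trips with a reachable destination."""
    total = 0.0
    for _, origin, destination in trips:
        miles = shortest_miles(network, origin, destination)
        if miles is not None:
            total += miles
    return total


# Hypothetical entry-only swipe records for two fare cards.
SWIPES = [
    ("card1", 480, "A"),
    ("card1", 1020, "D"),
    ("card2", 500, "B"),
    ("card2", 900, "C"),
    ("card2", 1100, "B"),
]

TRIPS = infer_trips(SWIPES)
TOTAL_MILES = passenger_miles(NETWORK, TRIPS)
```

With these made-up records, three trips are inferred (A to D, B to C, C to B) and the total comes to 8.5 passenger miles. A production system would also need rules for the last trip of the day, for transfers, and for expanding sampled results to system totals, none of which are sketched here.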