Validation of automatic passenger counting: introducing the t-test-induced equivalence test

Document Type

Journal Article

Publication Date


Subject Area

planning - methods, technology - passenger information


Keywords

Automatic passenger counting, APC validation, APC accuracy, Revenue sharing, Equivalence testing, Post-hoc power adaptions


Automatic passenger counting (APC) in public transport was introduced in the 1970s and has spread rapidly in recent years. Still, real-world applications continue to face events that are difficult to classify. The resulting imprecision must be handled as statistical noise, and methods have therefore been defined to ensure that measurement errors do not exceed certain bounds. Various recommendations for such APC validation have been made to establish criteria that limit the bias and the variability of the measurement errors. In those works, the misinterpretation of non-significance in statistical hypothesis tests for the detection of differences (e.g. Student’s t-test) proves to be prevalent, although existing methods developed under the term equivalence testing in biostatistics (i.e. bioequivalence trials, Schuirmann in J Pharmacokinet Pharmacodyn 15(6):657–680, 1987) would be appropriate instead. This heavily affects the calibration and validation of APC systems and has led to unexpected results when sample sizes were not suitably chosen: large sample sizes were assumed to improve the assessment of systematic measurement errors of the devices from both a user’s and a manufacturer’s perspective, but the regular t-test fails to deliver this. We introduce a variant of the t-test, the revised t-test, which addresses both type I and type II errors appropriately and allows a comprehensible transition from the long-established t-test in a widely used industrial recommendation. This test is appealing but susceptible to numerical instability. Finally, we analytically reformulate it as a numerically stable equivalence test, which is thus easier to use. Our results therefore allow an equivalence test to be induced from a t-test and increase the comparability of both tests, especially for decision makers.
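The contrast between a test for a difference and an equivalence test can be illustrated with a short sketch. It is not the authors' revised t-test, whose exact form the abstract does not give; it implements the standard one-sample Student's t-test and Schuirmann's two one-sided tests (TOST) on hypothetical per-trip counting errors (APC count minus reference count). The function names `t_cdf`, `t_test_p` and `tost_p`, the tolerance `DELTA` and both data sets are illustrative assumptions. The Student's t CDF is computed from the regularized incomplete beta function (a continued fraction in the Numerical Recipes style) so that only the Python standard library is needed.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~ 1: converged
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """P(T <= t) for Student's t distribution with df degrees of freedom."""
    tail = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return 1.0 - tail if t > 0 else tail


def _mean_se_df(errors):
    """Sample mean, standard error of the mean and degrees of freedom."""
    n = len(errors)
    return statistics.fmean(errors), statistics.stdev(errors) / math.sqrt(n), n - 1


def t_test_p(errors):
    """Two-sided p-value of the one-sample t-test of H0: mean error == 0."""
    mean, se, df = _mean_se_df(errors)
    t = mean / se
    return _betai(df / 2.0, 0.5, df / (df + t * t))  # P(|T| >= |t|)


def tost_p(errors, delta):
    """Schuirmann's TOST p-value; H0 is |mean error| >= delta.

    A small p-value supports equivalence within (-delta, +delta).
    """
    mean, se, df = _mean_se_df(errors)
    p_lower = t_cdf(-(mean + delta) / se, df)  # H01: mean <= -delta
    p_upper = t_cdf((mean - delta) / se, df)   # H02: mean >= +delta
    return max(p_lower, p_upper)


if __name__ == "__main__":
    DELTA = 1.0  # hypothetical tolerated bias, in the same units as the errors
    small = [2, -2, 4, -1, 3, 0]                         # few trips, mean error 1.0
    large = [0.1 + 0.5 * (-1) ** i for i in range(400)]  # many trips, mean error 0.1
    for name, errors in (("small", small), ("large", large)):
        print(name, t_test_p(errors), tost_p(errors, DELTA))
```

For the small data set the t-test is not significant even though the mean error lies exactly on the tolerance limit, and its TOST p-value is 0.5, so equivalence is not shown. For the large data set the t-test flags a mean error of 0.1 as significant, while TOST concludes equivalence within ±1.0. Under these illustrative assumptions, this reproduces the two failure modes of reading a t-test as a validation criterion.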


Permission to publish the abstract has been given by SpringerLink; copyright remains with SpringerLink.