Abstract
Predictive maintenance models are usually assessed with validation-based performance metrics, but such assessments are prone to data leakage and to extreme base-rate shifts between the development and deployment settings. In this paper, we address robust horizon-based failure prediction under realistic deployment conditions using the CMAPSS FD001 dataset. We present a leakage-resilient, safety-aware predictive maintenance system together with a formal mathematical formulation of safety-constrained threshold selection, including explicit equations for threshold feasibility. We quantify the impact of base-rate shift: for example, validation failure rates are around 15%, whereas deployment failure rates drop below 3%, which can cause a false-alarm explosion if left unaddressed. To avoid optimistic bias, the proposed method strictly adheres to unit-level data partitioning and derives temporal features from past-only rolling statistics, so that no future information is used during training or evaluation. For decision support, we develop a safety-constrained threshold selection method that chooses a threshold satisfying, on the validation set, three constraints simultaneously: a minimum recall, a maximum false-positive rate, and a bounded predicted-positive rate. This mitigates false-alarm explosion at extremely low failure rates. Notably, threshold selection and model selection are decoupled, and candidate models are compared using a robustness score that accounts for both detection performance and false-alarm rates. Experiments on the CMAPSS FD001 benchmark show that validation performance alone is insufficient to identify trustworthy models under base-rate shift.
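The false-alarm risk under base-rate shift can be made concrete with a short worked identity. The symbols used here (\pi for the failure base rate, \tau for the decision threshold) and the operating point of 80% recall and 5% false-positive rate are illustrative assumptions, not values reported in this paper; only the base rates of roughly 15% and 3% come from the text above. Assuming recall and false-positive rate stay fixed across settings, precision depends on the base rate as follows:

```latex
\mathrm{Precision}(\tau)
  = \frac{\pi \,\mathrm{Recall}(\tau)}
         {\pi \,\mathrm{Recall}(\tau) + (1-\pi)\,\mathrm{FPR}(\tau)}

% Validation-like base rate (assumed Recall = 0.80, FPR = 0.05)
\pi = 0.15:\quad
  \frac{0.15 \times 0.80}{0.15 \times 0.80 + 0.85 \times 0.05}
  = \frac{0.1200}{0.1625} \approx 0.74

% Deployment-like base rate (same assumed operating point)
\pi = 0.03:\quad
  \frac{0.03 \times 0.80}{0.03 \times 0.80 + 0.97 \times 0.05}
  = \frac{0.0240}{0.0725} \approx 0.33
```

Under these assumed numbers, a threshold for which about three in four alarms are true on validation would yield only about one true alarm in three at deployment, which suggests why a threshold chosen on validation metrics alone can be misleading.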
We additionally report PR-AUC and recall on both the validation and test sets to provide a more comprehensive evaluation. Our framework achieves a test F1-score that is 36.3% higher than that of SVM and 201.5% higher than that of LightGBM, and it reduces the false-positive rate by up to 92.6%.
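The safety-constrained threshold selection described above can be sketched as a search over candidate thresholds on the validation set. This is a minimal illustration rather than the paper's implementation: the function name `select_threshold`, its parameter names, the toy data, and the tie-breaking rule (take the highest feasible threshold, which raises the fewest alarms) are all assumptions introduced here.

```python
from typing import Optional, Sequence


def select_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    min_recall: float,
    max_fpr: float,
    max_ppr: float,
) -> Optional[float]:
    """Return the highest threshold meeting all three constraints, or None.

    A sample is predicted positive when its score is >= the threshold.
    Hypothetical helper; the tie-breaking rule is an assumption.
    """
    n = len(y_true)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("validation data must contain both classes")

    feasible = []
    # Every distinct validation score is a candidate threshold.
    for tau in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 0)
        recall = tp / n_pos      # fraction of failures detected
        fpr = fp / n_neg         # fraction of healthy samples flagged
        ppr = (tp + fp) / n      # fraction of all samples flagged
        if recall >= min_recall and fpr <= max_fpr and ppr <= max_ppr:
            feasible.append(tau)

    return max(feasible) if feasible else None


# Toy validation data (illustrative only): 3 failures among 10 samples.
y_val = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
s_val = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.90, 0.80, 0.70, 0.60]

tau_ok = select_threshold(y_val, s_val, min_recall=1.0, max_fpr=0.2, max_ppr=0.5)
tau_none = select_threshold(y_val, s_val, min_recall=1.0, max_fpr=0.0, max_ppr=0.5)
```

Returning `None` when no threshold is feasible keeps the infeasible case explicit, so a caller can relax a constraint or reject the model instead of silently falling back to a default threshold.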
Authors
Muhammad Rashid Majeed¹, Mst Jannatul Kobra², Kashif Iqbal³, Nabi Rehmat⁴, Kaleem Ullah⁵
¹⁻⁵ Nanjing University of Information Science and Technology, China
Keywords
Predictive Maintenance, Base-Rate Shift, Safety-Aware Thresholding, Data Leakage Prevention, Failure Prediction