Machine learning for detecting centre-level irregularities in randomized controlled trials: A pilot study

Abstract

Centralized statistical monitoring is sometimes used as an alternative to onsite monitoring in randomized controlled trials (RCTs). Current central monitoring methods have limitations: they are relatively resource-intensive, and they do not necessarily generalize to studies in which an irregularity pattern has not been observed before. Machine learning has been effective at detecting irregularities in industries such as finance and manufacturing, but to date no such methods have been applied to clinical trials. We conducted a pilot study of the use of machine learning to identify centre-level irregularities in data from multicentre clinical trials. We employed unsupervised machine learning methods, which do not rely on labelled data. They therefore allow the automated discovery of previously unseen irregularity patterns while remaining flexible when applied to new data with different structures. Specifically, we used unsupervised machine learning to compute distance matrices between centres, from which we derived centre-level continuous features. We then used a one-class support vector machine to learn the underlying distribution of each data set and to identify data that were substantially different from that distribution. We evaluated our approach against current automatable centralized monitoring methods on two trials with known irregularities, using the area under the receiver operating characteristic curve (AUROC). Current approaches performed well on one trial (AUROC 0.752 for monitoring vs. 0.584 for machine learning). Our techniques performed substantially better on the other (AUROC 0.140 for monitoring vs. 0.728 for machine learning). The results of this pilot study suggest both the feasibility and the potential value of a machine learning-based approach to irregularity detection in RCTs.
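To make two ingredients of this pipeline concrete, here is a minimal standard-library Python sketch. The first turns centre-by-centre distance matrices into centre-level continuous features. The second scores a ranking of centres with AUROC. The abstract does not say which distance or summary the authors used. The two-sample Kolmogorov-Smirnov statistic, the "mean distance to all other centres" feature, the toy data, and the helper names (`ks_statistic`, `distance_matrix`, `centre_features`, `auroc`) are all hypothetical choices for illustration. The one-class support vector machine step is not reproduced here. In practice the feature vectors would be passed to an implementation such as scikit-learn's `sklearn.svm.OneClassSVM`. In the sketch, the raw mean distance stands in as a crude irregularity score only so that AUROC has something to evaluate.

```python
from bisect import bisect_right
from itertools import combinations
from statistics import mean


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical CDFs of samples a and b (hypothetical choice of distance)."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, x):
        # Fraction of the (sorted) sample that is <= x.
        return bisect_right(sample, x) / len(sample)

    # The supremum is attained at one of the observed values.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


def distance_matrix(centres):
    """Symmetric centre-by-centre distance matrix for one variable.
    `centres` maps centre id -> list of observed values."""
    ids = list(centres)
    d = {i: {j: 0.0 for j in ids} for i in ids}
    for i, j in combinations(ids, 2):
        d[i][j] = d[j][i] = ks_statistic(centres[i], centres[j])
    return d


def centre_features(centres_by_variable):
    """One continuous feature per (centre, variable): the centre's mean
    distance to every other centre. Returns centre id -> feature list."""
    features = {}
    for centres in centres_by_variable.values():
        d = distance_matrix(centres)
        for i in d:
            others = [d[i][j] for j in d if j != i]
            features.setdefault(i, []).append(mean(others))
    return features


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen flagged centre
    (label 1) outscores a randomly chosen unflagged one; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one flagged and one unflagged centre")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (fabricated for illustration): centre D reports implausibly
# uniform values for a single hypothetical variable, "sbp".
centres = {
    "A": [120, 125, 130, 135, 140],
    "B": [118, 124, 131, 136, 141],
    "C": [121, 126, 129, 134, 139],
    "D": [150, 150, 150, 150, 150],
}
features = centre_features({"sbp": centres})
scores = [features[c][0] for c in "ABCD"]
print(features)
print(auroc(scores, [0, 0, 0, 1]))  # D is the known irregular centre
```

AUROC is the probability that a randomly chosen irregular centre receives a higher score than a randomly chosen regular one. A value of 0.5 is therefore chance level. Values below 0.5, such as the 0.140 reported for monitoring on the second trial, mean the score tended to rank irregular centres below regular ones.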