Machine Learning with Statistical Imputation for Predicting Drug Approvals

Lo, Andrew W., Kien Wei Siah, and Chi Heem Wong, 2019, Harvard Data Science Review, https://doi.org/10.1162/99608f92.5c5f0525.

ABSTRACT We apply machine-learning techniques to predict drug approvals using drug-development and clinical-trial data from 2003 to 2015 involving several thousand drug-indication pairs with over 140 features across 15 disease groups. To deal with missing data, we use imputation methods that allow us to fully exploit the entire dataset, the largest of its kind. We show that our approach outperforms complete-case analysis, which typically yields biased inferences. We achieve AUCs (“area under the receiver operating characteristic curve,” the estimated probability that a classifier will rank a positive outcome higher than a negative outcome) of 0.78 and 0.81 for predicting transitions from phase 2 to approval and from phase 3 to approval, respectively. Using five-year rolling windows, we document an increasing trend in the predictive power of these models, a consequence of improvements in data quality and quantity. The most important features for predicting success are trial outcomes, trial status, trial accrual rates, duration, prior approval for another indication, and sponsor track records. We provide estimates of the probability of success for all drugs in the current pipeline.
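
The AUC definition given above can be made concrete with a short calculation. The sketch below, in Python, estimates AUC directly as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as one half. The function name `pairwise_auc` and the toy scores are hypothetical illustrations, not the authors' code or data.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Estimate AUC as the share of (positive, negative) pairs in which the
    positive example is scored higher; tied pairs count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0  # positive correctly ranked above negative
        elif p == n:
            total += 0.5  # tie: half credit
    return total / (len(pos) * len(neg))


# Hypothetical scores for six drug-indication pairs (1 = approved, 0 = not).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(pairwise_auc(scores, labels))  # 8 of 9 pairs ordered correctly, about 0.889
```

The quadratic pairwise loop is chosen for readability; for large datasets a rank-based formula (equivalent to the Mann-Whitney U statistic) or scikit-learn's `roc_auc_score` computes the same quantity more efficiently.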
