Evaluation Metrics for Deep Learning Imputation Models

Abstract

There is growing interest in imputing missing data in tabular datasets using deep learning. A metric commonly used to evaluate deep learning-based imputation models is root mean square error (RMSE), which is a predictive accuracy metric. In this paper, we demonstrate the limitations of RMSE for evaluating deep learning-based imputation performance by comparing it with alternative metrics from the statistical literature, spanning qualitative, predictive accuracy, and statistical distance measures. To minimize model and dataset biases, we use two different deep learning imputation models (denoising autoencoders and generative adversarial nets) and a regression imputation model. We also use two tabular datasets from different industry sectors, healthcare and finance, each with increasing proportions of missing data. Our results show that the Jensen-Shannon distance, a statistical distance metric, best assessed the imputation models' performance, in contrast to the commonly used RMSE. The regression model also ranked higher than the deep learning models when evaluated using the Jensen-Shannon distance.
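To make the contrast between the two kinds of metric concrete, the following is a minimal sketch and not the paper's evaluation code. It computes RMSE and a histogram-based Jensen-Shannon distance for two hypothetical imputations of a small bimodal column. The data, the bin edges, and the helper names (`rmse`, `histogram`, `js_distance`) are all assumptions chosen for illustration.

```python
import math

def rmse(true_vals, imputed_vals):
    """Root mean square error between true and imputed values."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n)

def histogram(values, edges):
    """Normalised histogram; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

# Toy data (assumed): a bimodal column whose values were masked and then imputed.
true_vals = [0, 0, 0, 0, 10, 10, 10, 10]
mean_imputed = [5] * 8                          # every gap filled with the mean
shuffled_imputed = [10, 10, 10, 10, 0, 0, 0, 0]  # right distribution, wrong rows

edges = [0, 2.5, 5, 7.5, 10]  # assumed bin edges
p_true = histogram(true_vals, edges)

for name, imputed in [("mean", mean_imputed), ("shuffled", shuffled_imputed)]:
    print(name,
          "RMSE =", rmse(true_vals, imputed),
          "JS =", js_distance(p_true, histogram(imputed, edges)))
```

In this deliberately extreme example, mean imputation gets the lower RMSE even though it collapses the column's two modes into one, while the Jensen-Shannon distance penalises it and rewards the imputation that preserves the distribution. Real evaluations would use many more values, and the choice of binning affects the distance.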