Evaluation Metrics for Deep Learning Imputation Models

Abstract

There is growing interest in imputing missing data in tabular datasets using deep learning. A commonly used metric for evaluating the performance of a deep learning-based imputation model is root mean square error (RMSE), which is a prediction evaluation metric. In this paper, we demonstrate the limitations of RMSE for evaluating deep learning-based imputation performance by conducting a comparative analysis between RMSE and alternative metrics from the statistical literature, including qualitative, predictive accuracy, and statistical distance metrics. To minimize model and dataset biases, we use two different deep learning imputation models (denoising autoencoders and generative adversarial nets) and a regression imputation model. We also use two tabular datasets from different industry sectors (healthcare and financial), each with increasing amounts of missing data. Our results show that, in contrast to the commonly used RMSE metric, the statistical distance metric of Jensen-Shannon distance best assessed the imputation models' performance. The regression model also ranked higher than the deep learning models when evaluated using the Jensen-Shannon metric.
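
The contrast between RMSE and Jensen-Shannon distance described in the abstract can be illustrated with a minimal sketch. This is not the paper's experiment: the toy values, the bin edges, and the helper names (`rmse`, `histogram`, `js_distance`) are all assumptions made up for illustration. The sketch compares a mean-style imputation, which scores well on RMSE but collapses a bimodal column into a single value, against an imputation that reproduces the column's distribution but pairs the values with the wrong rows.

```python
import math

def rmse(true_values, imputed_values):
    """Root mean square error between true and imputed values."""
    squared = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))

def histogram(values, bin_edges):
    """Fraction of values in each half-open bin [edge_k, edge_k+1)."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for b in range(len(counts)):
            if bin_edges[b] <= v < bin_edges[b + 1]:
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance (base 2): square root of the JS divergence."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_k == 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

# Hypothetical toy column (illustration only): a bimodal variable.
true_values = [0, 0, 0, 0, 10, 10, 10, 10]
mean_imputed = [5, 5, 5, 5, 5, 5, 5, 5]        # every cell set to the mean
sample_imputed = [0, 10, 0, 10, 0, 10, 0, 10]  # right distribution, wrong rows

# Assumed bin edges; a real analysis would choose bins per variable.
edges = [0, 4, 8, 12]
p_true = histogram(true_values, edges)

rmse_mean = rmse(true_values, mean_imputed)
rmse_sample = rmse(true_values, sample_imputed)
js_mean = js_distance(p_true, histogram(mean_imputed, edges))
js_sample = js_distance(p_true, histogram(sample_imputed, edges))

print(f"mean imputation:   RMSE={rmse_mean:.3f}  JS={js_mean:.3f}")
print(f"sample imputation: RMSE={rmse_sample:.3f}  JS={js_sample:.3f}")
```

In this toy case the two metrics rank the imputations in opposite order: RMSE prefers the mean imputation, while Jensen-Shannon distance prefers the one that preserves the column's distribution. Which ranking matters depends on whether the downstream use needs accurate individual values or a faithful distribution.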

Authors

Boursalie O; Samavi R; Doyle TE

Series

Studies in Computational Intelligence

Volume

1013

Pagination

pp. 309-322

Publisher

Springer Nature

Publication Date

January 1, 2022

DOI

10.1007/978-3-030-93080-6_22

Conference proceedings

Studies in Computational Intelligence

ISSN

1860-949X