The Case of the Missing Data: Methods of Dealing with Dropouts and other Research Vagaries

Abstract

Missing data are common in most studies, especially when subjects are followed over time. This can jeopardize the validity of a study because of reduced power to detect differences, and especially because subjects who are lost to follow-up rarely represent the group as a whole. There are several approaches to handling missing data, but some may result in biased estimates of the treatment effect, and others may overestimate the significance of the statistical tests. When cross-sectional data (for example, demographic and background information and a single outcome measurement time) are missing, replacement with the group mean leads to an underestimate of the standard deviation (SD) and inflation of the Type I error rate. Using regression estimates, especially with error built into the imputed value, lessens but does not eliminate this problem. Multiple imputation preserves the estimates of both the mean and the SD, even when a significant proportion of the data are missing. With longitudinal studies, the last observation carried forward (LOCF) approach preserves the sample size, but may make unwarranted assumptions about the missing data, resulting in either underestimating or overestimating the treatment effects. Growth curve analysis makes maximal use of the existing data and makes fewer assumptions.
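
As a rough illustration of two claims in the abstract, the sketch below shows why replacing missing values with the group mean shrinks the SD, and what LOCF does to a subject who drops out. It is not taken from the article: the simulated scores, the 30% missing-completely-at-random pattern, and the helper names `mean_impute` and `locf` are all assumptions made for this example, and multiple imputation and growth curve analysis are not implemented here.

```python
import random
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def locf(series):
    """Last observation carried forward: fill each None with the most
    recent observed value (leading None entries stay None)."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical cross-sectional outcome: 200 simulated scores.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
scores = [rng.gauss(50, 10) for _ in range(200)]

# Assumption for this sketch: 30% of values go missing completely at random.
missing_idx = set(rng.sample(range(len(scores)), 60))
with_gaps = [None if i in missing_idx else s for i, s in enumerate(scores)]

observed = [v for v in with_gaps if v is not None]
sd_observed = statistics.stdev(observed)
sd_imputed = statistics.stdev(mean_impute(with_gaps))

# Imputed points add nothing to the sum of squared deviations but do
# increase n, so the SD after mean imputation is necessarily smaller.
print(f"SD of observed values:    {sd_observed:.2f}")
print(f"SD after mean imputation: {sd_imputed:.2f}")

# Hypothetical subject with five visits who drops out after the third.
print(locf([22, 18, 15, None, None]))  # [22, 18, 15, 15, 15]
```

With 140 observed and 60 imputed values, the SD shrinks by a factor of sqrt(139/199), roughly 0.84, whatever the data look like. The LOCF line simply freezes the subject at the last recorded score, which is the assumption the abstract flags as potentially unwarranted.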

Authors

Streiner DL

Journal

The Canadian Journal of Psychiatry, Vol. 47, No. 1, pp. 68–75

Publisher

SAGE Publications

Publication Date

January 1, 2002

DOI

10.1177/070674370204700111

ISSN

0706-7437

Associated Experts

David Lloyd Streiner

Professor Emeritus, Faculty of Health Sciences
