The Case of the Missing Data: Methods of Dealing with Dropouts and Other Research Vagaries

Missing data are common in most studies, especially when subjects are followed over time. This can jeopardize the validity of a study because of reduced power to detect differences, and especially because subjects who are lost to follow-up rarely represent the group as a whole. There are several approaches to handling missing data, but some may result in biased estimates of the treatment effect, and others may overestimate the significance of the statistical tests. When cross-sectional data (for example, demographic and background information and a single outcome measurement time) are missing, replacement with the group mean leads to an underestimate of the standard deviation (SD) and inflation of the Type I error rate. Using regression estimates, especially with error built into the imputed value, lessens but does not eliminate this problem. Multiple imputation preserves the estimates of both the mean and the SD, even when a significant proportion of the data are missing. With longitudinal studies, the last observation carried forward (LOCF) approach preserves the sample size, but may make unwarranted assumptions about the missing data, resulting in either underestimating or overestimating the treatment effects. Growth curve analysis makes maximal use of the existing data and makes fewer assumptions.
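
Two of the points above, that group-mean replacement shrinks the SD and that LOCF fills later visits by assumption, can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than material from the article: the scores are simulated, values are dropped completely at random, and the helper names `mean_impute` and `locf` are hypothetical.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    group_mean = statistics.mean(observed)
    return [group_mean if v is None else v for v in values]


def locf(series):
    """Last observation carried forward: fill each None with the most
    recent earlier observed value (leading None entries stay None)."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Simulated outcome scores (illustrative only, not data from the article).
rng = random.Random(0)
scores = [rng.gauss(50, 10) for _ in range(200)]

# Assumption: roughly 30% of values go missing completely at random.
with_missing = [None if rng.random() < 0.3 else s for s in scores]

observed = [v for v in with_missing if v is not None]
imputed = mean_impute(with_missing)

# The mean is unchanged, but every imputed value adds zero squared
# deviation while increasing n, so the sample SD can only shrink.
sd_observed = statistics.stdev(observed)
sd_imputed = statistics.stdev(imputed)
print(f"SD of observed values:      {sd_observed:.2f}")
print(f"SD after mean substitution: {sd_imputed:.2f}")

# A hypothetical subject measured at four visits who drops out after
# the second: LOCF assumes no further change in the score.
visits = [20, 16, None, None]
print(locf(visits))
```

The shrinkage in SD is what makes test statistics too large after mean substitution. The LOCF example shows why the direction of the resulting bias depends on what the subject's scores would have done after dropout, which the method cannot know.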
