Methods to improve the estimation of time-to-event outcomes when data is de-identified (Academic Article)


abstract

  • Technological advancements in recent years have sparked the use of large databases for research. The availability of these large databases has created a need for anonymization and de-identification techniques to be applied before the data are published. De-identification alters the data, which in turn can affect the results derived afterwards and potentially lead to false conclusions. The objective of this study is to investigate whether alterations to a de-identified time-to-event data set can improve the accuracy of the estimates. In this data set, a missing time bias had been introduced among censored patients as a means of preserving patient confidentiality. This study investigates five methods intended to reduce the bias of time-to-event estimates. A simulation study was conducted to evaluate the effectiveness of each method in reducing bias. When there was a large number of censored patients, the simulation showed that Method 4 yielded the most accurate estimates. This method adjusted the survival times of censored patients by adding a random uniform component such that the modified survival time would fall within the final year of the study. When there was only a small number of censored patients, the method that did not alter the de-identified data set (Method 1) provided the most accurate estimates.
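
The adjustment described for Method 4 can be illustrated with a minimal sketch. The abstract does not give the exact construction, so everything specific here is an assumption made for illustration: the function name `shift_censored_to_final_year`, the use of years as the time unit, a known study length, and the choice to never move a censored time backwards. It is not the authors' implementation.

```python
import random


def shift_censored_to_final_year(times, events, study_length, rng=None):
    """Return survival times with censored observations moved into the final year.

    times        -- observed times in years since study start (assumed unit)
    events       -- True if the event was observed, False if censored
    study_length -- total study duration in years (assumed to be known)
    rng          -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    final_year_start = study_length - 1.0
    adjusted = []
    for t, event in zip(times, events):
        if event:
            # Observed events are left unchanged.
            adjusted.append(t)
            continue
        if t > study_length:
            raise ValueError("censored time exceeds study length")
        # Assumption: the added component is uniform on the interval that
        # places t + u in the final year, and is never negative.
        low = max(final_year_start - t, 0.0)
        high = study_length - t
        u = rng.uniform(low, high)
        # Guard against floating-point overshoot past the study end.
        adjusted.append(min(t + u, study_length))
    return adjusted


if __name__ == "__main__":
    times = [0.5, 2.0, 4.6, 3.0]
    events = [False, True, False, False]
    print(shift_censored_to_final_year(times, events, 5.0, random.Random(0)))
```

In this sketch only censored rows change, and each adjusted censoring time lies between the start of the final year and the end of the study. The adjusted times would then be passed to whatever time-to-event estimator is in use.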

publication date

  • February 20, 2019