Methods to improve the estimation of time‐to‐event outcomes when data is de‐identified

Technological advancements in recent years have spurred the use of large databases for research. The availability of these databases has created a need for anonymization and de‐identification techniques to be applied before the data is published. De‐identification alters the data, which in turn can affect results derived from the de‐identified data and potentially lead to false conclusions. The objective of this study is to investigate whether alterations to a de‐identified time‐to‐event data set can improve the accuracy of the estimates. In this data set, a missing‐time bias was present among censored patients, introduced by measures taken to preserve patient confidentiality. This study investigates five methods intended to reduce the bias of time‐to‐event estimates. A simulation study was conducted to evaluate how effectively each method reduces bias. When there was a large number of censored patients, the simulation showed that Method 4 yielded the most accurate estimates. This method adjusts the survival times of censored patients by adding a random uniform component such that the modified survival time falls within the final year of the study. In contrast, when there was only a small number of censored patients, the method that left the de‐identified data set unaltered (Method 1) provided the most accurate estimates.
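
To make the idea behind Method 4 concrete, the following is a minimal sketch of one possible reading of the adjustment, not the authors' implementation. It assumes that follow‐up times are measured in years from study start, that the total study length is known, and that a censored time already inside the final year is only ever moved later. The function name `adjust_censored_times` and all of its parameters are hypothetical.

```python
import random


def adjust_censored_times(times, censored, study_length, rng=None):
    """Illustrative sketch only: move censored times into the final study year.

    times        -- recorded follow-up times in years (assumed <= study_length)
    censored     -- booleans, True where the patient was censored
    study_length -- total study duration in years (assumed to be known)
    rng          -- optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    # Start of the final year of the study.
    window_start = max(study_length - 1.0, 0.0)
    adjusted = []
    for t, is_censored in zip(times, censored):
        if is_censored:
            # Draw a uniform component u so that t + u lies between
            # max(t, window_start) and study_length.
            low = max(window_start - t, 0.0)
            high = max(study_length - t, low)
            adjusted.append(t + rng.uniform(low, high))
        else:
            # Observed events are left unchanged.
            adjusted.append(t)
    return adjusted


if __name__ == "__main__":
    # Hypothetical data: a 5-year study with three censored patients.
    example = adjust_censored_times(
        [0.5, 2.0, 4.6, 3.0], [True, False, True, True], 5.0, random.Random(0)
    )
    print(example)
```

Under these assumptions, event times are untouched and every censored time ends up in the last year of follow‐up; the other four methods are not described in enough detail here to sketch.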
