Conference

Preserving diversity in anonymized data

Abstract

Recent privacy legislation has aimed to restrict and control the amount of personal data published by companies and shared with third parties. Much of this real data is not only sensitive, requiring anonymization, but also contains characteristic details from a variety of individuals. This diversity is desirable in many applications, ranging from Web search to drug and product development. Unfortunately, data anonymization techniques have largely ignored diversity in their published results, which inadvertently propagates underlying bias in subsequent data analysis. We study the problem of finding a diverse anonymized data instance, where diversity is measured via a set of diversity constraints. We formalize diversity constraints and present a clustering-based algorithm for finding a diverse anonymized instance. We show the effectiveness and efficiency of our techniques against existing baselines. Our work aligns with recent trends towards responsible data science by coupling diversity with privacy-preserving data publishing.
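To make the idea of a diverse anonymized instance concrete, the sketch below checks a published table against two conditions. It is only an illustration: the abstract does not give the formal definition, so the form of a diversity constraint used here (an attribute value that must appear between a lower and an upper number of times) is an assumption, as are the function names `is_k_anonymous` and `satisfies_diversity`, the use of k-anonymity as the privacy model, and the sample rows. It does not reproduce the authors' clustering-based algorithm.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in rows)
    return all(count >= k for count in groups.values())

def satisfies_diversity(rows, constraints):
    """Check assumed constraints of the form (attribute, value, lower, upper):
    the value must appear in the attribute between lower and upper times."""
    for attr, value, lower, upper in constraints:
        occurrences = sum(1 for r in rows if r[attr] == value)
        if not lower <= occurrences <= upper:
            return False
    return True

# Hypothetical anonymized instance; "*" marks a suppressed value.
rows = [
    {"age": "30-39", "zip": "L8S*", "ethnicity": "Asian"},
    {"age": "30-39", "zip": "L8S*", "ethnicity": "Asian"},
    {"age": "40-49", "zip": "L9H*", "ethnicity": "*"},
    {"age": "40-49", "zip": "L9H*", "ethnicity": "*"},
]

private = is_k_anonymous(rows, ["age", "zip"], k=2)
diverse_ok = satisfies_diversity(rows, [("ethnicity", "Asian", 1, 3)])
diverse_lost = satisfies_diversity(rows, [("ethnicity", "African", 1, 3)])
```

In this toy instance the second group's ethnicity values were suppressed to reach anonymity, so a constraint on a value that only occurred in that group can no longer be met. That is the kind of diversity loss the abstract describes.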

Authors

Milani M; Huang Y; Chiang F

Volume

2021-March

Pagination

pp. 511-516

Publication Date

January 1, 2021

DOI

10.5441/002/edbt.2021.60

Conference proceedings

Advances in Database Technology - EDBT