Journal article

Data Anonymization With Diversity Constraints

Abstract

Recent privacy legislation has aimed to restrict and control the amount of personal data published by companies and shared with third parties. Much of this real data is not only sensitive, requiring anonymization, but also contains characteristic details from a variety of individuals. This diversity is desirable in many applications, ranging from Web search to drug and product development. Unfortunately, data anonymization techniques have largely ignored diversity in their published results. This inadvertently propagates underlying bias into subsequent data analysis. We study the problem of finding a diverse anonymized data instance, where diversity is measured via a set of diversity constraints. We formalize diversity constraints and study their fundamental problems of satisfiability, implication, and validation. We show that determining the existence of a diverse, anonymized instance can be done in PTIME, and we present a clustering-based algorithm along with optimizations to improve performance. We conduct extensive experiments using real and synthetic data, showing the effectiveness of our techniques and improvements over existing baselines. Our work aligns with recent trends towards responsible data science by coupling diversity with privacy-preserving data publishing.
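
The abstract refers to validating diversity constraints over an anonymized instance. As a minimal, hypothetical Python sketch (not the paper's formal definition), the check below models a diversity constraint as "each anonymized group must contain at least a minimum number of distinct values of a given attribute"; the field names, threshold, and toy data are illustrative assumptions.

from collections import defaultdict

def satisfies_diversity_constraint(records, group_key, attribute, min_distinct):
    """Check an l-diversity-style constraint over anonymized groups.

    records      -- list of dicts, one per anonymized tuple (assumed format)
    group_key    -- attribute whose value identifies the anonymized group
    attribute    -- attribute whose values must be diverse within each group
    min_distinct -- minimum number of distinct attribute values per group
    """
    # Collect the distinct attribute values observed in each group.
    groups = defaultdict(set)
    for record in records:
        groups[record[group_key]].add(record[attribute])
    # The constraint holds only if every group meets the threshold.
    return all(len(values) >= min_distinct for values in groups.values())

if __name__ == "__main__":
    # Toy anonymized instance: two groups of generalized tuples.
    data = [
        {"group": "age:20-30", "diagnosis": "flu"},
        {"group": "age:20-30", "diagnosis": "asthma"},
        {"group": "age:30-40", "diagnosis": "flu"},
        {"group": "age:30-40", "diagnosis": "flu"},
    ]
    # The second group has only one distinct diagnosis, so this prints False.
    print(satisfies_diversity_constraint(data, "group", "diagnosis", 2))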

Authors

Milani M; Huang Y; Chiang F

Journal

IEEE Transactions on Knowledge and Data Engineering, Vol. 35, No. 4, pp. 3603–3618

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

April 1, 2023

DOI

10.1109/tkde.2021.3131528

ISSN

1041-4347
