PACAS: Privacy-Aware, Data Cleaning-as-a-Service
Conference

Abstract

Data cleaning consumes up to 80% of the data analysis pipeline. This is a significant overhead for organizations where data cleaning is still a manually driven process requiring domain expertise. Recent advances have fueled a new computing paradigm called Database-as-a-Service, where data management tasks are outsourced to large service providers. We propose a new Data Cleaning-as-a-Service model that allows a client to interact with a data cleaning provider who hosts curated and sensitive data. We present PACAS: a Privacy-Aware data Cleaning-As-a-Service framework that facilitates communication between the client and the service provider via a data pricing scheme, where clients issue queries and the service provider returns clean answers for a price while protecting her data. We propose a practical privacy model for such interactive settings, called (X,Y,L)-anonymity, that extends existing data publishing techniques to consider data semantics while protecting sensitive values. Our evaluation over real data shows that PACAS effectively safeguards semantically related sensitive values and provides improved accuracy over existing privacy-aware cleaning techniques.

Authors

Huang Y; Milani M; Chiang F

Volume

00

Pagination

pp. 1023-1030

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

December 13, 2018

DOI

10.1109/bigdata.2018.8622249

Name of conference

2018 IEEE International Conference on Big Data (Big Data)