Conference

Corroborating Quality of Data Through Density Information

Abstract

Unclean data is a common problem, and a dataset may suffer from several quality issues at once: over time data become obsolete, transcription errors make values inaccurate, and integrating multiple data sources can introduce several tuples that refer to the same entity. A cleaning process is therefore necessary, yet fixing a single issue may not be enough, relying on human effort to correct the data is expensive, and master data or training data may not be available. This paper studies a data cleaning problem by introducing techniques based on corroboration, i.e., techniques that take the trustworthiness of attribute values into account. It presents a data deduplication approach for data whose duplicated tuples contain outdated and inaccurate values, using the density information embedded in the tuples to guide the cleaning process. The paper introduces a framework and algorithms that integrate data deduplication with data currency and accuracy, fixing multiple data quality issues without relying on manual user interaction, master data, or training data.
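
As a rough illustration of the corroboration idea described above (not taken from the paper, whose algorithms are not given in this abstract), the hypothetical Python snippet below merges a cluster of duplicate tuples by keeping, for each attribute, the value supported by the most tuples. Plain frequency stands in for the density information here; the paper's approach additionally accounts for currency and accuracy.

from collections import Counter

def corroborate(duplicate_tuples, attributes):
    """Merge duplicate tuples by keeping, for each attribute,
    the value corroborated by the most tuples (an illustrative
    stand-in for the paper's density-based scoring)."""
    merged = {}
    for attr in attributes:
        values = [t[attr] for t in duplicate_tuples
                  if t.get(attr) not in (None, "")]
        if values:
            # Pick the most frequent non-missing value.
            merged[attr], _ = Counter(values).most_common(1)[0]
    return merged

if __name__ == "__main__":
    cluster = [
        {"name": "J. Smith", "city": "Hamilton", "phone": "905-555-0101"},
        {"name": "John Smith", "city": "Hamilton", "phone": "905-555-0101"},
        {"name": "John Smith", "city": "Hamiltn", "phone": None},
    ]
    print(corroborate(cluster, ["name", "city", "phone"]))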

Authors

Al-janabi S; Janicki R

Series

Lecture Notes in Networks and Systems

Volume

15

Pagination

pp. 1128-1146

Publisher

Springer Nature

Publication Date

January 1, 2018

DOI

10.1007/978-3-319-56994-9_78

Conference proceedings

Lecture Notes in Networks and Systems

ISSN

2367-3370