Quantifying duplication to improve data quality

Abstract

Deduplication, the task of identifying duplicate records in a dataset, is costly and tedious. High duplication rates lead to poor data quality, where ambiguity arises as to whether two records refer to the same entity. Existing deduplication techniques compare a set of attribute values and verify whether given similarity thresholds are satisfied. While potential duplicate records are identified, these techniques do not provide …
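The threshold-based comparison the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's method: the attribute names, the edit-based similarity measure, and the per-attribute thresholds are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

def attribute_similarity(a: str, b: str) -> float:
    """Normalized edit-based similarity between two attribute values (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_candidate_duplicates(records, attrs, thresholds):
    """Flag record pairs whose similarity meets the threshold on every attribute."""
    candidates = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        if all(attribute_similarity(r1[a], r2[a]) >= thresholds[a] for a in attrs):
            candidates.append((i, j))
    return candidates

# Hypothetical records: the first two likely refer to the same entity.
records = [
    {"name": "Jon Smith",  "city": "Toronto"},
    {"name": "John Smith", "city": "Toronto"},
    {"name": "Ann Lee",    "city": "Ottawa"},
]
pairs = find_candidate_duplicates(records, ["name", "city"],
                                  {"name": 0.8, "city": 0.9})
```

As the abstract notes, such a scheme only surfaces candidate pairs; it does not by itself quantify the overall duplication in the dataset.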

Authors

Huang Y; Chiang F; Saillet Y; Maier A; Spisic D; Petitclerc M; Zuzarte C

Pagination

pp. 272-278

Publication Date

January 1, 2020

Conference proceedings

Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering (CASCON 2017)