Conference
Quantifying duplication to improve data quality
Abstract
Deduplication is the costly and tedious task of identifying duplicate records in a dataset. High duplication rates degrade data quality, creating ambiguity as to whether two records refer to the same entity. Existing deduplication techniques compare a set of attribute values and verify whether given similarity thresholds are satisfied. While these techniques identify potential duplicate records, they do not provide …
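The threshold-based comparison the abstract describes can be sketched as follows; this is an illustrative example only, not the authors' method. The attribute names, records, and thresholds are hypothetical, and `difflib`'s ratio stands in for whatever similarity measure a real system would use.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_potential_duplicate(rec1: dict, rec2: dict, thresholds: dict) -> bool:
    """Flag two records as potential duplicates when every compared
    attribute meets its per-attribute similarity threshold."""
    return all(
        similarity(rec1[attr], rec2[attr]) >= t
        for attr, t in thresholds.items()
    )

# Hypothetical records: a likely duplicate pair with a minor name variation.
r1 = {"name": "Jon Smith", "city": "Toronto"}
r2 = {"name": "John Smith", "city": "Toronto"}
print(is_potential_duplicate(r1, r2, {"name": 0.8, "city": 0.9}))  # True
```

Note that this yields only a yes/no flag per pair; it says nothing about how severe duplication is across the whole dataset, which is the gap the paper's quantification addresses.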
Authors
Huang Y; Chiang F; Saillet Y; Maier A; Spisic D; Petitclerc M; Zuzarte C
Pagination
pp. 272-278
Publication Date
January 1, 2020
Conference proceedings
Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering (CASCON 2017)