Journal article

Data repair of density-based data cleaning approach using conditional functional dependencies

Abstract

Purpose: Data quality is a major challenge in data management. For organizations, the cleanliness of data is a significant problem that affects many business activities. Errors in data occur for different reasons, such as violation of business rules. Because of the huge amount of data, manual cleaning alone is infeasible, so automatic methods are required to detect and repair dirty data. The purpose of this work is to extend the density-based data cleaning approach using conditional functional dependencies to achieve better data repair.

Design/methodology/approach: A set of conditional functional dependencies is introduced as an input to the density-based data cleaning algorithm. The algorithm repairs inconsistent data using this set.

Findings: This new approach was evaluated through experiments on real-world as well as synthetic datasets. The repair quality was determined using the F-measure. The results showed that the quality and scalability of the density-based data cleaning approach improved when conditional functional dependencies were introduced.

Originality/value: Conditional functional dependencies capture semantic errors among data values. This work demonstrates that the density-based data cleaning approach can be improved in terms of repairing inconsistent data by using conditional functional dependencies.
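To make the two technical terms in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It only detects rows that violate a single conditional functional dependency and computes the F-measure from a given precision and recall; it performs no density-based repair. The names `cfd_violations`, `matches`, `f_measure`, the `"_"` wildcard convention, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed notation: "_" in a pattern means "any value"


def matches(row, pattern, attrs):
    """True if the row agrees with every constant in the pattern on attrs."""
    return all(pattern[a] == WILDCARD or row[a] == pattern[a] for a in attrs)


def cfd_violations(rows, lhs, rhs, pattern):
    """Return sorted indexes of rows violating the CFD (lhs -> rhs, pattern)."""
    violating = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, pattern, lhs):
            continue  # the dependency only applies to rows matching the LHS pattern
        if pattern[rhs] != WILDCARD and row[rhs] != pattern[rhs]:
            violating.add(i)  # single-row violation of a constant RHS
        groups[tuple(row[a] for a in lhs)].append(i)
    for members in groups.values():
        if len({rows[i][rhs] for i in members}) > 1:
            violating.update(members)  # same LHS values, different RHS values
    return sorted(violating)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: within the UK, zip is assumed to determine city.
rows = [
    {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4", "city": "London"},
    {"country": "US", "zip": "EH4", "city": "Boston"},
]
pattern = {"country": "UK", "zip": "_", "city": "_"}
print(cfd_violations(rows, ["country", "zip"], "city", pattern))
```

The third row is ignored because the pattern restricts the dependency to rows with `country = "UK"`; this conditioning on constants is what separates a conditional functional dependency from a plain functional dependency.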

Authors

Al-Janabi S; Janicki R

Journal

Data Technologies and Applications, Vol. 56, No. 3, pp. 429–446

Publisher

Emerald

Publication Date

June 22, 2022

DOI

10.1108/dta-05-2021-0108

ISSN

2514-9288
