Active repair of data quality rules
Conference

Abstract

The use of data quality rules, which capture business rules and domain constraints, is central to most data quality processes. Poor data quality often arises when the data and these rules (which are meant to preserve data integrity) become inconsistent. To resolve inconsistencies, organizations often implement specific, sometimes manual, sometimes computer-aided, cleansing routines to fix the errors. This solution necessitates frequent repetition of the data cleaning to resolve inconsistencies continually as the data evolves or grows. It is important to recognize that modern organizations may be as dynamic as their data. The business rules, application domain constraints, and data semantics will evolve. As business policies change, as data is integrated with new sources, and as the underlying data evolves, it becomes necessary to manage, evolve, and repair both the data and the rules. In this work, we present a new data quality paradigm that uses automated support for both data and data quality rule repair and management. Previous techniques have focused mostly on updating or correcting the data. In contrast, our approach looks for clues in the data to understand if the data semantics or rules may have evolved. The approach is a holistic one that is designed to facilitate the continuous curation and maintenance of both data and data quality rules. A unique feature of our approach is that we use data mining to discover trends, contextual information, and data patterns that may yield meaningful insights into how a business rule has evolved. Our approach is designed to consider the very wide (many attribute) entity types or tables that are managed by many organizations. We recognize that due to acquisitions or business evolution, data quality rules may need to be evolved to account for many new features (attributes) of the data. 
We conduct two case studies using real business datasets that demonstrate the quality and usefulness of our methods in a continuous data quality process that manages and evolves both data and data quality rules. The evaluation provides promising results that show how a business analyst can use our tool to quickly identify errors, and identify if the errors are due to dirty data or to erroneous data quality rules that may need to be evolved. This understanding results in both improved overall data quality, and improved rule quality for better maintenance of new data.

Authors

Chiang F; Miller RJ

Pagination

pp. 174-188

Publication Date

December 1, 2011

Conference proceedings

ICIQ 2011: Proceedings of the 16th International Conference on Information Quality
