datumPIPE

Abstract

Organizations use data to support different business processes. Data may become unclean because of corruptions in the central quality aspects due to factors such as duplicate records, outdated data, inconsistent values, incomplete information, or inaccurate values. Real datasets are usually not available for reasons such as privacy constraints. In the existing systems that generate or corrupt synthetic data, the intrinsic characteristics of data may not satisfy the quality aspects, and the injected types of errors do not corrupt multiple data quality aspects. Also, a lack of common datasets is a primary reason that representative comparisons between algorithms of different data quality management approaches are not possible. To address these issues, we present datumPIPE, a system that allows for the generation of data that satisfies a set of integrity constraints, including functional dependencies (FDs), conditional functional dependencies (CFDs), and inclusion dependencies (INDs). Also, datumPIPE provides the functionality to generate other types of attribute values such as sensors and personal data. It also allows for the corruption of the generated data through the introduction of quality issues in the central data quality aspects.

Authors

Al-janabi S; Hamid A; Janicki R

Pagination

pp. 589-592

Publisher

Association for Computing Machinery (ACM)

Publication Date

July 31, 2017

DOI

10.1145/3110025.3120958

Name of conference

Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017

Associated Experts

Ryszard Janicki

Professor, Faculty of Engineering

Visit profile

Labels