An Algebraic Approach Towards Data Cleaning
Conference

Abstract

There has been a proliferation in the amount of data being generated and collected in the past several years. One of the leading factors contributing to this increased data scale is cheaper commodity storage, making it easier for organisations to house large data stores containing massive amounts of historical data. To effectively analyse these data sets, a preprocessing step is often required, as most real data sets are inherently dirty and inconsistent. Existing data cleaning tools have focused on cleaning the errors at hand. In this paper, we take a more formal approach and propose the use of information algebra as a general theory to describe structured data sets and data cleaning. We formally define the notions of association rule and association function, and we present results relating these concepts. We also propose an algorithm for generating association rules from a given structured data set.

Authors

Khedri R; Chiang F; Sabri KE

Volume

21

Pagination

pp. 50-59

Publisher

Elsevier

Publication Date

January 1, 2013

DOI

10.1016/j.procs.2013.09.009

Conference proceedings

Procedia Computer Science

ISSN

1877-0509
