Chapter

Generation and Corruption of Semi-Structured and Structured Data

Abstract

Data must be a reliable source of information so that decisions based on its analysis can provide a competitive edge and reduce the negative impacts that impose significant costs on organizations every year. Data can take more than one form, including semi-structured and structured data. Many factors can corrupt data and degrade its quality, including duplicate records, inaccurate values, inconsistent values, outdated data, and incomplete information. To maintain data quality, the algorithms of different data quality management approaches need to be compared, and this requires common datasets. These datasets can be real or synthetic; synthetic datasets need to satisfy the intrinsic characteristics of data. However, such datasets are not common: real datasets are subject to privacy constraints, and synthetic data generated or corrupted by existing systems may not satisfy the required quality aspects. To address these issues, we present a system that generates semi-structured and structured data. The generated semi-structured data consists of XML documents, and the generated structured datasets satisfy a set of integrity constraints. The system also generates other data values, such as personal data and sensor data. Additionally, it allows the generated semi-structured and structured data to be corrupted.
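
As a rough illustration of the corruption types named in the abstract (duplicate records, inaccurate values, incomplete information), the sketch below corrupts a small list of records. It is not the authors' system: the function name `corrupt_records`, its rate parameters, and the sample records are all hypothetical and chosen only for this example.

```python
import copy
import random


def corrupt_records(records, rng, dup_rate=0.2, missing_rate=0.2, typo_rate=0.2):
    """Return a corrupted copy of `records` (a list of dicts).

    Hypothetical helper for illustration only; the rates are per-field
    (missing, typo) or per-record (duplicate) probabilities.
    """
    corrupted = []
    for original in records:
        rec = copy.deepcopy(original)  # never mutate the caller's data
        for key, value in list(rec.items()):
            roll = rng.random()
            if roll < missing_rate:
                # Incomplete information: blank out the field.
                rec[key] = None
            elif (roll < missing_rate + typo_rate
                  and isinstance(value, str) and len(value) > 1):
                # Inaccurate value: swap two adjacent characters.
                i = rng.randrange(len(value) - 1)
                rec[key] = value[:i] + value[i + 1] + value[i] + value[i + 2:]
        corrupted.append(rec)
        if rng.random() < dup_rate:
            # Duplicate record: append a second copy of the same row.
            corrupted.append(copy.deepcopy(rec))
    return corrupted


# Sample input (made-up records, not taken from the chapter).
people = [
    {"name": "Alice Smith", "city": "Hamilton"},
    {"name": "Bob Jones", "city": "Toronto"},
]
noisy = corrupt_records(people, random.Random(0))
```

Passing a seeded `random.Random` instance keeps a corrupted dataset reproducible, which matters when the same dataset is meant to be shared across the algorithms being compared.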

Authors

Al-janabi S; Janicki R

Book title

From Security to Community Detection in Social Networking Platforms

Series

Lecture Notes in Social Networks

Pagination

pp. 159-169

Publisher

Springer Nature

Publication Date

January 1, 2019

DOI

10.1007/978-3-030-11286-8_7